    AttentionHTR: Handwritten Text Recognition Based on Attention Encoder-Decoder Networks. (arXiv:2201.09390v3 [cs.CV] UPDATED)
    This work proposes an attention-based sequence-to-sequence model for handwritten word recognition and explores transfer learning for data-efficient training of HTR systems. To overcome training data scarcity, this work leverages models pre-trained on scene text images as a starting point for tailoring handwriting recognition models. ResNet feature extraction and bidirectional LSTM-based sequence modeling stages together form the encoder. The prediction stage consists of a decoder and a content-based attention mechanism. The effectiveness of the proposed end-to-end HTR system has been empirically evaluated on the novel multi-writer Imgur5K dataset and the IAM dataset. The experimental results demonstrate the performance of the HTR framework and are supported by an in-depth analysis of the error cases. Source code and pre-trained models are available at https://github.com/dmitrijsk/AttentionHTR.
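    The abstract names the main building blocks (ResNet features, a bidirectional LSTM encoder, a content-based attention decoder); the PyTorch sketch below shows how such a pipeline typically fits together. It illustrates the general architecture only, not the authors' implementation, and all module sizes and names are assumptions.
    ```python
    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        """CNN feature extractor followed by a BiLSTM sequence model."""
        def __init__(self, cnn, feat_dim=512, hidden=256):
            super().__init__()
            self.cnn = cnn                            # e.g. a ResNet trunk
            self.rnn = nn.LSTM(feat_dim, hidden // 2,
                               bidirectional=True, batch_first=True)

        def forward(self, img):
            f = self.cnn(img)                         # (B, C, 1, W)
            f = f.squeeze(2).transpose(1, 2)          # (B, W, C)
            enc, _ = self.rnn(f)                      # (B, W, hidden)
            return enc

    class AttnDecoder(nn.Module):
        """One step of a content-based (Bahdanau-style) attention decoder."""
        def __init__(self, hidden=256, vocab=80):
            super().__init__()
            self.score = nn.Linear(hidden * 2, 1)
            self.rnn = nn.GRUCell(hidden + vocab, hidden)
            self.out = nn.Linear(hidden, vocab)

        def forward(self, enc, y_prev, h):
            # enc: (B, W, H); y_prev: (B, V) previous char one-hot; h: (B, H)
            h_rep = h.unsqueeze(1).expand_as(enc)
            alpha = torch.softmax(
                self.score(torch.cat([h_rep, enc], -1)).squeeze(-1), dim=1)
            glimpse = (alpha.unsqueeze(-1) * enc).sum(1)   # attended context
            h = self.rnn(torch.cat([glimpse, y_prev], -1), h)
            return self.out(h), h, alpha              # logits over the charset
    ```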
    Explaining Predictions from Machine Learning Models: Algorithms, Users, and Pedagogy. (arXiv:2209.05084v1 [cs.LG])
    Model explainability has become an important problem in machine learning (ML) due to the increased effect that algorithmic predictions have on humans. Explanations can help users understand not only why ML models make certain predictions, but also how these predictions can be changed. In this thesis, we examine the explainability of ML models from three vantage points: algorithms, users, and pedagogy, and contribute several novel solutions to the explainability problem.
    Simple and Effective Gradient-Based Tuning of Sequence-to-Sequence Models. (arXiv:2209.04683v1 [cs.CL])
    Recent trends towards training ever-larger language models have substantially improved machine learning performance across linguistic tasks. However, the huge cost of training larger models can make tuning them prohibitively expensive, motivating the study of more efficient methods. Gradient-based hyperparameter optimization offers the capacity to tune hyperparameters during training, yet has not previously been studied in a sequence-to-sequence setting. We apply a simple and general gradient-based hyperparameter optimization method to sequence-to-sequence tasks for the first time, demonstrating both efficiency and performance gains over strong baselines for both Neural Machine Translation and Natural Language Understanding (NLU) tasks (via T5 pretraining). For translation, we show the method generalizes across language pairs, is more efficient than Bayesian hyperparameter optimization, and that learned schedules for some hyperparameters can outperform even optimal constant-valued tuning. For T5, we show that learning hyperparameters during pretraining can improve performance across downstream NLU tasks. When learning multiple hyperparameters concurrently, we show that the global learning rate can follow a schedule over training that improves performance and is not explainable by the 'short-horizon bias' of greedy methods (Wu et al., 2018). We release the code used to facilitate further research.
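    The abstract does not spell out its update rule; one simple member of this family is hypergradient descent (Baydin et al., 2018), which tunes the learning rate online from the gradient of the loss with respect to the learning rate itself. A minimal sketch assuming plain SGD, not the paper's method:
    ```python
    import torch

    def hypergradient_sgd(params, loss_fn, lr=0.01, hyper_lr=1e-4, steps=100):
        # Hypergradient descent: since theta_t = theta_{t-1} - lr * g_{t-1},
        # dL/d(lr) = -g_t . g_{t-1}; descending on it gives lr += beta * g_t . g_{t-1}.
        prev_grads = [torch.zeros_like(p) for p in params]
        for _ in range(steps):
            grads = torch.autograd.grad(loss_fn(), params)
            h = sum((g * pg).sum() for g, pg in zip(grads, prev_grads))
            lr = lr + hyper_lr * float(h)     # online update of the learning rate
            with torch.no_grad():
                for p, g in zip(params, grads):
                    p -= lr * g               # ordinary SGD step with adapted lr
            prev_grads = [g.detach() for g in grads]
        return lr
    ```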
    Data-driven, multi-moment fluid modeling of Landau damping. (arXiv:2209.04726v1 [physics.plasm-ph])
    Deriving the governing equations of complex physical systems from first principles can be quite challenging when there are unknown terms and hidden physical mechanisms in the systems. In this work, we apply a deep learning architecture to learn the fluid partial differential equations (PDEs) of a plasma system from data acquired with a fully kinetic model. The learned multi-moment fluid PDEs are demonstrated to incorporate kinetic effects such as Landau damping. Based on the learned fluid closure, the data-driven, multi-moment fluid modeling reproduces well all the physical quantities derived from the fully kinetic model. The calculated damping rate of Landau damping is consistent with both the fully kinetic simulation and linear theory. Data-driven fluid modeling of PDEs for complex physical systems may be applied to improve fluid closures and reduce the computational cost of multi-scale modeling of global systems.
    Resisting Deep Learning Models Against Adversarial Attack Transferability via Feature Randomization. (arXiv:2209.04930v1 [cs.CR])
    In recent decades, the rise of artificial intelligence has given us the capability to solve some of the most challenging problems in our day-to-day lives, such as cancer prediction and autonomous navigation. However, these applications might not be reliable if not secured against adversarial attacks. In addition, recent works have demonstrated that some adversarial examples are transferable across different models. Therefore, it is crucial to avoid such transferability via robust models that resist adversarial manipulations. In this paper, we propose a feature randomization-based approach that resists eight adversarial attacks targeting deep learning models in the testing phase. Our novel approach consists of changing the training strategy in the target network classifier and selecting random feature samples. We consider attackers under Limited-Knowledge and Semi-Knowledge conditions undertaking the most prevalent types of adversarial attacks. We evaluate the robustness of our approach using the well-known UNSW-NB15 dataset, which includes realistic and synthetic attacks. Afterward, we demonstrate that our strategy outperforms the existing state-of-the-art approach, the Most Powerful Attack, which consists of fine-tuning the network model against specific adversarial attacks. Finally, our experimental results show that our methodology can secure the target network and resists adversarial attack transferability by over 60%.
    Phantom Sponges: Exploiting Non-Maximum Suppression to Attack Deep Object Detectors. (arXiv:2205.13618v2 [cs.CV] UPDATED)
    Adversarial attacks against deep learning-based object detectors have been studied extensively in the past few years. Most of the attacks proposed have targeted the model's integrity (i.e., caused the model to make incorrect predictions), while adversarial attacks targeting the model's availability, a critical aspect in safety-critical domains such as autonomous driving, have not yet been explored by the machine learning research community. In this paper, we propose a novel attack that negatively affects the decision latency of an end-to-end object detection pipeline. We craft a universal adversarial perturbation (UAP) that targets a widely used technique integrated in many object detector pipelines -- non-maximum suppression (NMS). Our experiments demonstrate the proposed UAP's ability to increase the processing time of individual frames by adding "phantom" objects that overload the NMS algorithm while preserving the detection of the original objects (which allows the attack to go undetected for a longer period of time).
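    The attack surface here is the greedy NMS loop itself: each kept box is compared against all remaining candidates, so flooding the detector with overlapping low-confidence "phantom" boxes inflates the number of IoU tests per frame. A standard greedy NMS (not the paper's attack) makes the cost visible:
    ```python
    import numpy as np

    def nms(boxes, scores, iou_thr=0.5):
        """Greedy NMS: worst case O(n^2) IoU tests, so padding the candidate
        list with mutually low-IoU 'phantom' boxes inflates per-frame latency."""
        order = scores.argsort()[::-1]
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            rest = order[1:]
            xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
            yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
            xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
            yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
            inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
            iou = inter / (area_i + area_r - inter)
            order = rest[iou <= iou_thr]      # boxes that survive this round
        return keep
    ```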
    An adaptive music generation architecture for games based on the deep learning Transformer model. (arXiv:2207.01698v2 [cs.SD] UPDATED)
    This paper presents an architecture for generating music for video games based on the Transformer deep learning model. Our motivation is to be able to customize the generation according to the taste of the player, who can select a corpus of training examples corresponding to their preferred musical style. The system generates various musical layers, following the standard layering strategy currently used by composers designing video game music. To adapt the generated music to the gameplay and to the players' situation, we use an arousal-valence model of emotions to control the selection of musical layers. We discuss current limitations and prospects for the future, such as collaborative and interactive control of the musical components.
    Affinity-VAE for disentanglement, clustering and classification of objects in multidimensional image data. (arXiv:2209.04517v1 [cs.CV])
    In this work we present affinity-VAE: a framework for automatic clustering and classification of objects in multidimensional image data based on their similarity. The method expands on the concept of $\beta$-VAEs with an informed similarity-based loss component driven by an affinity matrix. The affinity-VAE is able to create rotationally-invariant, morphologically homogeneous clusters in the latent representation, with improved cluster separation compared with a standard $\beta$-VAE. We explore the extent of latent disentanglement and continuity of the latent spaces on both 2D and 3D image data, including simulated biological electron cryo-tomography (cryo-ET) volumes as an example of a scientific application.
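    The abstract specifies only that a similarity-driven term, fed by an affinity matrix, is added to the $\beta$-VAE objective. A sketch of what such a loss could look like; the form of the affinity-matching term and all weightings are assumptions, not the paper's definition:
    ```python
    import torch
    import torch.nn.functional as F

    def affinity_vae_loss(x, x_rec, mu, logvar, z, affinity, beta=4.0, gamma=1.0):
        """beta-VAE ELBO plus an affinity-driven similarity term (sketch).

        affinity[i, j] in [-1, 1]: prior similarity between objects i and j.
        The extra term pulls latents of similar objects together and pushes
        dissimilar ones apart."""
        rec = F.mse_loss(x_rec, x, reduction="sum")
        kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        sim = F.cosine_similarity(z.unsqueeze(1), z.unsqueeze(0), dim=-1)
        aff = ((sim - affinity) ** 2).mean()   # match latent to prior similarity
        return rec + beta * kld + gamma * aff
    ```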
    Class-Incremental Learning with Strong Pre-trained Models. (arXiv:2204.03634v2 [cs.CV] UPDATED)
    Class-incremental learning (CIL) has been widely studied under the setting of starting from a small number of classes (base classes). Instead, we explore an understudied real-world setting of CIL that starts with a strong model pre-trained on a large number of base classes. We hypothesize that a strong base model can provide a good representation for novel classes and that incremental learning can be done with small adaptations. We propose a 2-stage training scheme, i) feature augmentation -- cloning part of the backbone and fine-tuning it on the novel data, and ii) fusion -- combining the base and novel classifiers into a unified classifier. Experiments show that the proposed method significantly outperforms state-of-the-art CIL methods on the large-scale ImageNet dataset (e.g., +10% overall accuracy over the best). We also propose and analyze understudied practical CIL scenarios, such as base-novel overlap with distribution shift. Our proposed method is robust and generalizes to all analyzed CIL settings. Code is available at https://github.com/amazon-research/sp-cil.
    On The Computational Complexity of Self-Attention. (arXiv:2209.04881v1 [cs.LG])
    Transformer architectures have led to remarkable progress in many state-of-the-art applications. However, despite their successes, modern transformers rely on the self-attention mechanism, whose time- and space-complexity is quadratic in the length of the input. Several approaches have been proposed to speed up self-attention mechanisms to achieve sub-quadratic running time; however, the large majority of these works are not accompanied by rigorous error guarantees. In this work, we establish lower bounds on the computational complexity of self-attention in a number of scenarios. We prove that the time complexity of self-attention is necessarily quadratic in the input length, unless the Strong Exponential Time Hypothesis (SETH) is false. This argument holds even if the attention computation is performed only approximately, and for a variety of attention mechanisms. As a complement to our lower bounds, we show that it is indeed possible to approximate dot-product self-attention using finite Taylor series in linear time, at the cost of an exponential dependence on the polynomial order.
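    The linear-time direction is easiest to see at first order: replacing exp(q.k) with its Taylor truncation 1 + q.k lets the attention sum be reassociated so the n x n matrix is never formed. A numpy sketch of this first-order case (the paper's result concerns higher-order series):
    ```python
    import numpy as np

    def taylor_linear_attention(Q, K, V):
        """First-order Taylor surrogate: exp(q.k) ~ 1 + q.k.

        Rewriting sum_j (1 + q_i.k_j) v_j = sum_j v_j + q_i (K^T V) avoids the
        n x n attention matrix: cost is O(n d^2) instead of O(n^2 d).
        (At first order, 1 + q.k can go negative; higher orders mitigate this.)"""
        n, d = Q.shape
        v_sum = V.sum(axis=0)              # (d,)
        kv = K.T @ V                       # (d, d), computed once
        num = v_sum + Q @ kv               # (n, d) numerator for every query
        den = n + Q @ K.sum(axis=0)        # (n,)  softmax-style normalizer
        return num / den[:, None]
    ```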
    Data-Driven Blind Synchronization and Interference Rejection for Digital Communication Signals. (arXiv:2209.04871v1 [eess.SP])
    We study the potential of data-driven deep learning methods for separation of two communication signals from an observation of their mixture. In particular, we assume knowledge on the generation process of one of the signals, dubbed signal of interest (SOI), and no knowledge on the generation process of the second signal, referred to as interference. This form of the single-channel source separation problem is also referred to as interference rejection. We show that capturing high-resolution temporal structures (nonstationarities), which enables accurate synchronization to both the SOI and the interference, leads to substantial performance gains. With this key insight, we propose a domain-informed neural network (NN) design that is able to improve upon both "off-the-shelf" NNs and classical detection and interference rejection methods, as demonstrated in our simulations. Our findings highlight the key role communication-specific domain knowledge plays in the development of data-driven approaches that hold the promise of unprecedented gains.
    Explaining Results of Multi-Criteria Decision Making. (arXiv:2209.04582v1 [cs.AI])
    We introduce a method for explaining the results of various linear and hierarchical multi-criteria decision-making (MCDM) techniques such as WSM and AHP. The two key ideas are (A) to maintain a fine-grained representation of the values manipulated by these techniques and (B) to derive explanations from these representations through merging, filtering, and aggregating operations. An explanation in our model presents a high-level comparison of two alternatives in an MCDM problem, presumably an optimal and a non-optimal one, illuminating why one alternative was preferred over the other. We show the usefulness of our techniques by generating explanations for two well-known examples from the MCDM literature. Finally, we show their efficacy by performing computational experiments.
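    For the simplest case, the Weighted Sum Model, ideas (A) and (B) can be illustrated in a few lines: keep the per-criterion weighted contributions (the fine-grained representation), then filter and sort them to explain a pairwise preference. A toy sketch; function and criterion names are hypothetical, not the paper's formalism:
    ```python
    def explain_wsm(a, b, weights, criteria):
        """Compare two alternatives under the Weighted Sum Model and report
        the per-criterion contributions driving the overall preference."""
        rows = [(name, w * va - w * vb)
                for name, w, va, vb in zip(criteria, weights, a, b)]
        total = sum(d for _, d in rows)
        winner = "A" if total > 0 else "B"
        # Filtering step: keep only criteria that actually separate the two.
        drivers = sorted((r for r in rows if abs(r[1]) > 1e-9),
                         key=lambda r: -abs(r[1]))
        return winner, drivers

    winner, drivers = explain_wsm(
        a=[0.9, 0.4, 0.7], b=[0.6, 0.8, 0.7],
        weights=[0.5, 0.3, 0.2], criteria=["cost", "quality", "speed"])
    # winner == "A"; "speed" is filtered out since it does not discriminate
    ```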
    Instruction-driven history-aware policies for robotic manipulations. (arXiv:2209.04899v1 [cs.RO])
    In human environments, robots are expected to accomplish a variety of manipulation tasks given simple natural language instructions. Yet, robotic manipulation is extremely challenging as it requires fine-grained motor control, long-term memory as well as generalization to previously unseen tasks and environments. To address these challenges, we propose a unified transformer-based approach that takes into account multiple inputs. In particular, our transformer architecture integrates (i) natural language instructions and (ii) multi-view scene observations while (iii) keeping track of the full history of observations and actions. Such an approach enables learning dependencies between history and instructions and improves manipulation precision using multiple views. We evaluate our method on the challenging RLBench benchmark and on a real-world robot. Notably, our approach scales to 74 diverse RLBench tasks and outperforms the state of the art. We also address instruction-conditioned tasks and demonstrate excellent generalization to previously unseen variations.
    A Thermal Machine Learning Solver For Chip Simulation. (arXiv:2209.04741v1 [cs.LG])
    Thermal analysis provides deeper insights into the behavior of electronic chips under different temperature scenarios and enables faster design exploration. However, obtaining a detailed and accurate on-chip thermal profile is very time-consuming using FEM or CFD. Therefore, there is an urgent need to speed up the on-chip thermal solution to address various system scenarios. In this paper, we propose a thermal machine-learning (ML) solver to speed up thermal simulations of chips. The thermal ML-Solver is an extension of a recent novel approach, CoAEMLSim (Composable Autoencoder Machine Learning Simulator), with modifications to the solution algorithm to handle constant and distributed heat transfer coefficients (HTC). The proposed method is validated against commercial solvers, such as Ansys MAPDL, as well as a recent ML baseline, UNet, under different scenarios to demonstrate its enhanced accuracy, scalability, and generalizability.
    Learning Consumer Preferences from Bundle Sales Data. (arXiv:2209.04942v1 [stat.ML])
    Product bundling is a common selling mechanism used in online retailing. To set profitable bundle prices, the seller needs to learn consumer preferences from the transaction data. When customers purchase bundles or multiple products, classical methods such as discrete choice models cannot be used to estimate customers' valuations. In this paper, we propose an approach to learn the distribution of consumers' valuations toward the products using bundle sales data. The approach reduces the problem to one of estimation in which the samples are censored by polyhedral regions. Using the EM algorithm and Monte Carlo simulation, our approach can recover the distribution of consumers' valuations. The framework allows for unobserved no-purchases and clustered market segments. We provide theoretical results on the identifiability of the probability model and the convergence of the EM algorithm. The performance of the approach is also demonstrated numerically.
    Combined Pruning for Nested Cross-Validation to Accelerate Automated Hyperparameter Optimization for Embedded Feature Selection in High-Dimensional Data with Very Small Sample Sizes. (arXiv:2202.00598v2 [cs.LG] UPDATED)
    Background: Embedded feature selection in high-dimensional data with very small sample sizes requires optimized hyperparameters for the model building process. For this hyperparameter optimization, nested cross-validation must be applied to avoid a biased performance estimation. The resulting repeated training with high-dimensional data leads to very long computation times. Moreover, it is likely to observe a high variance in the individual performance evaluation metrics caused by outliers in tiny validation sets. Therefore, early stopping with standard pruning algorithms to save time risks discarding promising hyperparameter sets. Results: To speed up feature selection for high-dimensional data with tiny sample sizes, we adapt the use of a state-of-the-art asynchronous successive halving pruner. In addition, we combine it with two complementary pruning strategies based on domain or prior knowledge. One pruning strategy immediately stops computing trials with semantically meaningless results for the selected hyperparameter combinations. The other is a new extrapolating threshold pruning strategy suitable for nested cross-validation with a high variance of performance evaluation metrics. In repeated experiments, our combined pruning strategy keeps all promising trials. At the same time, the calculation time is substantially reduced compared to using a state-of-the-art asynchronous successive halving pruner alone. Up to 81.3% fewer models were trained while achieving the same optimization result. Conclusion: The proposed combined pruning strategy accelerates data analysis or enables deeper searches for hyperparameters within the same computation time. This leads to significant savings in time, money and energy consumption, opening the door to advanced, time-consuming analyses.
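    The abstract describes the extrapolating threshold pruner only at a high level; the sketch below is one plausible reading, with the class name, forecasting rule, and all thresholds being assumptions rather than the paper's algorithm:
    ```python
    class ExtrapolatingThresholdPruner:
        """Hypothetical pruning rule for nested CV (illustrative only).

        After each inner-CV fold, optimistically extrapolate the mean of the
        fold scores seen so far to all n_folds; prune the trial if even that
        optimistic forecast cannot reach the best completed trial's score."""
        def __init__(self, n_folds, optimism=0.05, margin=0.0):
            self.n_folds, self.optimism, self.margin = n_folds, optimism, margin
            self.best = float("-inf")

        def should_prune(self, fold_scores):
            k = len(fold_scores)
            if k == 0 or k == self.n_folds:
                return False
            # Assume the remaining folds score the running mean plus a bonus,
            # which guards against outlier folds in tiny validation sets.
            mean = sum(fold_scores) / k
            forecast = (sum(fold_scores)
                        + (self.n_folds - k) * (mean + self.optimism)) / self.n_folds
            return forecast < self.best - self.margin

        def report_final(self, score):
            self.best = max(self.best, score)
    ```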
    Extended Feature Space-Based Automatic Melanoma Detection System. (arXiv:2209.04588v1 [cs.LG])
    Melanoma is the deadliest form of skin cancer. Uncontrolled growth of melanocytes leads to melanoma, whose incidence has grown rapidly in the last few decades. In recent years, the detection of melanoma using image processing techniques has become a dominant research field. The Automatic Melanoma Detection System (AMDS) helps to detect melanoma based on image processing techniques by accepting images of infected skin areas as input. A single lesion image is a source of multiple features; it is therefore crucial to select the appropriate features from the lesion image in order to increase the accuracy of AMDS. Not all extracted features are important for melanoma detection. Some of the extracted features are complex and require more computation, which impacts the classification accuracy of AMDS. The feature extraction phase of AMDS exhibits more variability; therefore, it is important to study the behaviour of AMDS using individual and extended feature extraction approaches. A novel algorithm, ExtFvAMDS, is proposed for the calculation of the extended feature vector space. The six models proposed in the comparative study revealed that the HSV feature vector space for automatic detection of melanoma using an Ensemble Bagged Tree classifier on the Med-Node dataset provided 99% AUC, 95.30% accuracy, 94.23% sensitivity, and 96.96% specificity.
    Explainable Image Quality Assessments in Teledermatological Photography. (arXiv:2209.04699v1 [cs.CV])
    Image quality is a crucial factor in the success of teledermatological consultations. However, up to 50% of images sent by patients have quality issues, thus increasing the time to diagnosis and treatment. An automated, easily deployable, explainable method for assessing image quality is necessary to improve the current teledermatological consultation flow. We introduce ImageQX, a convolutional neural network trained for image quality assessment with a learning mechanism for identifying the most common poor image quality explanations: bad framing, bad lighting, blur, low resolution, and distance issues. ImageQX was trained on 26635 photographs and validated on 9874 photographs, each annotated with image quality labels and poor image quality explanations by up to 12 board-certified dermatologists. The photographic images were taken between 2017 and 2019 using a mobile skin disease tracking application accessible worldwide. Our method achieves expert-level performance for both image quality assessment and poor image quality explanation. For image quality assessment, ImageQX obtains a macro F1-score of 0.73, which places it within one standard deviation of the pairwise inter-rater F1-score of 0.77. For poor image quality explanations, our method obtains F1-scores between 0.37 and 0.70, similar to the pairwise inter-rater F1-scores, which range from 0.24 to 0.83. Moreover, with a size of only 15 MB, ImageQX is easily deployable on mobile devices. With an image quality detection performance similar to that of dermatologists, incorporating ImageQX into the teledermatology flow can reduce the image evaluation burden on dermatologists, while at the same time reducing the time to diagnosis and treatment for patients. ImageQX is a first-of-its-kind explainable image quality assessor that leverages domain expertise to improve the quality and efficiency of dermatological care in a virtual setting.
    Batch Bayesian Optimization via Particle Gradient Flows. (arXiv:2209.04722v1 [stat.ML])
    Bayesian Optimisation (BO) methods seek to find global optima of objective functions which are only available as a black-box or are expensive to evaluate. Such methods construct a surrogate model for the objective function, quantifying the uncertainty in that surrogate through Bayesian inference. Objective evaluations are sequentially determined by maximising an acquisition function at each step. However, this ancillary optimisation problem can be highly non-trivial to solve, due to the non-convexity of the acquisition function, particularly in the case of batch Bayesian optimisation, where multiple points are selected in every step. In this work we reformulate batch BO as an optimisation problem over the space of probability measures. We construct a new acquisition function based on multipoint expected improvement which is convex over the space of probability measures. Practical schemes for solving this 'inner' optimisation problem arise naturally as gradient flows of this objective function. We demonstrate the efficacy of this new method on different benchmark functions and compare with state-of-the-art batch BO methods.
    Towards Sparsification of Graph Neural Networks. (arXiv:2209.04766v1 [cs.LG])
    As real-world graphs expand in size, larger GNN models with billions of parameters are deployed. The high parameter count in such models makes training and inference on graphs expensive and challenging. To reduce the computational and memory costs of GNNs, optimization methods such as pruning redundant nodes and edges in input graphs have been commonly adopted. However, model compression, which directly targets the sparsification of model layers, has been mostly limited to traditional Deep Neural Networks (DNNs) used for tasks such as image classification and object detection. In this paper, we utilize two state-of-the-art model compression methods, (1) train and prune and (2) sparse training, for the sparsification of weight layers in GNNs. We evaluate and compare the efficiency of both methods in terms of accuracy, training sparsity, and training FLOPs on real-world graphs. Our experimental results show that on the ia-email, wiki-talk, and stackoverflow datasets for link prediction, sparse training with much lower training FLOPs achieves accuracy comparable to the train-and-prune method. On the brain dataset for node classification, sparse training uses fewer FLOPs (less than 1/7 of the FLOPs of the train-and-prune method) and preserves much better accuracy under extreme model sparsity.
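    As a point of reference, the "prune" step of train-and-prune is often implemented as one-shot global magnitude pruning; a minimal PyTorch sketch, not tied to the paper's exact setup:
    ```python
    import torch

    def magnitude_prune(model, sparsity=0.9):
        """One-shot global magnitude pruning: zero out the smallest-magnitude
        weights across all weight matrices, in-place."""
        weights = torch.cat([p.detach().abs().flatten()
                             for p in model.parameters() if p.dim() > 1])
        threshold = torch.quantile(weights, sparsity)
        for p in model.parameters():
            if p.dim() > 1:                         # skip biases / norms
                p.data.mul_((p.detach().abs() > threshold).float())
    ```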
    Performance-Driven Controller Tuning via Derivative-Free Reinforcement Learning. (arXiv:2209.04854v1 [eess.SY])
    Choosing an appropriate parameter set for the designed controller is critical for the final performance but usually requires a tedious and careful tuning process, which implies a strong need for automatic tuning methods. However, among existing methods, derivative-free ones suffer from poor scalability or low efficiency, while gradient-based ones are often unavailable due to possibly non-differentiable controller structure. To resolve the issues, we tackle the controller tuning problem using a novel derivative-free reinforcement learning (RL) framework, which performs timestep-wise perturbation in parameter space during experience collection and integrates derivative-free policy updates into the advanced actor-critic RL architecture to achieve high versatility and efficiency. To demonstrate the framework's efficacy, we conduct numerical experiments on two concrete examples from autonomous driving, namely, adaptive cruise control with PID controller and trajectory tracking with MPC controller. Experimental results show that the proposed method outperforms popular baselines and highlight its strong potential for controller tuning.
    People detection and social distancing classification in smart cities for COVID-19 by using thermal images and deep learning algorithms. (arXiv:2209.04704v1 [cs.CV])
    COVID-19 is a disease caused by severe acute respiratory syndrome coronavirus 2. It was identified in December 2019 in Wuhan, China, and has resulted in an ongoing pandemic with many infections and deaths. The coronavirus is primarily spread between people during close contact. Motivated by this notion, this research proposes an artificial intelligence system for social distancing classification of persons using thermal images. By exploiting YOLOv2 (You Only Look Once, version 2), a deep learning detection technique is developed for detecting and tracking people in indoor and outdoor scenarios. An algorithm is also implemented for measuring and classifying the distance between persons and automatically checking whether social distancing rules are respected. Hence, this work aims at minimizing the spread of the COVID-19 virus by evaluating if and how persons comply with social distancing rules. The proposed approach is applied to images acquired through thermal cameras, to establish a complete AI system for people tracking, social distancing classification, and body temperature monitoring. The training phase is done with two datasets captured from different thermal cameras. The Ground Truth Labeler app is used for labeling the persons in the images. The achieved results show that the proposed method is suitable for the creation of a smart surveillance system in smart cities for people detection, social distancing classification, and body temperature analysis.
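    Once the detector returns person bounding boxes, the distancing check reduces to thresholding pairwise distances; a toy numpy sketch, where the pixels-per-meter calibration (and any ground-plane correction) is an assumption the real system must supply:
    ```python
    import numpy as np
    from itertools import combinations

    def flag_violations(boxes, px_per_meter, min_dist_m=2.0):
        """Flag pairs of detected persons closer than the distancing threshold.

        boxes: (N, 4) array of [x1, y1, x2, y2] detections. Distance is taken
        between bottom-center points (the feet) in the image plane -- a crude
        proxy unless a ground-plane homography is applied first."""
        feet = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2, boxes[:, 3]], axis=1)
        violations = []
        for i, j in combinations(range(len(boxes)), 2):
            d = np.linalg.norm(feet[i] - feet[j]) / px_per_meter
            if d < min_dist_m:
                violations.append((i, j, d))
        return violations
    ```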
    Delta Hedging Liquidity Positions on Automated Market Makers. (arXiv:2208.03318v2 [cs.CE] UPDATED)
    Liquidity Providers on Automated Market Makers generate millions of USD in transaction fees daily. However, the net value of a Liquidity Position is vulnerable to price changes in the underlying assets in the pool. The dominant measure of loss in a Liquidity Position is Impermanent Loss. Impermanent Loss for Constant Function Market Makers has been widely studied. We propose a new metric to measure Liquidity Position PNL based on price movement from the underlying assets. We show how this new metric more appropriately measures the change in the net value of a Liquidity Position as a function of price movement in the underlying assets. Our second contribution is an algorithm to delta hedge arbitrary Liquidity Positions on both uniform liquidity Automated Market Makers (such as Uniswap v2) and concentrated liquidity Automated Market Makers (such as Uniswap v3) via a combination of derivatives.
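    The abstract does not reproduce its formulas, but for the uniform-liquidity (Uniswap v2) case the position value and its delta have a well-known closed form under the constant-product rule, sketched here (the paper's own PNL metric and v3 treatment are more involved):
    ```python
    import math

    def v2_lp_value_and_delta(k, price):
        """Constant-product pool x*y = k with price p = y/x (quote per base).

        Holdings: x = sqrt(k/p), y = sqrt(k*p); value in quote terms is
        V(p) = 2*sqrt(k*p), so dV/dp = sqrt(k/p) = x. Delta hedging a v2 LP
        therefore means shorting exactly the pool's current base-asset balance."""
        x = math.sqrt(k / price)
        value = 2 * math.sqrt(k * price)
        delta = x                       # = dV/dp, the hedge ratio
        return value, delta

    value, delta = v2_lp_value_and_delta(k=1_000_000.0, price=2500.0)
    # hedge: short `delta` units of the base asset, e.g. via a perpetual future
    ```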
    Variational Autoencoder Kernel Interpretation and Selection for Classification. (arXiv:2209.04715v1 [cs.LG])
    This work proposes kernel selection approaches for probabilistic classifiers based on features produced by the convolutional encoder of a variational autoencoder. In particular, the developed methodologies allow the selection of the most relevant subset of latent variables. In the proposed implementation, each latent variable is sampled from the distribution associated with a single kernel of the encoder's last convolutional layer, as an individual distribution is created for each kernel. Therefore, choosing relevant features on the sampled latent variables makes it possible to perform kernel selection, filtering out the uninformative features and kernels. This leads to a reduction in the number of the model's parameters. Both wrapper and filter methods were evaluated for feature selection. The latter is of particular relevance as it is based only on the distributions of the kernels. It was assessed by measuring the Kullback-Leibler divergence between all distributions, hypothesizing that kernels whose distributions are more similar can be discarded. This hypothesis was confirmed, since it was observed that the most similar kernels do not convey relevant information and can be removed. As a result, the proposed methodology is suitable for developing applications for resource-constrained devices.
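    The filter criterion amounts to computing pairwise KL divergences between the per-kernel latent Gaussians and discarding near-duplicates. A sketch assuming univariate Gaussians per kernel; the symmetrisation and threshold value are illustrative assumptions:
    ```python
    import numpy as np

    def gaussian_kl(mu1, var1, mu2, var2):
        """KL(N(mu1, var1) || N(mu2, var2)) for univariate Gaussians."""
        return 0.5 * np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / (2 * var2) - 0.5

    def redundant_kernels(mus, vars_, kl_thresh=0.05):
        """Mark kernels whose latent distributions are near-duplicates of an
        earlier kernel (symmetrised KL below threshold) as removable."""
        drop = set()
        for i in range(len(mus)):
            for j in range(i + 1, len(mus)):
                if i in drop or j in drop:
                    continue
                d = (gaussian_kl(mus[i], vars_[i], mus[j], vars_[j])
                     + gaussian_kl(mus[j], vars_[j], mus[i], vars_[i]))
                if d < kl_thresh:
                    drop.add(j)       # keep kernel i, discard its duplicate j
        return sorted(drop)
    ```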
    Fast Regression of the Tritium Breeding Ratio in Fusion Reactors. (arXiv:2104.04026v2 [physics.comp-ph] UPDATED)
    The tritium breeding ratio (TBR) is an essential quantity for the design of modern and next-generation D-T fueled nuclear fusion reactors. Representing the ratio between tritium fuel generated in breeding blankets and fuel consumed during reactor runtime, the TBR depends on reactor geometry and material properties in a complex manner. In this work, we explored the training of surrogate models to produce a cheap but high-quality approximation for a Monte Carlo TBR model in use at the UK Atomic Energy Authority. We investigated possibilities for dimensional reduction of its feature space, reviewed 9 families of surrogate models for potential applicability, and performed hyperparameter optimisation. Here we present the performance and scaling properties of these models, the fastest of which, an artificial neural network, demonstrated $R^2=0.985$ and a mean prediction time of $0.898\ \mu\mathrm{s}$, representing a relative speedup of $8\cdot 10^6$ with respect to the expensive MC model. We further present a novel adaptive sampling algorithm, Quality-Adaptive Surrogate Sampling, capable of interfacing with any of the individually studied surrogates. Our preliminary testing on a toy TBR theory has demonstrated the efficacy of this algorithm for accelerating the surrogate modelling process.
    Automatic Tuberculosis and COVID-19 cough classification using deep learning. (arXiv:2205.05480v2 [cs.LG] UPDATED)
    We present a deep learning based automatic cough classifier which can discriminate tuberculosis (TB) coughs from COVID-19 coughs and healthy coughs. Both TB and COVID-19 are respiratory diseases, are contagious, have cough as a predominant symptom, and claim thousands of lives each year. The cough audio recordings were collected in both indoor and outdoor settings and were also uploaded via smartphones by subjects around the globe, and thus contain various levels of noise. The cough data include 1.68 hours of TB coughs, 18.54 minutes of COVID-19 coughs and 1.69 hours of healthy coughs from 47 TB patients, 229 COVID-19 patients and 1498 healthy subjects, and were used to train and evaluate a CNN, an LSTM and a Resnet50. These three deep architectures were also pre-trained on 2.14 hours of sneeze, 2.91 hours of speech and 2.79 hours of noise for improved performance. The class imbalance in our dataset was addressed by using the SMOTE data balancing technique and by using performance metrics such as F1-score and AUC. Our study shows that the highest F1-scores of 0.9259 and 0.8631 were achieved by a pre-trained Resnet50 for the two-class (TB vs COVID-19) and three-class (TB vs COVID-19 vs healthy) cough classification tasks, respectively. The application of deep transfer learning improved the classifiers' performance and made them more robust, as they generalise better over the cross-validation folds. Their performance exceeds the TB triage test requirements set by the World Health Organisation (WHO). The features producing the best performance contain higher-order MFCCs, suggesting that the differences between TB and COVID-19 coughs are not perceivable by the human ear. This type of cough audio classification is non-contact, cost-effective and can easily be deployed on a smartphone, making it an excellent tool for both TB and COVID-19 screening.
    Do Neural Networks Compress Manifolds Optimally?. (arXiv:2205.08518v2 [cs.IT] UPDATED)
    Artificial Neural-Network-based (ANN-based) lossy compressors have recently obtained striking results on several sources. Their success may be ascribed to an ability to identify the structure of low-dimensional manifolds in high-dimensional ambient spaces. Indeed, prior work has shown that ANN-based compressors can achieve the optimal entropy-distortion curve for some such sources. In contrast, we determine the optimal entropy-distortion tradeoffs for two low-dimensional manifolds with circular structure and show that state-of-the-art ANN-based compressors fail to optimally compress them.
    ScaleFace: Uncertainty-aware Deep Metric Learning. (arXiv:2209.01880v2 [cs.CV] UPDATED)
    The performance of modern deep learning-based systems dramatically depends on the quality of input objects. For example, face recognition quality would be lower for blurry or corrupted inputs. However, it is hard to predict the influence of input quality on the resulting accuracy in more complex scenarios. We propose an approach for deep metric learning that allows direct estimation of the uncertainty with almost no additional computational cost. The developed ScaleFace algorithm uses trainable scale values that modify similarities in the space of embeddings. These input-dependent scale values represent a measure of confidence in the recognition result, thus allowing uncertainty estimation. We provide comprehensive experiments on face recognition tasks that show the superior performance of ScaleFace compared to other uncertainty-aware face recognition approaches. We also extend the results to the task of text-to-image retrieval, showing that the proposed approach beats the competitors by a significant margin.
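    The core mechanism, an input-dependent scale on similarity logits, fits in a few lines; the sketch below is in the spirit of the abstract, with the scale head's architecture and the Softplus choice being assumptions:
    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ScaledCosineHead(nn.Module):
        """Input-dependent scale on cosine logits (ScaleFace-style sketch).

        A small head predicts a positive scale s(x) per input; a low s(x)
        flattens the softmax and can be read as low recognition confidence."""
        def __init__(self, emb_dim, n_classes):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(n_classes, emb_dim))
            self.scale_head = nn.Sequential(nn.Linear(emb_dim, 1), nn.Softplus())

        def forward(self, emb):
            cos = F.linear(F.normalize(emb), F.normalize(self.weight))
            s = self.scale_head(emb)          # (B, 1), confidence proxy
            return s * cos, s.squeeze(1)      # scaled logits, uncertainty score
    ```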
    On the Optimization Landscape of Dynamic Output Feedback: A Case Study for Linear Quadratic Regulator. (arXiv:2209.05042v1 [cs.LG])
    The convergence of policy gradient algorithms in reinforcement learning hinges on the optimization landscape of the underlying optimal control problem. Theoretical insights into these algorithms can often be acquired from analyzing those of linear quadratic control. However, most of the existing literature only considers the optimization landscape for static full-state or output feedback policies (controllers). We investigate the more challenging case of dynamic output-feedback policies for linear quadratic regulation (abbreviated as dLQR), which is prevalent in practice but has a rather complicated optimization landscape. We first show how the dLQR cost varies with the coordinate transformation of the dynamic controller and then derive the optimal transformation for a given observable stabilizing controller. At the core of our results is the uniqueness of the stationary point of dLQR when it is observable, which is in a concise form of an observer-based controller with the optimal similarity transformation. These results shed light on designing efficient algorithms for general decision-making problems with partially observed information.
    Examining Uniqueness and Permanence of the WAY EEG GAL dataset toward User Authentication. (arXiv:2209.04802v1 [cs.LG])
    This study evaluates the discriminating capacity (uniqueness) of the EEG data from the WAY EEG GAL public dataset to authenticate individuals against one another, as well as its permanence. In addition to the EEG data, Luciw et al. provide EMG (electromyography) and kinematics data for engineers and researchers to utilize WAY EEG GAL for further studies. However, evaluating the EMG and kinematics data is outside the scope of this study. The goal of the prior state-of-the-art work is to determine whether EEG data can be utilized to control prosthetic devices; this study, in contrast, aims to evaluate the separability of individuals through EEG data to perform user authentication. A feature importance algorithm is utilized to select the best features for each user to authenticate them against all others. The authentication platform implemented for this study is based on machine learning models/classifiers. As an initial test, two pilot studies are performed using Linear Discriminant Analysis (LDA) and Support Vector Machines (SVM) to observe the learning trends of the models by multi-labeling the EEG dataset. Utilizing kNN as the first classifier for user authentication, an accuracy of around 75% is observed. Thereafter, to improve the performance, both linear and non-linear SVMs are used to perform the classification. Overall average accuracies of 85.18% and 86.92% are achieved using linear and non-linear SVMs, respectively. In addition to accuracy, F1 scores are also calculated, with overall averages of 87.51% and 88.94% for linear and non-linear SVMs, respectively. Beyond the overall performance, high-performing individuals with 95.3% accuracy (95.3% F1 score) using a linear SVM and 97.4% accuracy (97.3% F1 score) using a non-linear SVM are also observed.
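    The per-user pipeline (feature importance, then a one-vs-rest classifier) maps directly onto scikit-learn; the abstract names only "a feature importance algorithm", so the random forest used for importances below is an assumption:
    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import SVC

    def authenticate_user(X, y, user, n_features=20):
        """One-vs-rest authentication for a single user (sketch).

        X: (n_samples, n_feats) EEG features; y: subject labels.
        Select the top features for this user, then fit a per-user SVM."""
        target = (y == user).astype(int)
        imp = RandomForestClassifier(n_estimators=200, random_state=0) \
            .fit(X, target).feature_importances_
        top = np.argsort(imp)[-n_features:]       # user-specific feature subset
        clf = SVC(kernel="rbf").fit(X[:, top], target)
        return clf, top
    ```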
    Efficiency Evaluation of Banks with Many Branches using a Heuristic Framework and Dynamic Data Envelopment Optimization Approach: A Real Case Study. (arXiv:2209.04822v1 [math.OC])
    Evaluating the efficiency of organizations, and of branches within an organization, is a challenging issue for managers. Evaluation criteria allow organizations to rank their internal units, identify their position relative to their competitors, and implement strategies for improvement and development purposes. Among the methods that have been applied in the evaluation of bank branches, non-parametric methods have captured the attention of researchers in recent years. One of the most widely used non-parametric methods is data envelopment analysis (DEA), which leads to promising results. However, static DEA approaches do not consider time in the model. Therefore, this paper uses a dynamic DEA (DDEA) method to evaluate the branches of a private Iranian bank over three years (2017-2019). The results are then compared with static DEA. After ranking the branches, they are clustered using the K-means method. Finally, a comprehensive sensitivity analysis approach is introduced to help managers decide about changing variables to shift a branch from one cluster to a more efficient one.
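    For readers unfamiliar with DEA, the static building block is a small linear program solved once per branch; a sketch of the classic input-oriented CCR model with scipy (the dynamic variant the paper uses adds carry-over links between periods):
    ```python
    import numpy as np
    from scipy.optimize import linprog

    def dea_ccr_efficiency(X, Y, j0):
        """Input-oriented CCR efficiency of unit j0 (static DEA sketch).

        X: (m, n) inputs, Y: (s, n) outputs for n units. Solve
          min theta  s.t.  X @ lam <= theta * X[:, j0],  Y @ lam >= Y[:, j0],
        with decision variables [theta, lam_1, ..., lam_n], lam >= 0."""
        m, n = X.shape
        s = Y.shape[0]
        c = np.zeros(n + 1)
        c[0] = 1.0                                      # minimize theta
        A_ub = np.vstack([np.hstack([-X[:, [j0]], X]),  # X lam - theta*x0 <= 0
                          np.hstack([np.zeros((s, 1)), -Y])])
        b_ub = np.concatenate([np.zeros(m), -Y[:, j0]]) # -Y lam <= -y0
        bounds = [(None, None)] + [(0, None)] * n
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
        return res.x[0]                                 # efficiency in (0, 1]
    ```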
    Hyperbolic Self-supervised Contrastive Learning Based Network Anomaly Detection. (arXiv:2209.05049v1 [cs.SI])
    Anomaly detection on attributed networks has recently received increasing attention in many research fields, such as cybernetic anomaly detection and financial fraud detection. With the wide application of deep learning on graph representations, existing approaches choose to apply Euclidean graph encoders as their backbone, which may lose important hierarchical information, especially in complex networks. To tackle this problem, we propose an efficient anomaly detection framework using hyperbolic self-supervised contrastive learning. Specifically, we first conduct data augmentation by performing subgraph sampling. Then we utilize the hierarchical information in hyperbolic space through exponential and logarithmic mappings and obtain the anomaly score by subtracting the scores of the positive pairs from those of the negative pairs via a discriminating process. Finally, extensive experiments on four real-world datasets demonstrate that our approach outperforms representative baseline approaches.
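    The exponential and logarithmic maps mentioned here are the standard bridges between Euclidean (tangent) features and hyperbolic space; at the origin of the Poincaré ball with curvature -1 they reduce to a few lines:
    ```python
    import numpy as np

    def exp_map0(v, eps=1e-9):
        """Exponential map at the origin of the Poincare ball (curvature -1):
        tangent vector -> point inside the unit ball."""
        n = np.linalg.norm(v) + eps
        return np.tanh(n) * v / n

    def log_map0(x, eps=1e-9):
        """Logarithmic map at the origin: ball point -> tangent vector."""
        n = np.linalg.norm(x) + eps
        return np.arctanh(np.clip(n, 0.0, 1.0 - eps)) * x / n

    z = exp_map0(np.array([0.3, -0.2]))   # embed Euclidean features hyperbolically
    v = log_map0(z)                       # back to the tangent space, ~[0.3, -0.2]
    ```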
    AST-Probe: Recovering abstract syntax trees from hidden representations of pre-trained language models. (arXiv:2206.11719v2 [cs.CL] UPDATED)
    The objective of pre-trained language models is to learn contextual representations of textual data. Pre-trained language models have become mainstream in natural language processing and code modeling. Using probes, a technique to study the linguistic properties of hidden vector spaces, previous works have shown that these pre-trained language models encode simple linguistic properties in their hidden representations. However, none of the previous work assessed whether these models encode the whole grammatical structure of a programming language. In this paper, we prove the existence of a syntactic subspace, lying in the hidden representations of pre-trained language models, which contains the syntactic information of the programming language. We show that this subspace can be extracted from the models' representations and define a novel probing method, the AST-Probe, that enables recovering the whole abstract syntax tree (AST) of an input code snippet. In our experiments, we show that this syntactic subspace exists in five state-of-the-art pre-trained language models. In addition, we highlight that the middle layers of the models are the ones that encode most of the AST information. Finally, we estimate the optimal size of this syntactic subspace and show that its dimension is substantially lower than those of the models' representation spaces. This suggests that pre-trained language models use a small part of their representation spaces to encode syntactic information of the programming languages.
    A Comparative Study on Unsupervised Anomaly Detection for Time Series: Experiments and Analysis. (arXiv:2209.04635v1 [cs.LG])
    The continued digitization of societal processes translates into a proliferation of time series data that cover applications such as fraud detection, intrusion detection, and energy management, where anomaly detection is often essential to enable reliability and safety. Many recent studies target anomaly detection for time series data. Indeed, the area of time series anomaly detection is characterized by diverse data, methods, and evaluation strategies, and comparisons in existing studies consider only part of this diversity, which makes it difficult to select the best method for a particular problem setting. To address this shortcoming, we introduce taxonomies for data, methods, and evaluation strategies, provide a comprehensive overview of unsupervised time series anomaly detection using the taxonomies, and systematically evaluate and compare state-of-the-art traditional as well as deep learning techniques. In the empirical study using nine publicly available datasets, we apply the most commonly used performance evaluation metrics to typical methods under a fair implementation standard. Based on the structuring offered by the taxonomies, we report on empirical studies and provide guidelines, in the form of comparative tables, for choosing the methods most suitable for particular application settings. Finally, we propose research directions for this dynamic field.
    Improving Model Training via Self-learned Label Representations. (arXiv:2209.04528v1 [cs.LG])
    Modern neural network architectures have shown remarkable success in several large-scale classification and prediction tasks. Part of the success of these architectures is their flexibility to transform the data from raw input representations (e.g. pixels for vision tasks, or text for natural language processing tasks) to the one-hot output encoding. While much of the work has focused on studying how the input gets transformed to the one-hot encoding, very little work has examined the effectiveness of these one-hot labels. In this work, we demonstrate that more sophisticated label representations are better for classification than the usual one-hot encoding. We propose the Learning with Adaptive Labels (LwAL) algorithm, which simultaneously learns the label representation while training for the classification task. These learned labels can significantly cut down on the training time (usually by more than 50%) while often achieving better test accuracies. Our algorithm introduces negligible additional parameters and has a minimal computational overhead. Along with improved training times, our learned labels are semantically meaningful and can reveal hierarchical relationships that may be present in the data.
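    A minimal sketch of the general idea of trainable label representations; the specific loss and update schedule of LwAL are not given in the abstract, so the details below are illustrative assumptions:
    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AdaptiveLabels(nn.Module):
        """Learn label embeddings jointly with the classifier (sketch in the
        spirit of LwAL; not the paper's exact formulation)."""
        def __init__(self, n_classes, label_dim):
            super().__init__()
            self.labels = nn.Parameter(torch.randn(n_classes, label_dim))

        def loss(self, feats, y):
            # Score each example's features against every learned label
            # embedding; cross-entropy pulls features toward their own label.
            logits = feats @ F.normalize(self.labels, dim=1).t()
            return F.cross_entropy(logits, y)

    # Both the backbone producing `feats` and `labels` receive gradients,
    # so label representations adapt during ordinary training.
    ```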
    Git Re-Basin: Merging Models modulo Permutation Symmetries. (arXiv:2209.04836v1 [cs.LG])
    The success of deep learning is thanks to our ability to solve certain massive non-convex optimization problems with relative ease. Despite non-convex optimization being NP-hard, simple algorithms -- often variants of stochastic gradient descent -- exhibit surprising effectiveness in fitting large neural networks in practice. We argue that neural network loss landscapes contain (nearly) a single basin, after accounting for all possible permutation symmetries of hidden units. We introduce three algorithms to permute the units of one model to bring them into alignment with units of a reference model. This transformation produces a functionally equivalent set of weights that lie in an approximately convex basin near the reference model. Experimentally, we demonstrate the single basin phenomenon across a variety of model architectures and datasets, including the first (to our knowledge) demonstration of zero-barrier linear mode connectivity between independently trained ResNet models on CIFAR-10 and CIFAR-100. Additionally, we identify intriguing phenomena relating model width and training time to mode connectivity across a variety of models and datasets. Finally, we discuss shortcomings of a single basin theory, including a counterexample to the linear mode connectivity hypothesis.
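    The core alignment step can be phrased as a linear assignment problem per layer: find the permutation of model B's hidden units that best matches model A's. A single-layer sketch using weight-similarity matching (the paper also proposes activation- and straight-through-based variants, and iterates across all layers):
    ```python
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def match_units(W_a, W_b):
        """Permute the rows (hidden units) of one layer of model B to align
        with model A, maximizing total weight similarity."""
        cost = W_a @ W_b.T                       # similarity of unit i (A) vs j (B)
        _, perm = linear_sum_assignment(-cost)   # maximize => negate the cost
        return perm                              # W_b[perm] aligns with W_a

    W_a, W_b = np.random.randn(64, 128), np.random.randn(64, 128)
    perm = match_units(W_a, W_b)
    merged = 0.5 * (W_a + W_b[perm])             # naive average in the aligned basin
    ```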
    Kernel Learning for Explainable Climate Science. (arXiv:2209.04947v1 [cs.LG])
    The Upper Indus Basin (UIB), Himalayas, provides water for 270 million people and countless ecosystems. However, precipitation, a key component of hydrological modelling, is poorly understood in this area. A key challenge surrounding this uncertainty comes from the complex spatial-temporal distribution of precipitation across the basin. In this work we propose Gaussian processes with structured non-stationary kernels to model precipitation patterns in the UIB. Previous attempts to quantify or model precipitation in the Hindu Kush Karakoram Himalayan region have often been qualitative or have included crude assumptions and simplifications which cannot be resolved at lower resolutions. This body of research also provides little to no error propagation. We account for the spatial variation in precipitation with a non-stationary Gibbs kernel parameterised with an input-dependent lengthscale. This allows the posterior function samples to adapt to the varying precipitation patterns inherent in the distinct underlying topography of the Indus region. The input-dependent lengthscale is governed by a latent Gaussian process with a stationary squared-exponential kernel, allowing the function-level hyperparameters to vary smoothly. In ablation experiments we motivate each component of the proposed kernel by demonstrating its ability to model the spatial covariance, temporal structure and joint spatio-temporal reconstruction. We benchmark our model against a stationary Gaussian process and deep Gaussian processes.
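    The Gibbs kernel itself is compact enough to state directly; a numpy sketch in one dimension, with a hand-picked lengthscale function standing in for the latent GP the paper places over l(x):
    ```python
    import numpy as np

    def gibbs_kernel(x1, x2, lengthscale_fn):
        """Non-stationary Gibbs kernel with input-dependent lengthscale l(x):
        k(x, x') = sqrt(2 l(x) l(x') / (l(x)^2 + l(x')^2))
                   * exp(-(x - x')^2 / (l(x)^2 + l(x')^2))."""
        l1 = lengthscale_fn(x1)[:, None]
        l2 = lengthscale_fn(x2)[None, :]
        sq = (x1[:, None] - x2[None, :]) ** 2
        denom = l1 ** 2 + l2 ** 2
        return np.sqrt(2 * l1 * l2 / denom) * np.exp(-sq / denom)

    x = np.linspace(0, 1, 50)
    K = gibbs_kernel(x, x, lambda t: 0.1 + 0.3 * t)  # lengthscale grows with x,
    # so samples wiggle quickly near 0 and vary smoothly near 1
    ```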
    Bilevel Optimization with a Lower-level Contraction: Optimal Sample Complexity without Warm-Start. (arXiv:2202.03397v2 [stat.ML] UPDATED)
    We analyze a general class of bilevel problems, in which the upper-level problem consists in the minimization of a smooth objective function and the lower-level problem is to find the fixed point of a smooth contraction map. This class of problems includes instances of meta-learning, equilibrium models, hyperparameter optimization and data poisoning adversarial attacks. Several recent works have proposed algorithms which warm-start the lower-level problem, i.e. they use the previous lower-level approximate solution as a starting point for the lower-level solver. This warm-start procedure allows one to improve the sample complexity in both the stochastic and deterministic settings, achieving in some cases the order-wise optimal sample complexity. However, there are situations, e.g., meta-learning and equilibrium models, in which the warm-start procedure is not well-suited or ineffective. In this work we show that without warm-start, it is still possible to achieve order-wise optimal or near-optimal sample complexity. In particular, we propose a simple method which uses stochastic fixed point iterations at the lower level and projected inexact gradient descent at the upper level, and which reaches an $\epsilon$-stationary point using $O(\epsilon^{-2})$ and $\tilde{O}(\epsilon^{-1})$ samples for the stochastic and the deterministic setting, respectively. Finally, compared to methods using warm-start, our approach yields a simpler analysis that does not need to study the coupled interactions between the upper-level and lower-level iterates.
    What Do Deep Neural Networks Find in Disordered Structures of Glasses?. (arXiv:2208.00349v2 [cond-mat.dis-nn] UPDATED)
    Glass transitions are widely observed in various types of soft matter systems. However, the physical mechanism of these transitions remains elusive, despite years of ambitious research. In particular, an important unanswered question is whether the glass transition is accompanied by a divergence of the correlation lengths of the characteristic static structures. In this study, we develop a deep-neural-network-based method that is used to extract the characteristic local meso-structures solely from instantaneous particle configurations without any information about the dynamics. We first train a neural network to classify configurations of liquids and glasses correctly. Then, we obtain the characteristic structures by quantifying the grounds for the decisions made by the network using Gradient-weighted Class Activation Mapping (Grad-CAM). We considered two qualitatively different glass-forming binary systems, and through comparisons with several established structural indicators, we demonstrate that our system can be used to identify characteristic structures that depend on the details of the systems. Moreover, the extracted structures are remarkably correlated with the nonequilibrium aging dynamics in thermal fluctuations.
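    Grad-CAM, the attribution tool the study relies on, is compact: weight a convolutional layer's activations by the gradient of the class score and keep the positive part. A minimal PyTorch sketch for an image-like input (the study applies the same idea to particle configurations):
    ```python
    import torch

    def grad_cam(model, layer, x, class_idx):
        """Minimal Grad-CAM: channel-average the class-score gradient at the
        target layer, weight its activations, and ReLU the sum."""
        acts, grads = [], []
        h1 = layer.register_forward_hook(lambda m, i, o: acts.append(o))
        h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
        score = model(x)[0, class_idx]
        model.zero_grad()
        score.backward()
        h1.remove(); h2.remove()
        w = grads[0].mean(dim=(2, 3), keepdim=True)       # per-channel weights
        cam = torch.relu((w * acts[0]).sum(dim=1)).squeeze(0)
        return cam / (cam.max() + 1e-9)                   # (H, W) map in [0, 1]
    ```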
    Hybrid Supervised and Reinforcement Learning for the Design and Optimization of Nanophotonic Structures. (arXiv:2209.04447v1 [cs.LG])
    From higher computational efficiency to enabling the discovery of novel and complex structures, deep learning has emerged as a powerful framework for the design and optimization of nanophotonic circuits and components. However, both data-driven and exploration-based machine learning strategies have limitations in their effectiveness for nanophotonic inverse design. Supervised machine learning approaches require large quantities of training data to produce high-performance models and have difficulty generalizing beyond training data given the complexity of the design space. Unsupervised and reinforcement learning-based approaches, on the other hand, can have very lengthy training or optimization times associated with them. Here we demonstrate a hybrid supervised learning and reinforcement learning approach to the inverse design of nanophotonic structures and show this approach can reduce training data dependence, improve the generalizability of model predictions, and shorten exploratory training times by orders of magnitude. The presented strategy thus addresses a number of contemporary deep learning-based challenges, while opening the door for new design methodologies that leverage multiple classes of machine learning algorithms to produce more effective and practical solutions for photonic design.
    Decision Tree-Based Predictive Models for Academic Achievement Using College Students' Support Networks. (arXiv:2108.13947v2 [stat.ML] UPDATED)
    In this study, we examine a set of primary data collected from 484 students enrolled in a large public university in the Mid-Atlantic United States region during the early stages of the COVID-19 pandemic. The data, called the Ties data, included students' demographic and support network information. The support network data comprised information highlighting the type of support (i.e. emotional or educational; routine or intense). Using this data set, models for predicting students' academic achievement, quantified by their self-reported GPA, were created using Chi-Square Automatic Interaction Detection (CHAID), a decision tree algorithm, and cforest, a random forest algorithm that uses conditional inference trees. We compare the methods' accuracy and the variation in the sets of important variables suggested by each algorithm. Each algorithm found different variables important for different student demographics, with some overlap. For White students, different types of educational support were important in predicting academic achievement, while for non-White students, different types of emotional support were important. The presence of differing types of routine support was important in predicting academic achievement for cisgender women, while differing types of intense support were important for cisgender men.
    An Extensive Data Processing Pipeline for MIMIC-IV. (arXiv:2204.13841v4 [cs.LG] UPDATED)
    An increasing amount of research is being devoted to applying machine learning methods to electronic health record (EHR) data for various clinical purposes. This growing area of research has exposed the challenges of the accessibility of EHRs. MIMIC is a popular, public, and free EHR dataset in a raw format that has been used in numerous studies. The absence of standardized pre-processing steps can, however, be a significant barrier to the wider adoption of this rare resource. Additionally, this absence can reduce the reproducibility of the developed tools and limit the ability to compare results among similar studies. In this work, we provide a highly customizable pipeline to extract, clean, and pre-process the data available in the fourth version of the MIMIC dataset (MIMIC-IV). The pipeline also presents an end-to-end, wizard-like package supporting predictive model creation and evaluation. The pipeline covers a range of clinical prediction tasks which can be broadly classified into four categories - readmission, length of stay, mortality, and phenotype prediction. The tool is publicly available at https://github.com/healthylaife/MIMIC-IV-Data-Pipeline.
    A framework for bilevel optimization that enables stochastic and global variance reduction algorithms. (arXiv:2201.13409v2 [stat.ML] UPDATED)
    Bilevel optimization, the problem of minimizing a value function which involves the arg-minimum of another function, appears in many areas of machine learning. In a large-scale empirical risk minimization setting where the number of samples is huge, it is crucial to develop stochastic methods, which only use a few samples at a time to progress. However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates. To overcome this problem we introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. These directions are written as a sum, making it straightforward to derive unbiased estimates. The simplicity of our approach allows us to develop global variance reduction algorithms, where the dynamics of all variables is subject to variance reduction. We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has an $O(1/T)$ convergence rate and achieves linear convergence under the Polyak-Łojasiewicz assumption. This is the first stochastic algorithm for bilevel optimization that verifies either of these properties. Numerical experiments validate the usefulness of our method.
    Set-based value operators for non-stationary Markovian environments. (arXiv:2207.07271v2 [cs.LG] UPDATED)
    This paper analyzes finite state Markov Decision Processes (MDPs) with uncertain parameters in compact sets and re-examines results from robust MDP via set-based fixed point theory. We generalize the Bellman and policy evaluation operators to operators that contract on the space of value functions and denote them as \emph{value operators}. We generalize these value operators to act on the space of value function sets and denote them as \emph{set-based value operators}. We prove that these set-based value operators are contractions in the space of compact value function sets. Leveraging insights from set theory, we generalize the rectangularity condition for the Bellman operator from classic robust MDP literature to a \emph{containment condition} for a generic value operator, which is weaker and can be applied to a larger set of parameter-uncertain MDPs and contractive operators in dynamic programming and reinforcement learning. We prove that both the rectangularity condition and the containment condition sufficiently ensure that the set-based value operator's fixed point set contains its own supremum and infimum elements. For convex and compact sets of uncertain MDP parameters, we show equivalence between the classic robust value function and the supremum of the fixed point set of the set-based Bellman operator. Under dynamically changing MDP parameters in compact sets, we prove a set convergence result for value iteration, which otherwise may not converge to a single value function.
    TransPolymer: a Transformer-based Language Model for Polymer Property Predictions. (arXiv:2209.01307v2 [cs.LG] UPDATED)
    Accurate and efficient prediction of polymer properties is of great significance in polymer development and design. Conventionally, expensive and time-consuming experiments or simulations are required to evaluate polymer functions. Recently, Transformer models, equipped with attention mechanisms, have exhibited superior performance in various natural language processing tasks. However, such methods have not been investigated in polymer sciences. Herein, we report TransPolymer, a Transformer-based language model for polymer property prediction. Owing to our proposed polymer tokenizer with chemical awareness, TransPolymer can learn representations directly from polymer sequences. The model learns expressive representations by pretraining on a large unlabeled dataset, followed by finetuning on downstream datasets concerning various polymer properties. TransPolymer achieves superior performance on all eight datasets and significantly surpasses other baselines on most downstream tasks. Moreover, the improvement of the pretrained TransPolymer over supervised TransPolymer and other language models underscores the significant benefits of pretraining on large unlabeled data for representation learning. Experimental results further demonstrate the important role of the attention mechanism in understanding polymer sequences. We highlight this model as a promising computational tool for promoting rational polymer design and understanding structure-property relationships from a data science perspective.
    Robust Geometric Metric Learning. (arXiv:2202.11550v2 [stat.ML] UPDATED)
    This paper proposes new algorithms for the metric learning problem. We start by noticing that several classical metric learning formulations from the literature can be viewed as modified covariance matrix estimation problems. Leveraging this point of view, a general approach, called Robust Geometric Metric Learning (RGML), is then studied. This method aims at simultaneously estimating the covariance matrix of each class while shrinking them towards their (unknown) barycenter. We focus on two specific cost functions: one associated with the Gaussian likelihood (RGML Gaussian), and one with Tyler's M-estimator (RGML Tyler). In both, the barycenter is defined with the Riemannian distance, which enjoys nice properties of geodesic convexity and affine invariance. The optimization is performed using the Riemannian geometry of symmetric positive definite matrices and its submanifold of unit determinant. Finally, the performance of RGML is assessed on real datasets. It exhibits strong performance while remaining robust to mislabeled data.
    Federated Unlearning: How to Efficiently Erase a Client in FL?. (arXiv:2207.05521v2 [cs.LG] UPDATED)
    With privacy legislation empowering users with the right to be forgotten, it has become essential to make a model forget some of its training data. We explore the problem of removing any client's contribution in federated learning (FL). During FL rounds, each client performs local training to learn a model that minimizes the empirical loss on their private data. We propose to perform unlearning at the client to be erased by reversing the learning process, i.e., training a model to \emph{maximize} the local empirical loss. In particular, we formulate the unlearning problem as a constrained maximization problem by restricting the model to an $\ell_2$-norm ball around a suitably chosen reference model, which helps retain some knowledge learnt from the other clients' data. This allows the client to use projected gradient descent to perform unlearning. The method requires neither global access to the data used for training nor storage of the parameter-update history by the aggregator (server) or any of the clients. Experiments on the MNIST dataset show that the proposed unlearning method is efficient and effective.
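    The constrained-maximization step is simple enough to sketch; the following NumPy fragment (with an assumed flat parameter vector, a toy least-squares client, and illustrative radius and step size, not the paper's code) shows unlearning as projected gradient ascent on the erased client's loss within an $\ell_2$ ball around a reference model.

        import numpy as np

        def project_to_ball(w, center, radius):
            # Euclidean projection onto {u : ||u - center||_2 <= radius}
            diff = w - center
            norm = np.linalg.norm(diff)
            return w if norm <= radius else center + radius * diff / norm

        def unlearn(w_global, w_ref, grad_local_loss, radius=1.0, lr=0.1, steps=100):
            w = w_global.copy()
            for _ in range(steps):
                w = w + lr * grad_local_loss(w)        # ascend the local empirical loss
                w = project_to_ball(w, w_ref, radius)  # stay near the reference model
            return w

        # toy usage: "forget" a least-squares client
        rng = np.random.default_rng(1)
        X, y = rng.standard_normal((50, 10)), rng.standard_normal(50)
        grad = lambda w: X.T @ (X @ w - y) / len(y)    # gradient of the client's loss
        w_unlearned = unlearn(rng.standard_normal(10), np.zeros(10), grad)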
    Towards Better Evaluation for Dynamic Link Prediction. (arXiv:2207.10128v2 [cs.LG] UPDATED)
    Despite recent successes in learning from static graphs, learning from time-evolving graphs remains an open challenge. In this work, we design new, more stringent evaluation procedures for link prediction on dynamic graphs, which reflect real-world considerations and better compare the strengths and weaknesses of methods. First, we create two visualization techniques to understand the recurring patterns of edges over time and show that many edges reoccur at later time steps. Based on this observation, we propose a pure memorization baseline called EdgeBank. EdgeBank achieves surprisingly strong performance across multiple settings because easy negative edges are often used in the current evaluation setting. To evaluate against more difficult negative edges, we introduce two more challenging negative sampling strategies that improve robustness and better match real-world applications. Lastly, we introduce six new dynamic graph datasets from a diverse set of domains missing from current benchmarks, providing new challenges and opportunities for future research. Our code repository is accessible at https://github.com/fpour/DGB.git.
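    Because EdgeBank is pure memorization, its core fits in a few lines; the sketch below captures only the spirit of an unlimited-memory variant (the released code and any time-windowed variants may differ, an assumption about scope).

        class EdgeBank:
            """Predict an edge as positive iff it has been observed before."""
            def __init__(self):
                self.seen = set()

            def update(self, src, dst):
                self.seen.add((src, dst))

            def predict(self, src, dst):
                return 1.0 if (src, dst) in self.seen else 0.0

        bank = EdgeBank()
        for src, dst, t in [(0, 1, 1), (1, 2, 2), (0, 1, 3)]:  # time-ordered edge stream
            score = bank.predict(src, dst)  # score before memorizing, as in streaming evaluation
            bank.update(src, dst)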
    Applications of Multi-Agent Reinforcement Learning in Future Internet: A Comprehensive Survey. (arXiv:2110.13484v3 [cs.AI] UPDATED)
    The future Internet involves several emerging technologies such as 5G and beyond-5G networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and the Internet of Things (IoT). Moreover, the future Internet is becoming heterogeneous and decentralized, with a large number of involved network entities. Each entity may need to make its local decisions to improve network performance under dynamic and uncertain network environments. Standard learning algorithms such as single-agent Reinforcement Learning (RL) or Deep Reinforcement Learning (DRL) have recently been used to enable each network entity, as an agent, to learn an optimal decision-making policy adaptively through interacting with an unknown environment. However, such algorithms fail to model the cooperation or competition among network entities and simply treat other entities as part of the environment, which may result in the non-stationarity issue. Multi-agent Reinforcement Learning (MARL) allows each network entity to learn its optimal policy by observing not only the environment, but also other entities' policies. As a result, MARL can significantly improve the learning efficiency of the network entities, and it has recently been used to solve various issues in emerging networks. In this paper, we review the applications of MARL in emerging networks. In particular, we provide a tutorial on MARL and a comprehensive survey of its applications in the next-generation Internet: we first introduce single-agent RL and MARL, and then review a number of applications of MARL to emerging issues, including network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV-aided networks, and network security.
    Learning from All Vehicles. (arXiv:2203.11934v3 [cs.RO] UPDATED)
    In this paper, we present a system to train driving policies from experiences collected not just from the ego-vehicle, but all vehicles that it observes. This system uses the behaviors of other agents to create more diverse driving scenarios without collecting additional data. The main difficulty in learning from other vehicles is that there is no sensor information. We use a set of supervisory tasks to learn an intermediate representation that is invariant to the viewpoint of the controlling vehicle. This not only provides a richer signal at training time but also allows more complex reasoning during inference. Learning how all vehicles drive helps predict their behavior at test time and can avoid collisions. We evaluate this system in closed-loop driving simulations. Our system outperforms all prior methods on the public CARLA Leaderboard by a wide margin, improving driving score by 25 and route completion rate by 24 points. Our method won the 2021 CARLA Autonomous Driving challenge. Code and data are available at https://github.com/dotchen/LAV.
    Statistical Estimation of Confounded Linear MDPs: An Instrumental Variable Approach. (arXiv:2209.05186v1 [stat.ML])
    In a Markov decision process (MDP), unobservable confounders may exist and affect the data-generating process, so that classic off-policy evaluation (OPE) estimators may fail to identify the true value function of the target policy. In this paper, we study the statistical properties of OPE in confounded MDPs with observable instrumental variables. Specifically, we propose a two-stage estimator based on the instrumental variables and establish its statistical properties in confounded MDPs with a linear structure. For the non-asymptotic analysis, we prove an $\mathcal{O}(n^{-1/2})$ error bound, where $n$ is the number of samples. For the asymptotic analysis, we prove that the two-stage estimator is asymptotically normal with the typical $\sqrt{n}$ rate. To the best of our knowledge, we are the first to show such statistical results for a two-stage estimator in confounded linear MDPs via instrumental variables.
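    For orientation, the generic two-stage instrumental-variable recipe (of which the paper's estimator is an MDP-specific instance with Bellman structure, omitted here) looks like this in a scalar linear setting; the data-generating numbers are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 2000
        z = rng.standard_normal(n)                       # instrument: moves x, not the noise
        u = rng.standard_normal(n)                       # unobserved confounder
        x = 0.8 * z + u                                  # treatment, confounded by u
        y = 1.5 * x + u + 0.1 * rng.standard_normal(n)   # outcome; true effect is 1.5

        x_hat = z * (z @ x) / (z @ z)             # stage 1: regress treatment on instrument
        beta_iv = (x_hat @ y) / (x_hat @ x_hat)   # stage 2: regress outcome on fitted treatment
        beta_ols = (x @ y) / (x @ x)              # naive OLS, biased by the confounder
        print(f"IV estimate {beta_iv:.2f} vs OLS {beta_ols:.2f} (truth 1.5)")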
    Support Recovery in Mixture Models with Sparse Parameters. (arXiv:2202.11940v2 [cs.LG] UPDATED)
    Mixture models are widely used to fit complex and multimodal datasets. In this paper we study mixtures with high-dimensional sparse latent parameter vectors and consider the problem of support recovery of those vectors. While parameter learning in mixture models is well-studied, the sparsity constraint remains relatively unexplored. Sparsity of parameter vectors is a natural constraint in a variety of settings, and support recovery is a major step towards parameter estimation. We provide efficient algorithms for support recovery that have a logarithmic sample-complexity dependence on the dimensionality of the latent space. Our algorithms are quite general: they are applicable to (1) mixtures of many different canonical distributions, including uniform, Poisson, Laplace, and Gaussian distributions, and (2) mixtures of linear regressions and linear classifiers with Gaussian covariates, under different assumptions on the unknown parameters. In most of these settings, our results are the first guarantees for the problem, while in the rest, our results improve on existing work.
    Near-Optimal Distributed Linear-Quadratic Regulator for Networked Systems. (arXiv:2204.05551v2 [math.OC] UPDATED)
    This paper studies the trade-off between the degree of decentralization and the performance of a distributed controller in a linear-quadratic control setting. We study a system of interconnected agents over a graph and a distributed controller, called $\kappa$-distributed control, which lets the agents make control decisions based on the state information within distance $\kappa$ on the underlying graph. This controller can tune its degree of decentralization using the parameter $\kappa$ and thus allows a characterization of the relationship between decentralization and performance. We show that under mild assumptions, including stabilizability, detectability, and a subexponentially growing graph condition, the performance difference between $\kappa$-distributed control and centralized optimal control becomes exponentially small in $\kappa$. This result reveals that distributed control can achieve near-optimal performance with a moderate degree of decentralization, and thus it is an effective controller architecture for large-scale networked systems.
    Centroids Matching: an efficient Continual Learning approach operating in the embedding space. (arXiv:2208.02048v2 [cs.LG] UPDATED)
    Catastrophic forgetting (CF) occurs when a neural network loses information it previously learned while training on a set of samples from a different distribution, i.e., a new task. Existing approaches have achieved remarkable results in mitigating CF, especially in a scenario called task incremental learning. However, this scenario is not realistic, and limited work has been done to achieve good results in more realistic scenarios. In this paper, we propose a novel regularization method called Centroids Matching that, inspired by meta-learning approaches, fights CF by operating in the feature space produced by the neural network, achieving good results with a small memory footprint. Specifically, the approach classifies samples directly using the feature vectors produced by the neural network, by matching those vectors with the centroids representing the classes of the current task, or of all tasks up to that point. Centroids Matching is faster than competing baselines, and it mitigates CF efficiently by preserving the distances between the embedding space produced by the model when past tasks ended and the one currently produced; the resulting method achieves high accuracy on all tasks, without using an external memory in easy scenarios, and with only a small one in more realistic scenarios. Extensive experiments demonstrate that Centroids Matching achieves accuracy gains on multiple datasets and scenarios.
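    The classification rule itself is compact; this PyTorch sketch (assuming a generic batch of embeddings in which every class appears, and omitting the paper's meta-learning and distance-preservation regularizer) shows centroid matching with negative squared distances as logits.

        import torch

        def class_centroids(features, labels, num_classes):
            # features: (N, D) embeddings; labels: (N,); assumes each class is present
            return torch.stack([features[labels == c].mean(0) for c in range(num_classes)])

        def centroid_logits(features, centroids):
            # negative squared distance to each class centroid acts as a logit
            return -torch.cdist(features, centroids) ** 2

        feats = torch.randn(32, 64)            # stand-in for a network's feature vectors
        labels = torch.randint(0, 5, (32,))
        cents = class_centroids(feats, labels, 5)
        loss = torch.nn.functional.cross_entropy(centroid_logits(feats, cents), labels)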
    On the Nash equilibrium of moment-matching GANs for stationary Gaussian processes. (arXiv:2203.07136v3 [stat.ML] UPDATED)
    Generative Adversarial Networks (GANs) learn an implicit generative model from data samples through a two-player game. In this paper, we study the existence of Nash equilibrium of the game which is consistent as the number of data samples grows to infinity. In a realizable setting where the goal is to estimate the ground-truth generator of a stationary Gaussian process, we show that the existence of consistent Nash equilibrium depends crucially on the choice of the discriminator family. The discriminator defined from second-order statistical moments can result in non-existence of Nash equilibrium, existence of consistent non-Nash equilibrium, or existence and uniqueness of consistent Nash equilibrium, depending on whether symmetry properties of the generator family are respected. We further study empirically the local stability and global convergence of gradient descent-ascent methods towards consistent equilibrium.
    Stream-based Active Learning with Verification Latency in Non-stationary Environments. (arXiv:2204.06822v2 [cs.LG] UPDATED)
    Data stream classification is an important problem in the field of machine learning. Due to the non-stationary nature of the data, where the underlying distribution changes over time (concept drift), the model needs to continuously adapt to new data statistics. Stream-based Active Learning (AL) approaches address this problem by interactively querying a human expert to provide new data labels for the most recent samples, within a limited budget. Existing AL strategies assume that labels are immediately available, while in a real-world scenario the expert requires time to provide a queried label (verification latency), and by the time the requested labels arrive they may no longer be relevant. In this article, we investigate the influence of finite, time-variable, and unknown verification delay, in the presence of concept drift, on AL approaches. We propose PRopagate (PR), a latency-independent utility estimator which also predicts the requested, but not yet known, labels. Furthermore, we propose a drift-dependent dynamic budget strategy, which uses a variable distribution of the labelling budget over time after a detected drift. A thorough experimental evaluation, with both synthetic and real-world non-stationary datasets and different settings of verification latency and budget, is conducted and analyzed. We empirically show that the proposed method consistently outperforms the state-of-the-art. Additionally, we demonstrate that with variable budget allocation in time, it is possible to boost the performance of AL strategies without increasing the overall labelling budget.
    The Classification of Optical Galaxy Morphology Using Unsupervised Learning Techniques. (arXiv:2206.06165v2 [cs.LG] UPDATED)
    In recent years, large-scale data-intensive astronomical surveys have produced more detailed images than scientists can manually classify. Even attempts to crowd-source this work will soon be outpaced by the large amount of data generated by modern surveys, which calls into question the viability of human-based methods for classifying galaxy morphology. While supervised learning methods require datasets with existing labels, unsupervised learning techniques do not. Therefore, this paper implements unsupervised learning techniques to classify the Galaxy Zoo DECaLS dataset. A convolutional autoencoder was trained and used as a feature extractor. The resulting features were then clustered via k-means, fuzzy c-means, and agglomerative clustering. These clusters were compared against the true volunteer classifications provided by the Galaxy Zoo DECaLS project. The best results, in general, were produced by agglomerative clustering; however, the improvement over k-means clustering was not significant considering the increase in clustering time. After appropriate clustering-algorithm optimizations, this approach could prove useful for classifying the better-performing questions and could serve as the basis for a novel approach to generating more "human-like" galaxy morphology classifications with unsupervised techniques.
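    The pipeline shape (frozen autoencoder features fed to several clustering algorithms, then compared against reference labels) can be sketched with scikit-learn; the random arrays below are stand-ins for the DECaLS embeddings and volunteer labels, and fuzzy c-means is omitted because it lives in a separate package (e.g. scikit-fuzzy).

        import numpy as np
        from sklearn.cluster import KMeans, AgglomerativeClustering
        from sklearn.metrics import adjusted_rand_score

        features = np.random.rand(500, 32)          # stand-in for autoencoder embeddings
        true_labels = np.random.randint(0, 4, 500)  # stand-in for volunteer classifications

        for name, algo in [("k-means", KMeans(n_clusters=4, n_init=10)),
                           ("agglomerative", AgglomerativeClustering(n_clusters=4))]:
            pred = algo.fit_predict(features)
            print(name, adjusted_rand_score(true_labels, pred))  # agreement with labels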
    Reconstruction of Long-Term Historical Demand Data. (arXiv:2209.04693v1 [cs.LG])
    Long-term planning of a robust power system requires an understanding of changing demand patterns. Electricity demand is highly weather-sensitive. Thus, the supply-side variation from introducing intermittent renewable sources, juxtaposed with variable demand, will introduce additional challenges in the grid planning process. By understanding the spatial and temporal variability of temperature over the US, the response of demand to natural temperature variability can be separated from climate-change-related effects, which is especially important because the effects of the former are not well characterized. Through this project, we aim to better support the technology and policy development process for power systems by developing machine learning and deep learning 'back-forecasting' models to reconstruct multidecadal demand records and study the natural variability of temperature and its influence on demand.
    The Bayan Algorithm: Detecting Communities in Networks Through Exact and Approximate Optimization of Modularity. (arXiv:2209.04562v1 [cs.SI])
    Community detection is a classic problem in network science with extensive applications in various fields. The most commonly used methods are algorithms designed to maximize a utility function, modularity, across the different ways a network can be partitioned into communities. Despite their name and design philosophy, current modularity maximization algorithms generally fail to maximize modularity or to guarantee any proximity to an optimal solution. We propose the Bayan algorithm which, unlike existing methods, returns network partitions with a guarantee of either optimality or proximity to an optimal solution. At the core of the Bayan algorithm is a branch-and-cut scheme that solves a sparse integer programming formulation of the modularity maximization problem to optimality, or approximates it within a factor. We analyze the performance of Bayan against 22 existing algorithms using synthetic and real networks. Through extensive experiments, we demonstrate Bayan's distinctive capabilities not only in maximizing modularity, but more importantly in accurately retrieving ground-truth communities. Bayan's comparative level of performance remains stable over variations in the amount of noise in the data (graph) generation process. The performance of Bayan as an exact modularity maximization algorithm also reveals the theoretical capability limits of maximum-modularity partitions in accurately retrieving communities. Overall, our analysis points to Bayan as a suitable choice for a methodologically grounded detection of communities through exact (approximate) maximization of modularity in networks with up to $\sim10^3$ edges (and larger networks). Prospective advances in graph optimization and integer programming can push these limits further.
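    For reference, one textbook integer-programming formulation of modularity maximization (in the spirit of classic formulations; it may differ from the sparse formulation Bayan actually solves) is, with $x_{ij} = 1$ iff nodes $i$ and $j$ are placed in different communities, $m$ the number of edges, and $d_i$ the degree of node $i$:

        \max_{x} \;\; \frac{1}{2m} \sum_{i<j} \Big( A_{ij} - \frac{d_i d_j}{2m} \Big) (1 - x_{ij})
        \quad \text{s.t.} \quad x_{ij} + x_{jk} \ge x_{ik} \;\; \text{for all node triples } i, j, k, \qquad x_{ij} \in \{0, 1\}.

    The triangle constraints enforce transitivity of community membership; in branch-and-cut schemes such constraints are typically added lazily only where violated, a common way to keep such formulations tractable.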
    An Improved Lightweight YOLOv5 Model Based on Attention Mechanism for Face Mask Detection. (arXiv:2203.16506v3 [cs.CV] UPDATED)
    The coronavirus disease 2019 (COVID-19) pandemic has brought severe challenges to social stability and public health worldwide. One effective way of curbing the epidemic is to require people to wear masks in public places and to monitor mask-wearing states with suitable automatic detectors. However, existing deep learning based models struggle to simultaneously achieve high precision and real-time performance. To solve this problem, we propose an improved lightweight face mask detector based on YOLOv5, which achieves an excellent balance of precision and speed. Firstly, ShuffleCANet, a novel backbone combining the ShuffleNetV2 network with a coordinate attention mechanism, is proposed. Afterwards, an efficient path aggregation network, BiFPN, is applied as the feature-fusion neck. Furthermore, the localization loss is replaced with alpha-CIoU in the model training phase to obtain higher-quality anchors. Valuable strategies such as data augmentation, adaptive image scaling, and anchor clustering are also utilized. Experimental results on the AIZOO face mask dataset show the superiority of the proposed model. Compared with the original YOLOv5, the proposed model increases inference speed by 28.3% while still improving precision by 0.58%. It achieves the best mean average precision of 95.2% compared with seven other existing models, which is 4.4% higher than the baseline.
    Gradient Descent Temporal Difference-difference Learning. (arXiv:2209.04624v1 [cs.LG])
    Off-policy algorithms, in which a behavior policy differs from the target policy and is used to gain experience for learning, have proven to be of great practical value in reinforcement learning. However, even for simple convex problems such as linear value function approximation, these algorithms are not guaranteed to be stable. To address this, alternative algorithms that are provably convergent in such cases have been introduced, the most well known being gradient descent temporal difference (GTD) learning. This algorithm and others like it, however, tend to converge much more slowly than conventional temporal difference learning. In this paper we propose gradient descent temporal difference-difference (Gradient-DD) learning, which improves GTD2, a GTD algorithm, by introducing second-order differences in successive parameter updates. We investigate this algorithm in the framework of linear value function approximation, theoretically proving its convergence by applying the theory of stochastic approximation. Studying the model empirically on the random walk task, the Boyan-chain task, and Baird's off-policy counterexample, we find substantial improvement over GTD2 and, in several cases, better performance even than conventional TD learning.
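    For context, the standard GTD2 updates with linear features look as follows; Gradient-DD augments these with a second-order difference term regularizing successive parameter updates, whose exact form is given in the paper and is not reproduced in this sketch.

        import numpy as np

        def gtd2_step(theta, w, phi, phi_next, reward, gamma, alpha, beta):
            """One GTD2 update; theta are value weights, w the auxiliary weights."""
            delta = reward + gamma * phi_next @ theta - phi @ theta  # TD error
            theta = theta + alpha * (phi - gamma * phi_next) * (phi @ w)
            w = w + beta * (delta - phi @ w) * phi
            return theta, w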
    Active Learning for Optimal Intervention Design in Causal Models. (arXiv:2209.04744v1 [cs.LG])
    An important problem across disciplines is the discovery of interventions that produce a desired outcome. When the space of possible interventions is large, making an exhaustive search infeasible, experimental design strategies are needed. In this context, encoding the causal relationships between the variables, and thus the effect of interventions on the system, is critical in order to identify desirable interventions efficiently. We develop an iterative causal method to identify optimal interventions, as measured by the discrepancy between the post-interventional mean of the distribution and a desired target mean. We formulate an active learning strategy that uses the samples obtained so far from different interventions to update the belief about the underlying causal model, as well as to identify samples that are most informative about optimal interventions and thus should be acquired in the next batch. The approach employs a Bayesian update for the causal model and prioritizes interventions using a carefully designed, causally informed acquisition function. This acquisition function is evaluated in closed form, allowing for efficient optimization. The resulting algorithms are theoretically grounded with information-theoretic bounds and provable consistency results. We illustrate the method on both synthetic data and real-world biological data, namely gene expression data from Perturb-CITE-seq experiments, to identify optimal perturbations that induce a specific cell state transition. In both cases, the proposed causal approach achieves better sample efficiency than several baselines, and the causally informed acquisition function notably outperforms existing criteria, allowing for optimal intervention design with significantly fewer experiments.
    Temporal Pattern Mining for Analysis of Longitudinal Clinical Data: Identifying Risk Factors for Alzheimer's Disease. (arXiv:2209.04793v1 [cs.LG])
    A novel framework is proposed for the complex task of modelling and analysing longitudinal, multivariate, heterogeneous clinical data. The method uses temporal abstraction to convert the data into a form more suitable for modelling; temporal pattern mining to discover patterns in the complex, longitudinal data; and machine learning survival analysis models to select the discovered patterns. The method is applied to a real-world study of Alzheimer's disease (AD), a progressive neurodegenerative disease that has no cure. The discovered patterns were predictive of AD in survival analysis models, with a concordance index of up to 0.8. This is the first work to perform survival analysis of AD data using temporal data collections. A visualisation module also provides a clear picture of the discovered patterns for ease of interpretability.
    Pathfinding in Random Partially Observable Environments with Vision-Informed Deep Reinforcement Learning. (arXiv:2209.04801v1 [cs.LG])
    Deep reinforcement learning is a technique for solving problems in a variety of environments, ranging from Atari video games to stock trading. This method leverages deep neural network models to make decisions based on observations of a given environment, with the goal of maximizing a reward function that can incorporate costs and rewards for reaching goals. For pathfinding, reward conditions can include reaching a specified target area along with costs for movement. In this work, multiple Deep Q-Network (DQN) agents are trained to operate in a partially observable environment with the goal of reaching a target zone in minimal travel time. The agents operate based on a visual representation of their surroundings and thus have a restricted capability to observe the environment. A comparison between DQN, DQN-GRU, and DQN-LSTM is performed to examine each model's capabilities with two different types of input. Through this evaluation, it is shown that with equivalent training and analogous model architectures, a DQN model is able to outperform its recurrent counterparts.
    Rethink Decision Tree Traversal. (arXiv:2209.04825v1 [cs.LG])
    We show how to evaluate binary decision tree traversal in the language of matrix computation, motivated by QuickScorer \cite{lucchese2015quickscorer}. Our main contribution is a novel matrix representation of the hierarchical structure of the decision tree, and we propose equivalent algorithms for binary decision tree traversal based on rigorous theoretical analysis. The core idea is to find the relation between the input and the exit leaf node. This lets us evaluate decisions without recursive traversal and also exposes the partitioning nature of tree-based methods.
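    The flavor of non-recursive evaluation can be seen in a QuickScorer-style bitvector sketch (the tiny tree, its masks, and the bit conventions below are illustrative assumptions): every internal node carries a mask of the leaves that stay reachable when its test fails, the masks of all failing nodes are ANDed together, and the lowest set bit identifies the exit leaf.

        nodes = [  # (feature, threshold, false_mask) for a depth-2 tree with leaves 0..3
            (0, 0.5, 0b1100),  # root: a False test removes the left subtree {0, 1}
            (1, 0.3, 0b1110),  # left child: a False test removes leaf 0
            (1, 0.7, 0b1011),  # right child: a False test removes leaf 2
        ]
        leaf_values = [10.0, 20.0, 30.0, 40.0]

        def exit_leaf(x):
            v = 0b1111
            for feat, thresh, mask in nodes:   # evaluate every node; no recursion
                if not (x[feat] <= thresh):    # test fails -> AND in its mask
                    v &= mask
            return (v & -v).bit_length() - 1   # index of the lowest set bit

        print(leaf_values[exit_leaf([0.2, 0.9])])  # root True, left child False -> leaf 1 -> 20.0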
    DeepSTI: Towards Tensor Reconstruction using Fewer Orientations in Susceptibility Tensor Imaging. (arXiv:2209.04504v1 [eess.IV])
    Susceptibility tensor imaging (STI) is an emerging magnetic resonance imaging technique that characterizes anisotropic tissue magnetic susceptibility with a second-order tensor model. STI has the potential to provide information both for the reconstruction of white matter fiber pathways and for the detection of myelin changes in the brain at mm resolution or less, which would be of great value for understanding brain structure and function in healthy and diseased brains. However, the application of STI in vivo has been hindered by its cumbersome and time-consuming acquisition requirement of measuring susceptibility-induced MR phase changes at multiple (usually more than six) head orientations. This complexity is compounded by the limitation in head rotation angles due to physical constraints of the head coil. As a result, STI has not yet been widely applied in human studies in vivo. In this work, we tackle these issues by proposing an image reconstruction algorithm for STI that leverages data-driven priors. Our method, called DeepSTI, learns the data prior implicitly via a deep neural network that approximates the proximal operator of a regularizer function for STI. The dipole inversion problem is then solved iteratively using the learned proximal network. Experimental results using both simulation and in vivo human data demonstrate great improvement over state-of-the-art algorithms in terms of the reconstructed tensor image, principal eigenvector maps, and tractography results, while allowing for tensor reconstruction with MR phase measured at far fewer than six different orientations. Notably, promising reconstruction results are achieved by our method from only one orientation in human in vivo, and we demonstrate a potential application of this technique for estimating lesion susceptibility anisotropy in patients with multiple sclerosis.
    Examining stability of machine learning methods for predicting dementia at early phases of the disease. (arXiv:2209.04643v1 [cs.LG])
    Dementia is a neuropsychiatric brain disorder that usually occurs when one or more brain cells stop working partially or entirely. Diagnosing this disorder in its early phases is vital for rescuing patients' lives from bad consequences and providing them with better healthcare. Machine learning methods have been proven to be accurate in predicting dementia in the early phases of the disease. The prediction of dementia depends heavily on the type of collected data, which is usually gathered from Normalized Whole Brain Volume (nWBV) and Atlas Scaling Factor (ASF), normally measured and corrected from Magnetic Resonance Images (MRIs). Other biological features such as age and gender can also help in the diagnosis of dementia. Although many studies use machine learning for predicting dementia, no firm conclusion has been reached on which method is more accurate under different experimental conditions. Therefore, this paper investigates conclusion stability regarding the performance of machine learning algorithms for dementia prediction. To accomplish this, a large number of experiments were run using 7 machine learning algorithms and two feature reduction algorithms, namely Information Gain (IG) and Principal Component Analysis (PCA). To examine the stability of these algorithms, the feature-selection threshold was varied for IG from 20% to 100% and the PCA dimension from 2 to 8, resulting in 7×9 + 7×7 = 112 experiments. In each experiment, various classification evaluation metrics were recorded. The obtained results show that among the seven algorithms, the support vector machine and Naive Bayes are the most stable as the selection threshold changes. It was also found that using IG appears more efficient than using PCA for predicting dementia.
    Shape Analysis for Pediatric Upper Body Motor Function Assessment. (arXiv:2209.04710v1 [cs.LG])
    Neuromuscular disorders, such as Spinal Muscular Atrophy (SMA) and Duchenne Muscular Dystrophy (DMD), cause progressive muscular degeneration and loss of motor function for 1 in 6,000 children. Traditional upper limb motor function assessments do not quantitatively measure patient-performed motions, which makes it difficult to track progress for incremental changes. Assessing motor function in children with neuromuscular disorders is particularly challenging because they can be nervous or excited during experiments, or simply be too young to follow precise instructions. These challenges translate to confounding factors such as performing different parts of the arm curl slower or faster (phase variability) which affects the assessed motion quality. This paper uses curve registration and shape analysis to temporally align trajectories while simultaneously extracting a mean reference shape. Distances from this mean shape are used to assess the quality of motion. The proposed metric is invariant to confounding factors, such as phase variability, while suggesting several clinically relevant insights. First, there are statistically significant differences between functional scores for the control and patient populations ($p = 0.0213 \le 0.05$). Next, several patients in the patient cohort are able to perform motion on par with the healthy cohort and vice versa. Our metric, which is computed based on wearables, is related to the Brooke's score ($p = 0.00063 \le 0.05$), as well as motor function assessments based on dynamometry ($p = 0.0006 \le 0.05$). These results show promise towards ubiquitous motion quality assessment in daily life.
    Analyzing Wearables Dataset to Predict ADLs and Falls: A Pilot Study. (arXiv:2209.04785v1 [cs.LG])
    Healthcare is an important aspect of human life, and the use of technology in healthcare has increased manifold since the pandemic. Internet of Things-based systems and devices proposed in the literature can help elders, children, and adults facing or experiencing health problems. This paper exhaustively reviews thirty-nine wearable-based datasets that can be used for evaluating systems that recognize Activities of Daily Living and falls. A comparative analysis on the SisFall dataset using five machine learning methods, i.e., Logistic Regression, Linear Discriminant Analysis, K-Nearest Neighbor, Decision Tree, and Naive Bayes, is performed in Python. The dataset is modified in two ways: in the first, all attributes present in the dataset are used as-is and labelled in binary form; in the second, the magnitude of the three axes (x, y, z) is computed for each of the three sensors and then used in the experiment with the label attribute. The experiments are performed on one subject, ten subjects, and all subjects, and compared in terms of accuracy, precision, and recall. The results of this study show that KNN outperforms the other machine learning methods in terms of accuracy, precision, and recall. It is also concluded that personalization of data improves accuracy.
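    The second preprocessing variant is a one-liner per sensor; this pandas sketch (the column and sensor names are assumptions) computes the per-sensor magnitude $\sqrt{x^2 + y^2 + z^2}$.

        import numpy as np
        import pandas as pd

        sensors = ("acc1", "acc2", "gyro")  # hypothetical sensor names
        df = pd.DataFrame(np.random.randn(5, 9),
                          columns=[f"{s}_{a}" for s in sensors for a in "xyz"])
        for s in sensors:
            axes = df[[f"{s}_x", f"{s}_y", f"{s}_z"]]
            df[f"{s}_mag"] = np.sqrt((axes ** 2).sum(axis=1))  # sqrt(x^2 + y^2 + z^2)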
    Defend Data Poisoning Attacks on Voice Authentication. (arXiv:2209.04547v1 [cs.CR])
    With the advances in deep learning, speaker verification has achieved very high accuracy and is gaining popularity as a biometric authentication option in many scenarios of daily life, especially the growing market of web services. Compared to traditional passwords, "vocal passwords" are much more convenient as they relieve people from memorizing different passwords. However, new machine learning attacks are putting these voice authentication systems at risk. Without a strong security guarantee, attackers could access legitimate users' web accounts by fooling the deep neural network (DNN) based voice recognition models. In this paper, we demonstrate an easy-to-implement data poisoning attack on voice authentication systems, which can hardly be captured by existing defense mechanisms. We then propose a more robust defense method called Guardian, a convolutional neural network-based discriminator. The Guardian discriminator integrates a series of novel techniques including bias reduction, input augmentation, and ensemble learning. Our approach is able to distinguish about 95% of attacked accounts from normal accounts, which is much more effective than existing approaches that achieve only 60% accuracy.
    Ask Before You Act: Generalising to Novel Environments by Asking Questions. (arXiv:2209.04665v1 [cs.AI])
    Solving temporally-extended tasks is a challenge for most reinforcement learning (RL) algorithms [arXiv:1906.07343]. We investigate the ability of an RL agent to learn to ask natural language questions as a tool to understand its environment and achieve greater generalisation performance in novel, temporally-extended environments. We do this by endowing the agent with the ability to ask "yes-no" questions of an all-knowing Oracle. This allows the agent to obtain guidance about the task at hand while limiting its access to new information. To study the emergence of such natural language questions in the context of temporally-extended tasks, we first train our agent in a Mini-Grid environment. We then transfer the trained agent to a different, harder environment. We observe a significant increase in generalisation performance compared to a baseline agent unable to ask questions. Through grounding its understanding of natural language in its environment, the agent can reason about its environment's dynamics to the point that it can ask new, relevant questions when deployed in a novel environment.
    The Space of Adversarial Strategies. (arXiv:2209.04521v1 [cs.CR])
    Adversarial examples, inputs designed to induce worst-case behavior in machine learning models, have been extensively studied over the past decade. Yet, our understanding of this phenomenon stems from a rather fragmented pool of knowledge; at present, there are a handful of attacks, each with disparate assumptions in threat models and incomparable definitions of optimality. In this paper, we propose a systematic approach to characterize worst-case (i.e., optimal) adversaries. We first introduce an extensible decomposition of attacks in adversarial machine learning by atomizing attack components into surfaces and travelers. With our decomposition, we enumerate over components to create 576 attacks (568 of which were previously unexplored). Next, we propose the Pareto Ensemble Attack (PEA): a theoretical attack that upper-bounds attack performance. With our new attacks, we measure performance relative to the PEA on both robust and non-robust models, seven datasets, and three extended $\ell_p$-based threat models incorporating compute costs, formalizing the space of adversarial strategies. From our evaluation we find attack performance to be highly contextual: the domain, model robustness, and threat model can have a profound influence on attack efficacy. Our investigation suggests that future studies measuring the security of machine learning should: (1) be contextualized to the domain and threat models, and (2) go beyond the handful of known attacks used today.
    Revisiting Active Sets for Gaussian Process Decoders. (arXiv:2209.04636v1 [stat.ML])
    Decoders built on Gaussian processes (GPs) are enticing due to the marginalisation over the non-linear function space. Such models (also known as GP-LVMs) are often expensive and notoriously difficult to train in practice, but can be scaled using variational inference and inducing points. In this paper, we revisit active set approximations. We develop a new stochastic estimate of the log-marginal likelihood based on recently discovered links to cross-validation, and propose a computationally efficient approximation thereof. We demonstrate that the resulting stochastic active sets (SAS) approximation significantly improves the robustness of GP decoder training while reducing computational cost. The SAS-GP obtains more structure in the latent space, scales to many datapoints and learns better representations than variational autoencoders, which is rarely the case for GP decoders.
    Application of Machine Learning for Online Reputation Systems. (arXiv:2209.04650v1 [cs.LG])
    Users on the internet usually require venues to provide better purchasing recommendations. This can be provided by a reputation system that processes ratings to provide recommendations. The rating aggregation process is a main part of a reputation system, producing a global opinion about product quality. Naive methods that are frequently used do not consider consumer profiles in their calculations and cannot discover unfair ratings or trends emerging in new ratings. Other, more sophisticated rating aggregation methods that use a weighted-average technique focus on only one or a few aspects of consumers' profile data. This paper proposes a new reputation system that uses machine learning to predict the reliability of consumers from their profiles. In particular, we construct a new consumer-profile dataset by extracting a set of factors that have great impact on consumer reliability, which serve as input to the machine learning algorithms. The predicted weight is then integrated with a weighted-average method to compute the product reputation score. The proposed model has been evaluated on three MovieLens benchmarking datasets using 10-fold cross-validation, and its performance has been compared to previously published rating aggregation models. The obtained results are promising, suggesting that the proposed approach could be a potential solution for reputation systems, and the comparison demonstrates the accuracy of our models. Finally, the proposed approach can be integrated with online recommendation systems to provide better purchasing recommendations and improve the user experience in online shopping markets.
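    The proposed flow (predict each rater's reliability from their profile, then use it as the weight in a weighted average) can be sketched as follows; the regressor choice, feature count, and synthetic data are assumptions for illustration, not the paper's configuration.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(0)
        profiles = rng.random((200, 6))      # consumer-profile factors (stand-in)
        reliability = rng.random(200)        # reliability targets in [0, 1] (stand-in)
        model = RandomForestRegressor(random_state=0).fit(profiles, reliability)

        def reputation_score(ratings, rater_profiles):
            weights = model.predict(rater_profiles)       # predicted rater reliability
            return float(np.average(ratings, weights=weights))

        print(reputation_score(np.array([5.0, 1.0, 4.0]), rng.random((3, 6))))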
    Bayesian Algorithm Execution for Tuning Particle Accelerator Emittance with Partial Measurements. (arXiv:2209.04587v1 [physics.acc-ph])
    Traditional black-box optimization methods are inefficient when dealing with multi-point measurement, i.e. when each query in the control domain requires a set of measurements in a secondary domain to calculate the objective. In particle accelerators, emittance tuning from quadrupole scans is an example of optimization with multi-point measurements. Although the emittance is a critical parameter for the performance of high-brightness machines, including X-ray lasers and linear colliders, comprehensive optimization is often limited by the time required for tuning. Here, we extend the recently-proposed Bayesian Algorithm Execution (BAX) to the task of optimization with multi-point measurements. BAX achieves sample-efficiency by selecting and modeling individual points in the joint control-measurement domain. We apply BAX to emittance minimization at the Linac Coherent Light Source (LCLS) and the Facility for Advanced Accelerator Experimental Tests II (FACET-II) particle accelerators. In an LCLS simulation environment, we show that BAX delivers a 20x increase in efficiency while also being more robust to noise compared to traditional optimization methods. Additionally, we ran BAX live at both LCLS and FACET-II, matching the hand-tuned emittance at FACET-II and achieving an optimal emittance that was 24% lower than that obtained by hand-tuning at LCLS. We anticipate that our approach can readily be adapted to other types of optimization problems involving multi-point measurements commonly found in scientific instruments.
    Deep Learning with Non-Linear Factor Models: Adaptability and Avoidance of Curse of Dimensionality. (arXiv:2209.04512v1 [stat.ML])
    In this paper, we connect the deep learning literature with non-linear factor models and show that deep learning estimation substantially improves on the non-linear additive factor model literature. We provide bounds on the expected risk and show that these upper bounds are uniform over a set of multiple response variables, extending the theorems of Schmidt-Hieber (2020). We show that our risk bound does not depend on the number of factors. In order to construct a covariance matrix estimator for asset returns, we develop a novel data-dependent estimator of the error covariance matrix in deep neural networks. The estimator is a flexible adaptive thresholding technique that is robust to outliers in the innovations. We prove that the estimator is consistent in spectral norm, and then use that result to show consistency and rates of convergence of the covariance matrix and precision matrix estimators for asset returns. The rates of convergence in both results do not depend on the number of factors; this is a new result in the factor model literature, since the number of factors is usually an impediment to better estimation and prediction. Except for the precision matrix result, all our results hold even when the number of assets is larger than the time span, with both quantities growing. Various Monte Carlo simulations confirm our large-sample findings and reveal the superior accuracy of the DNN-FM in estimating the true underlying functional form connecting the factors and observable variables, as well as the covariance and precision matrices, compared to competing approaches. Moreover, in an out-of-sample portfolio forecasting application, it outperforms alternative portfolio strategies in most cases in terms of out-of-sample portfolio standard deviation and Sharpe ratio.
    Learning sparse auto-encoders for green AI image coding. (arXiv:2209.04448v1 [eess.IV])
    Recently, convolutional auto-encoders (CAEs) were introduced for image coding and achieved performance improvements over the state-of-the-art JPEG2000 method. However, these performances were obtained using massive CAEs featuring a large number of parameters, whose training requires heavy computational power. In this paper, we address the problem of lossy image compression using a CAE with a small memory footprint and low computational power usage. To overcome the computational-cost issue, most of the literature uses Lagrangian proximal regularization methods, which are time-consuming themselves. In this work, we propose a constrained approach and a new structured sparse learning method. We design an algorithm and test it on three constraints: the classical $\ell_1$ constraint, the $\ell_{1,\infty}$ constraint, and the new $\ell_{1,1}$ constraint. Experimental results show that the $\ell_{1,1}$ constraint provides the best structured sparsity, resulting in a large reduction of memory and computational cost, with rate-distortion performance similar to that of dense networks.
    Accelerated Primal-Dual Methods for Convex-Strongly-Concave Saddle Point Problems. (arXiv:2209.04604v1 [math.OC])
    In this work, we investigate Primal-Dual (PD) methods for convex-strongly-concave saddle point problems (SPPs). In many cases, computing the proximal oracle over the primal-only function is inefficient. Hence, we use its first-order linear approximation in the proximal step, resulting in a Linearized PD (LPD) method. Even when the coupling term is bilinear, we observe that LPD has a suboptimal dependence on the Lipschitz constant of the primal-only function. In contrast, LPD has optimal convergence for the strongly-convex concave case. This observation leads us to present an accelerated linearized primal-dual (ALPD) algorithm for convex strongly-concave SPPs. ALPD is a single-loop algorithm that combines features of Nesterov's accelerated gradient descent (AGD) and LPD. We show that when the coupling term is semi-linear (which contains bilinear as a special case), ALPD obtains the optimal dependence on the Lipschitz constant of the primal-only function; hence, it is an optimal algorithm. When the coupling term has a general nonlinear form, ALPD has suboptimal dependence on the Lipschitz constant of the primal part of the coupling term. To improve this dependence, we present an inexact ALPD algorithm, which performs AGD iterations in an inner loop to find an approximate solution to a proximal subproblem of ALPD. We show that inexact ALPD maintains the optimal number of gradient evaluations (gradient complexity) for the primal-only and dual parts of the problem while significantly improving the gradient complexity of the primal coupling term.
    Robustness through Cognitive Dissociation Mitigation in Contrastive Adversarial Training. (arXiv:2203.08959v3 [cs.LG] UPDATED)
    In this paper, we introduce a novel neural network training framework that increases a model's robustness to adversarial attacks while maintaining high clean accuracy, by combining contrastive learning (CL) with adversarial training (AT). We propose to improve model robustness to adversarial attacks by learning feature representations that are consistent under both data augmentations and adversarial perturbations. We leverage contrastive learning by treating an adversarial example as another positive example, and aim to maximize the similarity between random augmentations of data samples and their adversarial examples, while constantly updating the classification head to avoid a cognitive dissociation between the classification head and the embedding space. This dissociation arises because CL updates the network up to the embedding space while freezing the classification head, which is used to generate new positive adversarial examples. We validate our method, Contrastive Learning with Adversarial Features (CLAF), on the CIFAR-10 dataset, on which it outperforms alternative supervised and self-supervised adversarial learning methods in both robust accuracy and clean accuracy.
    Social-Implicit: Rethinking Trajectory Prediction Evaluation and The Effectiveness of Implicit Maximum Likelihood Estimation. (arXiv:2203.03057v2 [cs.CV] UPDATED)
    Best-of-N (BoN) Average Displacement Error (ADE) / Final Displacement Error (FDE) is the most used metric for evaluating trajectory prediction models. Yet, BoN does not evaluate the whole set of generated samples, resulting in an incomplete view of a model's prediction quality and performance. We propose a new metric, Average Mahalanobis Distance (AMD), to tackle this issue. AMD quantifies how close the whole set of generated samples is to the ground truth. We also introduce the Average Maximum Eigenvalue (AMV) metric, which quantifies the overall spread of the predictions. Our metrics are validated empirically by showing that ADE/FDE is not sensitive to distribution shifts, giving a biased sense of accuracy, unlike the AMD/AMV metrics. We introduce the usage of Implicit Maximum Likelihood Estimation (IMLE) as a replacement for traditional generative models to train our model, Social-Implicit. The IMLE training mechanism aligns with the AMD/AMV objective of predicting trajectories that are close to the ground truth with a tight spread. Social-Implicit is a memory-efficient deep model with only 5.8K parameters that runs in real time at about 580 Hz and achieves competitive results. An interactive demo of the problem can be seen at https://www.abduallahmohamed.com/social-implicit-amdamv-adefde-demo . Code is available at https://github.com/abduallahmohamed/Social-Implicit .
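    The quantities the new metrics build on are easy to state; this sketch computes, at one timestep, the Mahalanobis distance of the ground truth under the predicted samples and the largest eigenvalue of their covariance as a spread measure (the paper's exact estimation procedure for AMD/AMV is not reproduced here, so treat this as an illustrative reading).

        import numpy as np

        def mahalanobis_and_spread(samples, gt):
            # samples: (N, 2) predicted positions at one timestep; gt: (2,) ground truth
            mu = samples.mean(axis=0)
            cov = np.cov(samples.T) + 1e-6 * np.eye(2)  # regularize for stability
            d = gt - mu
            m_dist = np.sqrt(d @ np.linalg.inv(cov) @ d)
            max_eig = np.linalg.eigvalsh(cov)[-1]       # overall spread of predictions
            return m_dist, max_eig

        samples = np.random.randn(20, 2) * 0.3 + np.array([1.0, 2.0])
        print(mahalanobis_and_spread(samples, np.array([1.1, 2.1])))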
    TruVR: Trustworthy Cybersickness Detection using Explainable Machine Learning. (arXiv:2209.05257v1 [cs.HC])
    Cybersickness can be characterized by nausea, vertigo, headache, eye strain, and other discomforts when using virtual reality (VR) systems. The previously reported machine learning (ML) and deep learning (DL) algorithms for detecting (classification) and predicting (regression) VR cybersickness use black-box models; thus, they lack explainability. Moreover, VR sensors generate a massive amount of data, resulting in complex and large models. Therefore, having inherent explainability in cybersickness detection models can significantly improve the model's trustworthiness and provide insight into why and how the ML/DL model arrived at a specific decision. To address this issue, we present three explainable machine learning (xML) models to detect and predict cybersickness: 1) explainable boosting machine (EBM), 2) decision tree (DT), and 3) logistic regression (LR). We evaluate xML-based models with publicly available physiological and gameplay datasets for cybersickness. The results show that the EBM can detect cybersickness with an accuracy of 99.75% and 94.10% for the physiological and gameplay datasets, respectively. On the other hand, while predicting the cybersickness, EBM resulted in a Root Mean Square Error (RMSE) of 0.071 for the physiological dataset and 0.27 for the gameplay dataset. Furthermore, the EBM-based global explanation reveals exposure length, rotation, and acceleration as key features causing cybersickness in the gameplay dataset. In contrast, galvanic skin responses and heart rate are most significant in the physiological dataset. Our results also suggest that EBM-based local explanation can identify cybersickness-causing factors for individual samples. We believe the proposed xML-based cybersickness detection method can help future researchers understand, analyze, and design simpler cybersickness detection and reduction models.  ( 3 min )
    Advanced Manufacturing Configuration by Sample-efficient Batch Bayesian Optimization. (arXiv:2205.11827v2 [cs.LG] UPDATED)
    We propose a framework for the configuration and operation of expensive-to-evaluate advanced manufacturing methods, based on Bayesian optimization. The framework unifies a tailored acquisition function, a parallel acquisition procedure, and the integration of process information providing context to the optimization procedure. The novel acquisition function is demonstrated, analyzed, and compared on state-of-the-art benchmarking problems. We apply the optimization approach to atmospheric plasma spraying and fused deposition modeling. Our results demonstrate that the proposed framework can efficiently find input parameters that produce the desired outcome and minimize the process cost.
    Fully-automated patient-level malaria assessment on field-prepared thin blood film microscopy images, including Supplementary Information. (arXiv:1908.01901v2 [cs.LG] UPDATED)
    Malaria is a life-threatening disease affecting millions. Microscopy-based assessment of thin blood films is a standard method to (i) determine malaria species and (ii) quantitate high-parasitemia infections. Full automation of malaria microscopy by machine learning (ML) is a challenging task because field-prepared slides vary widely in quality and presentation, and artifacts often heavily outnumber relatively rare parasites. In this work, we describe a complete, fully-automated framework for thin film malaria analysis that applies ML methods, including convolutional neural nets (CNNs), trained on a large and diverse dataset of field-prepared thin blood films. Quantitation and species identification results are close to sufficiently accurate for the concrete needs of drug resistance monitoring and clinical use-cases on field-prepared samples. We focus our methods and our performance metrics on the field use-case requirements. We discuss key issues and important metrics for the application of ML methods to malaria microscopy.  ( 3 min )
    Federated Reinforcement Learning for Collective Navigation of Robotic Swarms. (arXiv:2202.01141v2 [cs.RO] UPDATED)
    The recent advancement of Deep Reinforcement Learning (DRL) contributed to robotics by allowing automatic controller design. Automatic controller design is a crucial approach for designing swarm robotic systems, which require more complex controllers than single-robot systems to achieve a desired collective behaviour. Although the DRL-based controller design method has shown its effectiveness, the reliance on a central training server is a critical problem in real-world environments where robot-server communication is unstable or limited. We propose a novel Federated Learning (FL) based DRL training strategy (FLDDPG) for use in swarm robotic applications. In a comparison with baseline strategies under a limited communication bandwidth scenario, the FLDDPG method resulted in higher robustness and better generalisation to a different environment and to real robots, while the baseline strategies suffered from the limited communication bandwidth. This result suggests that the proposed method can benefit swarm robotic systems operating in environments with limited communication bandwidth, e.g., in high-radiation, underwater, or subterranean environments.
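    A minimal sketch of the federated ingredient, assuming each robot trains a local DDPG actor and only model parameters (not raw experience) are exchanged; the FLDDPG specifics, such as update schedules, are not reproduced.

        import torch

        def federated_average(local_state_dicts):
            """Average corresponding parameters from several local models."""
            avg = {}
            for key in local_state_dicts[0]:
                avg[key] = torch.stack(
                    [sd[key].float() for sd in local_state_dicts]).mean(dim=0)
            return avg

        # Usage: each robot loads the averaged weights, then resumes local training.
        # global_weights = federated_average([r.actor.state_dict() for r in swarm])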
    Model-Based Deep Learning. (arXiv:2012.08405v3 [eess.SP] UPDATED)
    Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques. Such model-based methods utilize mathematical formulations that represent the underlying physics, prior information and additional domain knowledge. Simple classical models are useful but sensitive to inaccuracies and may lead to poor performance when real systems display complex or dynamic behavior. On the other hand, purely data-driven approaches that are model-agnostic are becoming increasingly popular as datasets become abundant and the power of modern deep learning pipelines increases. Deep neural networks (DNNs) use generic architectures which learn to operate from data, and demonstrate excellent performance, especially for supervised problems. However, DNNs typically require massive amounts of data and immense computational resources, limiting their applicability for some signal processing scenarios. We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches. Such model-based deep learning methods exploit both partial domain knowledge, via mathematical structures designed for specific problems, as well as learning from limited data. In this article we survey the leading approaches for studying and designing model-based deep learning systems. We divide hybrid model-based/data-driven systems into categories based on their inference mechanism. We provide a comprehensive review of the leading approaches for combining model-based algorithms with deep learning in a systematic manner, along with concrete guidelines and detailed signal processing oriented examples from recent literature. Our aim is to facilitate the design and study of future systems on the intersection of signal processing and machine learning that incorporate the advantages of both domains.
    Learning to Compose Soft Prompts for Compositional Zero-Shot Learning. (arXiv:2204.03574v2 [cs.LG] UPDATED)
    We introduce compositional soft prompting (CSP), a parameter-efficient learning technique to improve the zero-shot compositionality of large-scale pretrained vision-language models (VLMs). VLMs can represent arbitrary classes as natural language prompts in their flexible text encoders, but they underperform state-of-the-art methods on compositional zero-shot benchmark tasks. To improve VLMs, we propose a novel form of soft prompting. We treat the attributes and objects that are composed to define classes as learnable tokens of vocabulary and tune them on multiple prompt compositions. During inference, we recompose the learned attribute-object vocabulary in new combinations. We show that CSP outperforms the original VLM on benchmark datasets by an average of 10.9 percentage points on AUC. CSP also outperforms CoOp, a soft prompting method that tunes the prefix context, by an average of 5.8 percentage points on AUC. We perform additional experiments to show that CSP improves generalization to attribute-only classification, higher-order attribute-attribute-object compositions, and combinations of pretrained attributes and fine-tuned objects.  ( 2 min )
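    A schematic of the soft-prompt composition described above, assuming a CLIP-like text encoder that consumes token embeddings; dimensions, the frozen prefix, and class vocabularies are illustrative stand-ins, not the authors' code.

        import torch
        import torch.nn as nn

        class CompositionalPrompt(nn.Module):
            def __init__(self, n_attrs, n_objs, dim=512):
                super().__init__()
                self.attr_tokens = nn.Embedding(n_attrs, dim)   # tuned
                self.obj_tokens = nn.Embedding(n_objs, dim)     # tuned
                # Frozen context, standing in for "a photo of".
                self.prefix = nn.Parameter(torch.randn(4, dim), requires_grad=False)

            def forward(self, attr_idx, obj_idx):
                # Recompose learned attribute and object tokens at inference,
                # to be fed into the VLM's text encoder.
                return torch.cat([self.prefix,
                                  self.attr_tokens(attr_idx).unsqueeze(0),
                                  self.obj_tokens(obj_idx).unsqueeze(0)], dim=0)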
    Iterative Teaching by Label Synthesis. (arXiv:2110.14432v4 [cs.LG] UPDATED)
    In this paper, we consider the problem of iterative machine teaching, where a teacher provides examples sequentially based on the current iterative learner. In contrast to previous methods that have to scan over the entire pool and select teaching examples from it in each iteration, we propose a label synthesis teaching framework where the teacher randomly selects input teaching examples (e.g., images) and then synthesizes suitable outputs (e.g., labels) for them. We show that this framework can avoid costly example selection while still provably achieving exponential teachability. We propose multiple novel teaching algorithms in this framework. Finally, we empirically demonstrate the value of our framework.  ( 2 min )
    Graph Neural Modeling of Network Flows. (arXiv:2209.05208v1 [cs.LG])
    Network flow problems, which involve distributing traffic over a network such that the underlying infrastructure is used effectively, are ubiquitous in transportation and logistics. Due to the appeal of data-driven optimization, these problems have increasingly been approached using graph learning methods. Among them, the Multi-Commodity Network Flow (MCNF) problem is of particular interest given its generality, since it concerns the distribution of multiple flows (also called demands) of different sizes between several sources and sinks. The widely-used objective that we focus on is the maximum utilization of any link in the network, given traffic demands and a routing strategy. In this paper, we propose a novel approach based on Graph Neural Networks (GNNs) for the MCNF problem which uses distinctly parametrized message functions along each link, akin to a relational model where all edge types are unique. We show that our proposed method yields substantial gains over existing graph learning methods that constrain the routing unnecessarily. We extensively evaluate the proposed approach by means of an Internet routing case study using 17 Service Provider topologies and two flow routing schemes. We find that, in many networks, an MLP is competitive with a generic GNN that does not use our mechanism. Furthermore, we shed some light on the relationship between graph structure and the difficulty of data-driven routing of flows, an aspect that has not been considered in the existing work in the area.  ( 3 min )
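    The distinctly parametrized message functions can be pictured as one small network per link, as in the hedged sketch below (a toy dense implementation; the paper's model and any efficiency tricks are not reproduced).

        import torch
        import torch.nn as nn

        class PerEdgeMessagePassing(nn.Module):
            def __init__(self, edges, dim=32):
                super().__init__()
                self.edges = edges  # list of (src, dst) node index pairs
                # One small MLP per link, akin to a relational model
                # where every edge type is unique.
                self.msg_fns = nn.ModuleList(
                    [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                   nn.Linear(dim, dim))
                     for _ in edges])

            def forward(self, h):  # h: (n_nodes, dim) node features
                out = torch.zeros_like(h)
                for (src, dst), fn in zip(self.edges, self.msg_fns):
                    out[dst] = out[dst] + fn(h[src])
                return out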
    Monkeypox virus detection using pre-trained deep learning-based approaches. (arXiv:2209.04444v1 [eess.IV])
    Monkeypox virus infections are emerging slowly as COVID-19 infections decline around the world, raising fears of another pandemic. It is therefore crucial to detect cases early, before widespread community transmission, and AI-based detection could help identify them at an early stage. In this paper, we first compare 13 different pre-trained deep learning (DL) models for Monkeypox virus detection. We fine-tune each of them with the same universal custom layers and analyse the results using four well-established measures: Precision, Recall, F1-score, and Accuracy. After identifying the best-performing DL models, we ensemble them to improve the overall performance using majority voting over their probabilistic outputs. We perform our experiments on a publicly available dataset, which shows that our ensemble method provides Precision, Recall, F1-score, and Accuracy of 85.44%, 85.47%, 85.40%, and 87.13%, respectively. These encouraging results suggest that the proposed approach can support health practitioners in mass screening.
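    The voting step admits a compact sketch; here probs is assumed to stack the softmax outputs of the selected models (shapes are illustrative).

        import numpy as np

        def ensemble_predict(probs):
            """probs: array of shape (n_models, n_samples, n_classes)."""
            votes = probs.argmax(axis=2)          # each model's class vote
            n_classes = probs.shape[2]
            counts = np.apply_along_axis(
                lambda v: np.bincount(v, minlength=n_classes), 0, votes)
            return counts.argmax(axis=0)          # majority class per sample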
    Gluformer: Transformer-Based Personalized Glucose Forecasting with Uncertainty Quantification. (arXiv:2209.04526v1 [cs.LG])
    Deep learning models achieve state-of-the-art results in predicting blood glucose trajectories, with a wide range of architectures being proposed. However, the adoption of such models in clinical practice is slow, largely due to the lack of uncertainty quantification for the provided predictions. In this work, we propose to model the future glucose trajectory conditioned on the past as an infinite mixture of basis distributions (i.e., Gaussian, Laplace, etc.). This change allows us to learn the uncertainty and predict more accurately in cases when the trajectory has a heterogeneous or multi-modal distribution. To estimate the parameters of the predictive distribution, we utilize the Transformer architecture. We empirically demonstrate the superiority of our method over existing state-of-the-art techniques, both in terms of accuracy and uncertainty, on synthetic and benchmark glucose data sets.  ( 2 min )
    GFCL: A GRU-based Federated Continual Learning Framework against Data Poisoning Attacks in IoV. (arXiv:2204.11010v2 [cs.LG] UPDATED)
    Integration of machine learning (ML) in 5G-based Internet of Vehicles (IoV) networks has enabled intelligent transportation and smart traffic management. Nonetheless, security against adversarial poisoning attacks is also an increasingly challenging task. Specifically, Deep Reinforcement Learning (DRL) is one of the widely used ML designs in IoV applications. Standard ML security techniques are not effective in DRL, where the algorithm learns to solve sequential decision-making through continuous interaction with the environment, and the environment is time-varying, dynamic, and mobile. In this paper, we propose a Gated Recurrent Unit (GRU)-based federated continual learning (GFCL) anomaly detection framework against Sybil-based data poisoning attacks in IoV. The objective is to present a lightweight and scalable framework that learns and detects illegitimate behavior without requiring an a priori training dataset of attack samples. We use a GRU to predict a future data sequence to analyze and detect illegitimate behavior from vehicles in a federated learning-based distributed manner. We investigate the performance of our framework using real-world vehicle mobility traces. The results demonstrate the effectiveness of our proposed solution in terms of different performance metrics.  ( 3 min )
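    A sketch of the GRU-based detection idea: predict the next step of a vehicle's data sequence and flag large prediction errors as suspicious. The architecture and threshold below are illustrative, not the paper's exact setup.

        import torch
        import torch.nn as nn

        class GRUPredictor(nn.Module):
            def __init__(self, n_features=4, hidden=64):
                super().__init__()
                self.gru = nn.GRU(n_features, hidden, batch_first=True)
                self.head = nn.Linear(hidden, n_features)

            def forward(self, x):             # x: (batch, time, features)
                out, _ = self.gru(x)
                return self.head(out[:, -1])  # predict the next step

        def is_anomalous(model, window, next_obs, threshold=1.0):
            with torch.no_grad():
                pred = model(window)
            return (pred - next_obs).norm(dim=-1) > threshold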
    Conditional Gradients for the Approximately Vanishing Ideal. (arXiv:2202.03349v12 [cs.LG] UPDATED)
    The vanishing ideal of a set of points $X\subseteq \mathbb{R}^n$ is the set of polynomials that evaluate to $0$ over all points $\mathbf{x} \in X$ and admits an efficient representation by a finite set of polynomials called generators. To accommodate the noise in the data set, we introduce the pairwise conditional gradients approximate vanishing ideal algorithm (PCGAVI) that constructs a set of generators of the approximate vanishing ideal. The constructed generators capture polynomial structures in data and give rise to a feature map that can, for example, be used in combination with a linear classifier for supervised learning. In PCGAVI, we construct the set of generators by solving constrained convex optimization problems with the pairwise conditional gradients algorithm. Thus, PCGAVI not only constructs few but also sparse generators, making the corresponding feature transformation robust and compact. Furthermore, we derive several learning guarantees for PCGAVI that make the algorithm theoretically better motivated than related generator-constructing methods.  ( 3 min )
    Robust Uncertainty Bounds in Reproducing Kernel Hilbert Spaces: A Convex Optimization Approach. (arXiv:2104.09582v3 [cs.LG] UPDATED)
    The problem of establishing out-of-sample bounds for the values of an unknown ground-truth function is considered. Kernels and their associated Hilbert spaces are the main formalism employed herein, along with an observational model where outputs are corrupted by bounded measurement noise. The noise can originate from any compactly supported distribution, and no independence assumptions are made on the available data. In this setting, we show how computing tight, finite-sample uncertainty bounds amounts to solving parametric quadratically constrained linear programs. Next, properties of our approach are established and its relationship with other methods is studied. Numerical experiments are presented to exemplify how the theory can be applied in a number of scenarios, and to contrast it with other closed-form alternatives.  ( 2 min )
    Fine-grain Inference on Out-of-Distribution Data with Hierarchical Classification. (arXiv:2209.04493v1 [cs.LG])
    Machine learning methods must be trusted to make appropriate decisions in real-world environments, even when faced with out-of-distribution (OOD) samples. Many current approaches simply aim to detect OOD examples and alert the user when an unrecognized input is given. However, when the OOD sample significantly overlaps with the training data, binary anomaly detection is neither interpretable nor explainable, and provides little information to the user. We propose a new model for OOD detection that makes predictions at varying levels of granularity: as the inputs become more ambiguous, the model predictions become coarser and more conservative. Consider an animal classifier that encounters an unknown bird species and a car. Both cases are OOD, but the user gains more information if the classifier recognizes that its uncertainty over the particular species is too large and predicts bird instead of detecting it as OOD. Furthermore, we diagnose the classifier's performance at each level of the hierarchy, improving the explainability and interpretability of the model's predictions. We demonstrate the effectiveness of hierarchical classifiers for both fine- and coarse-grained OOD tasks.
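    The back-off behaviour can be sketched in a few lines: if leaf-level confidence is too low, aggregate probability mass at the parent (e.g., species to bird) and only report OOD when even the coarse level is uncertain. Hierarchy and threshold are illustrative.

        def hierarchical_predict(leaf_probs, parent_of, threshold=0.5):
            """leaf_probs: dict leaf -> prob; parent_of: dict leaf -> coarse label."""
            leaf, p = max(leaf_probs.items(), key=lambda kv: kv[1])
            if p >= threshold:
                return leaf
            # Aggregate probability mass at the parent level.
            parents = {}
            for l, q in leaf_probs.items():
                parents[parent_of[l]] = parents.get(parent_of[l], 0.0) + q
            parent, pp = max(parents.items(), key=lambda kv: kv[1])
            return parent if pp >= threshold else "OOD"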
    An Interactive Automation for Human Biliary Tree Diagnosis Using Computer Vision. (arXiv:2209.04646v1 [eess.IV])
    The biliary tree is a network of tubes that connects the liver to the gallbladder, an organ right beneath it. The bile duct is the major tube in the biliary tree. The dilatation of a bile duct is a key indicator of more serious problems in the human body, such as stones and tumors, which are frequently caused by the pancreas or the papilla of Vater. The detection of bile duct dilatation can be challenging for beginner or untrained medical personnel in many circumstances, and even professionals are unable to detect bile duct dilatation with the naked eye. This research presents a unique vision-based model for initial biliary tree diagnosis. To segment the biliary tree from Magnetic Resonance Imaging (MRI) scans, the framework uses different image processing approaches. After the image's region of interest is segmented, numerous calculations are performed on it to extract 10 features, including major and minor axes, bile duct area, biliary tree area, compactness, and some textural features (contrast, mean, variance and correlation). This study used a database of images from King Hussein Medical Center in Amman, Jordan, which included 200 MRI images: 100 normal cases and 100 patients with dilated bile ducts. After the features are extracted, various classifiers are used to determine the patients' condition (normal or dilated). The findings demonstrate that the extracted features perform well with all classifiers in terms of accuracy and area under the curve. This study is unique in that it uses an automated approach to segment the biliary tree from MRI images and scientifically correlates the retrieved features with biliary tree status, which has not been done before in the literature.
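    Several of the listed shape features can be computed with scikit-image once a segmentation mask is given; the sketch below assumes such a mask and mirrors the abstract's feature list (axes, area, compactness) without reproducing the paper's pipeline.

        import numpy as np
        from skimage.measure import label, regionprops

        def shape_features(mask):
            regions = regionprops(label(mask.astype(int)))
            props = max(regions, key=lambda r: r.area)  # keep largest component
            area, perimeter = props.area, props.perimeter
            return {
                "major_axis": props.major_axis_length,
                "minor_axis": props.minor_axis_length,
                "area": area,
                "compactness": (perimeter ** 2) / (4 * np.pi * area),
            }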
    Deep Baseline Network for Time Series Modeling and Anomaly Detection. (arXiv:2209.04561v1 [cs.LG])
    Deep learning has seen increasing applications to time series in recent years. In time series anomaly detection scenarios, such as finance, Internet of Things, and data center operations, time series usually show very flexible baselines depending on various external factors. Anomalies reveal themselves by lying far away from the baseline. However, detection is not always easy due to several challenges, including baseline shifting, lack of labels, noise interference, real-time detection in streaming data, and result interpretability. In this paper, we develop a novel deep architecture to properly extract the baseline from time series, namely the Deep Baseline Network (DBLN). Using this deep network, we can easily locate the baseline position and then provide reliable and interpretable anomaly detection results. Empirical evaluation on both synthetic and public real-world datasets shows that our purely unsupervised algorithm achieves superior performance compared with state-of-the-art methods and has good practical applications.
    Two-step reinforcement learning for model-free redesign of nonlinear optimal regulator. (arXiv:2103.03808v2 [eess.SY] UPDATED)
    In many practical control applications, the performance level of a closed-loop system degrades over time due to the change of plant characteristics. Thus, there is a strong need for redesigning a controller without going through the system modeling process, which is often difficult for closed-loop systems. Reinforcement learning (RL) is one of the promising approaches that enable model-free redesign of optimal controllers for nonlinear dynamical systems based only on the measurement of the closed-loop system. However, the learning process of RL requires a considerable number of trial-and-error experiments using the poorly controlled system that may accumulate wear on the plant. To overcome this limitation, we propose a model-free two-step design method that improves the transient learning performance of RL in an optimal regulator redesign problem for unknown nonlinear systems. Specifically, we first design a linear control law that attains some degree of control performance in a model-free manner, and then, train the nonlinear optimal control law with online RL by using the designed linear control law in parallel. We introduce an offline RL algorithm for the design of the linear control law and theoretically guarantee its convergence to the LQR controller under mild assumptions. Numerical simulations show that the proposed method improves the transient learning performance and efficiency in hyperparameter tuning of RL.  ( 3 min )
    Bilevel Optimization for Feature Selection in the Data-Driven Newsvendor Problem. (arXiv:2209.05093v1 [cs.LG])
    We study the feature-based newsvendor problem, in which a decision-maker has access to historical data consisting of demand observations and exogenous features. In this setting, we investigate feature selection, aiming to derive sparse, explainable models with improved out-of-sample performance. Up to now, state-of-the-art methods utilize regularization, which penalizes the number of selected features or the norm of the solution vector. As an alternative, we introduce a novel bilevel programming formulation. The upper-level problem selects a subset of features that minimizes an estimate of the out-of-sample cost of ordering decisions based on a held-out validation set. The lower-level problem learns the optimal coefficients of the decision function on a training set, using only the features selected by the upper-level. We present a mixed integer linear program reformulation for the bilevel program, which can be solved to optimality with standard optimization solvers. Our computational experiments show that the method accurately recovers ground-truth features already for instances with a sample size of a few hundred observations. In contrast, regularization-based techniques often fail at feature recovery or require thousands of observations to obtain similar accuracy. Regarding out-of-sample generalization, we achieve improved or comparable cost performance.  ( 2 min )
    Low Precision Decentralized Distributed Training over IID and non-IID Data. (arXiv:2111.09389v3 [cs.LG] UPDATED)
    Decentralized distributed learning is the key to enabling large-scale machine learning (training) on edge devices utilizing private user-generated local data, without relying on the cloud. However, the practical realization of such on-device training is limited by the communication and compute bottleneck. In this paper, we propose and show the convergence of low precision decentralized training that aims to reduce the computational complexity and communication cost of decentralized training. Many feedback-based compression techniques have been proposed in the literature to reduce communication costs. To the best of our knowledge, there is no work that applies and shows compute efficient training techniques such as quantization, pruning, etc., for peer-to-peer decentralized learning setups. Since real-world applications have a significant skew in the data distribution, we design "Range-EvoNorm" as the normalization activation layer which is better suited for low precision training over non-IID data. Moreover, we show that the proposed low precision training can be used in synergy with other communication compression methods decreasing the communication cost further. Our experiments indicate that 8-bit decentralized training has minimal accuracy loss compared to its full precision counterpart even with non-IID data. However, when low precision training is accompanied by communication compression through sparsification we observe a 1-2% drop in accuracy. The proposed low precision decentralized training decreases computational complexity, memory usage, and communication cost by 4x and compute energy by a factor of ~20x, while trading off less than 1% accuracy for both IID and non-IID data. In particular, with higher skew values, we observe an increase in accuracy (by ~0.5%) with low precision training, indicating the regularization effect of the quantization.  ( 3 min )
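    The low-precision ingredient can be illustrated with a generic symmetric 8-bit quantizer applied to a tensor before it is exchanged between peers, then inverted on receipt; this is a sketch, not the paper's exact recipe.

        import torch

        def quantize_int8(t):
            # Symmetric per-tensor quantization to int8.
            scale = t.abs().max().clamp(min=1e-8) / 127.0
            q = (t / scale).round().clamp(-127, 127).to(torch.int8)
            return q, scale

        def dequantize_int8(q, scale):
            return q.float() * scale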
    Wake-Cough: cough spotting and cougher identification for personalised long-term cough monitoring. (arXiv:2110.03771v2 [cs.SD] UPDATED)
    We present 'wake-cough', an application of wake-word spotting to coughs using a Resnet50 and the identification of coughers using i-vectors, for the purpose of a long-term, personalised cough monitoring system. Coughs, recorded in a quiet (73±5 dB) and noisy (34±17 dB) environment, were used to extract i-vectors, x-vectors and d-vectors, used as features to the classifiers. The system achieves 90.02% accuracy when using an MLP to discriminate between 51 coughers using 2-sec long cough segments in the noisy environment. When discriminating between 5 and 14 coughers using longer (100 sec) segments in the quiet environment, this accuracy improves to 99.78% and 98.39% respectively. Unlike speech, i-vectors outperform x-vectors and d-vectors in identifying coughers. These coughs were added as an extra class to the Google Speech Commands dataset and features were extracted by preserving the end-to-end time-domain information in a trigger phrase. The highest accuracy of 88.58% is achieved in spotting coughs among 35 other trigger phrases using a Resnet50. Thus, wake-cough represents a personalised, non-intrusive cough monitoring system, which is power-efficient as on-device wake-word detection can keep a smartphone-based monitoring device mostly dormant. This makes wake-cough extremely attractive in multi-bed ward environments to monitor patients' long-term recovery from lung ailments such as tuberculosis (TB) and COVID-19.  ( 3 min )
    LEMON: LanguagE ModeL for Negative Sampling of Knowledge Graph Embeddings. (arXiv:2203.04703v2 [cs.AI] UPDATED)
    Knowledge Graph Embedding models have become an important area of machine learning. These models provide a latent representation of entities and relations in a knowledge graph which can then be used in downstream machine learning tasks such as link prediction. The learning process of such models can be performed by contrasting positive and negative triples. While all triples of a KG are considered positive, negative triples are usually not readily available. Therefore, the choice of the sampling method to obtain the negative triples plays a crucial role in the performance and effectiveness of Knowledge Graph Embedding models. Most of the current methods fetch negative samples from a random distribution of entities in the underlying Knowledge Graph, which often also includes meaningless triples. Other known methods use adversarial techniques or generative neural networks, which consequently reduce the efficiency of the process. In this paper, we propose an approach for generating informative negative samples considering available complementary knowledge about entities. Particularly, pre-trained Language Models are used to form neighborhood clusters by utilizing the distances between entities to obtain representations of symbolic entities via their textual information. Our comprehensive evaluations demonstrate the effectiveness of the proposed approach on benchmark Knowledge Graphs with textual information for the link prediction task.
    A Note on the Efficient Evaluation of PAC-Bayes Bounds. (arXiv:2209.05188v1 [cs.LG])
    When utilising PAC-Bayes theory for risk certification, it is usually necessary to estimate and bound the Gibbs risk of the PAC-Bayes posterior. Many works in the literature employ a method for this which requires a large number of passes of the dataset, incurring high computational cost. This manuscript presents a very general alternative which makes computational savings on the order of the dataset size.  ( 2 min )
    The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks. (arXiv:2108.11489v3 [stat.ML] UPDATED)
    The recent success of neural network models has shone light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data. Understanding this phenomenon of $\textit{benign overfitting}$ has attracted intense theoretical and empirical study. In this paper, we consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk when the covariates satisfy sub-Gaussianity and anti-concentration properties, and the noise is independent and sub-Gaussian. By leveraging recent results that characterize the implicit bias of this estimator, our bounds emphasize the role of both the quality of the initialization as well as the properties of the data covariance matrix in achieving low excess risk.  ( 2 min )
    Preserving Privacy in Federated Learning with Ensemble Cross-Domain Knowledge Distillation. (arXiv:2209.04599v1 [cs.CR])
    Federated Learning (FL) is a machine learning paradigm where local nodes collaboratively train a central model while the training data remains decentralized. Existing FL methods typically share model parameters or employ co-distillation to address the issue of unbalanced data distribution. However, they suffer from communication bottlenecks. More importantly, they risk privacy leakage. In this work, we develop a privacy preserving and communication efficient method in a FL framework with one-shot offline knowledge distillation using unlabeled, cross-domain public data. We propose a quantized and noisy ensemble of local predictions from completely trained local models for stronger privacy guarantees without sacrificing accuracy. Based on extensive experiments on image classification and text classification tasks, we show that our privacy-preserving method outperforms baseline FL algorithms with superior performance in both accuracy and communication efficiency.
    Diffusion Models in Vision: A Survey. (arXiv:2209.04747v1 [cs.CV])
    Denoising diffusion models represent a recent emerging topic in computer vision, demonstrating remarkable results in the area of generative modeling. A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and a reverse diffusion stage. In the forward diffusion stage, the input data is gradually perturbed over several steps by adding Gaussian noise. In the reverse stage, a model is tasked at recovering the original input data by learning to gradually reverse the diffusion process, step by step. Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burdens, i.e. low speeds due to the high number of steps involved during sampling. In this survey, we provide a comprehensive review of articles on denoising diffusion models applied in vision, comprising both theoretical and practical contributions in the field. First, we identify and present three generic diffusion modeling frameworks, which are based on denoising diffusion probabilistic models, noise conditioned score networks, and stochastic differential equations. We further discuss the relations between diffusion models and other deep generative models, including variational auto-encoders, generative adversarial networks, energy-based models, autoregressive models and normalizing flows. Then, we introduce a multi-perspective categorization of diffusion models applied in computer vision. Finally, we illustrate the current limitations of diffusion models and envision some interesting directions for future research.
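    The forward stage described above has a convenient closed form: with a variance schedule beta_1..beta_T and alpha_bar_t the cumulative product of (1 - beta), a noised sample is x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise. A minimal sketch, with a standard linear schedule as an illustrative choice:

        import torch

        T = 1000
        betas = torch.linspace(1e-4, 0.02, T)
        alpha_bars = torch.cumprod(1.0 - betas, dim=0)

        def q_sample(x0, t):
            """Sample x_t directly from x_0 at timestep t."""
            noise = torch.randn_like(x0)
            a = alpha_bars[t]
            return a.sqrt() * x0 + (1 - a).sqrt() * noise, noise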
    Unsupervised Domain Adaptation for Extra Features in the Target Domain Using Optimal Transport. (arXiv:2209.04594v1 [cs.LG])
    Domain adaptation aims to transfer knowledge of labeled instances obtained from a source domain to a target domain to fill the gap between the domains. Most domain adaptation methods assume that the source and target domains have the same dimensionality. Methods that are applicable when the number of features is different in each domain have rarely been studied, especially when no label information is given for the test data obtained from the target domain. In this paper, it is assumed that common features exist in both domains and that extra (new additional) features are observed in the target domain; hence, the dimensionality of the target domain is higher than that of the source domain. To leverage the homogeneity of the common features, the adaptation between these source and target domains is formulated as an optimal transport (OT) problem. In addition, a learning bound in the target domain for the proposed OT-based method is derived. The proposed algorithm is validated using both simulated and real-world data.
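    The alignment over the shared feature subspace can be sketched with the POT library's exact OT solver; how the target's extra features are handled follows the paper and is not reproduced here.

        import numpy as np
        import ot  # Python Optimal Transport

        def ot_plan_common(Xs_common, Xt_common):
            # Uniform sample weights and a squared-Euclidean cost matrix.
            a = np.full(len(Xs_common), 1.0 / len(Xs_common))
            b = np.full(len(Xt_common), 1.0 / len(Xt_common))
            M = ot.dist(Xs_common, Xt_common)
            return ot.emd(a, b, M)  # transport plan between the two samples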
    Structured Q-learning For Antibody Design. (arXiv:2209.04698v1 [cs.LG])
    Optimizing combinatorial structures is core to many real-world problems, such as those encountered in life sciences. For example, one of the crucial steps involved in antibody design is to find an arrangement of amino acids in a protein sequence that improves its binding with a pathogen. Combinatorial optimization of antibodies is difficult due to extremely large search spaces and non-linear objectives. Even for modest antibody design problems, where proteins have a sequence length of eleven, we are faced with searching over 2.05 x 10^14 structures. Applying traditional Reinforcement Learning algorithms such as Q-learning to combinatorial optimization results in poor performance. We propose Structured Q-learning (SQL), an extension of Q-learning that incorporates structural priors for combinatorial optimization. Using a molecular docking simulator, we demonstrate that SQL finds high binding energy sequences and performs favourably against baselines on eight challenging antibody design tasks, including designing antibodies for SARS-COV.
    Robust Adaptive Submodular Maximization. (arXiv:2107.11333v4 [cs.DS] UPDATED)
    The goal of a sequential decision making problem is to design an interactive policy that adaptively selects a group of items, where each selection is based on feedback from the past, in order to maximize the expected utility of the selected items. It has been shown that the utility functions of many real-world applications are adaptive submodular. However, most existing studies on adaptive submodular optimization focus on the average case. Unfortunately, a policy that has a good average-case performance may have very poor performance under the worst-case realization. In this study, we propose to study two variants of adaptive submodular optimization problems, namely, worst-case adaptive submodular maximization and robust submodular maximization. The first problem aims to find a policy that maximizes the worst-case utility and the latter aims to find a policy, if any, that achieves both near optimal average-case utility and worst-case utility simultaneously. We introduce a new class of stochastic functions, called worst-case submodular functions. For the worst-case adaptive submodular maximization problem subject to a $p$-system constraint, we develop an adaptive worst-case greedy policy that achieves a $\frac{1}{p+1}$ approximation ratio against the optimal worst-case utility if the utility function is worst-case submodular. For the robust adaptive submodular maximization problem subject to cardinality constraints (resp. partition matroid constraints), if the utility function is both worst-case submodular and adaptive submodular, we develop a hybrid adaptive policy that achieves an approximation close to $1-e^{-\frac{1}{2}}$ (resp. $1/3$) under both worst- and average-case settings simultaneously. We also describe several applications of our theoretical results, including pool-based active learning, stochastic submodular set cover and adaptive viral marketing.  ( 3 min )
    Free Energy Node Embedding via Generalized Skip-gram with Negative Sampling. (arXiv:2105.09182v2 [cs.LG] UPDATED)
    A widely established set of unsupervised node embedding methods can be interpreted as consisting of two distinctive steps: i) the definition of a similarity matrix based on the graph of interest followed by ii) an explicit or implicit factorization of such matrix. Inspired by this viewpoint, we propose improvements in both steps of the framework. On the one hand, we propose to encode node similarities based on the free energy distance, which interpolates between the shortest path and the commute time distances, thus, providing an additional degree of flexibility. On the other hand, we propose a matrix factorization method based on a loss function that generalizes that of the skip-gram model with negative sampling to arbitrary similarity matrices. Compared with factorizations based on the widely used $\ell_2$ loss, the proposed method can better preserve node pairs associated with higher similarity scores. Moreover, it can be easily implemented using advanced automatic differentiation toolkits and computed efficiently by leveraging GPU resources. Node clustering, node classification, and link prediction experiments on real-world datasets demonstrate the effectiveness of incorporating free-energy-based similarities as well as the proposed matrix factorization compared with state-of-the-art alternatives.  ( 3 min )
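    Since the abstract notes the factorization is easy to express with automatic differentiation, a hedged sketch of an SGNS-style loss over an arbitrary similarity matrix follows; the exact loss and negative weighting in the paper may differ, and uniform negative weighting here is an illustrative simplification.

        import torch
        import torch.nn.functional as F

        def sgns_style_loss(U, V, S):
            """U, V: (n, d) embeddings; S: (n, n) nonnegative similarity matrix."""
            logits = U @ V.T
            pos = -(S * F.logsigmoid(logits)).sum()       # weighted positives
            neg = -F.logsigmoid(-logits).sum()            # uniform negatives
            return pos + neg

        # Usage: make U, V leaf tensors with requires_grad=True and
        # minimize with torch.optim.Adam; GPU execution is immediate.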
    A Deterministic Approximation to Neural SDEs. (arXiv:2006.08973v6 [cs.LG] UPDATED)
    Neural Stochastic Differential Equations (NSDEs) model the drift and diffusion functions of a stochastic process as neural networks. While NSDEs are known to make accurate predictions, their uncertainty quantification properties have remained unexplored so far. We report the empirical finding that obtaining well-calibrated uncertainty estimations from NSDEs is computationally prohibitive. As a remedy, we develop a computationally affordable deterministic scheme which accurately approximates the transition kernel when the dynamics are governed by an NSDE. Our method introduces a bidimensional moment matching algorithm: vertical along the neural net layers and horizontal along the time direction, which benefits from an original combination of effective approximations. Our deterministic approximation of the transition kernel is applicable to both training and prediction. We observe in multiple experiments that the uncertainty calibration quality of our method can be matched by Monte Carlo sampling only after introducing high computational cost. Thanks to the numerical stability of deterministic training, our method also improves prediction accuracy.  ( 3 min )
    GRNN: Generative Regression Neural Network -- A Data Leakage Attack for Federated Learning. (arXiv:2105.00529v3 [cs.LG] UPDATED)
    Data privacy has become an increasingly important issue in Machine Learning (ML), where many approaches have been developed to tackle this challenge, e.g. cryptography (Homomorphic Encryption (HE), Differential Privacy (DP), etc.) and collaborative training (Secure Multi-Party Computation (MPC), Distributed Learning and Federated Learning (FL)). These techniques have a particular focus on data encryption or secure local computation; they transfer intermediate information to a third party to compute the final result. Gradient exchanging is commonly considered to be a secure way of training a robust model collaboratively in Deep Learning (DL). However, recent research has demonstrated that sensitive information can be recovered from the shared gradient. Generative Adversarial Networks (GANs), in particular, have been shown to be effective in recovering such information. However, GAN based techniques require additional information, such as class labels, which are generally unavailable for privacy-preserved learning. In this paper, we show that, in the FL system, image-based privacy data can be easily recovered in full from the shared gradient only via our proposed Generative Regression Neural Network (GRNN). We formulate the attack as a regression problem and optimize two branches of the generative model by minimizing the distance between gradients. We evaluate our method on several image classification tasks. The results illustrate that our proposed GRNN outperforms state-of-the-art methods with better stability, stronger robustness, and higher accuracy. It also has no convergence requirement on the global FL model. Moreover, we demonstrate information leakage using face re-identification. Some defense strategies are also discussed in this work.  ( 3 min )
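    The gradient-matching objective underlying such attacks can be sketched generically: candidate data is optimized so that the gradient it induces matches the shared gradient. This is a generic formulation, not the GRNN itself.

        import torch

        def gradient_matching_loss(model, loss_fn, shared_grads, x_fake, y_fake):
            # Gradient the candidate (x_fake, y_fake) would induce on the model.
            pred = model(x_fake)
            loss = loss_fn(pred, y_fake)
            grads = torch.autograd.grad(loss, model.parameters(),
                                        create_graph=True)
            # Squared distance to the victim's shared gradient.
            return sum(((g - sg) ** 2).sum()
                       for g, sg in zip(grads, shared_grads))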
    Continual learning benefits from multiple sleep mechanisms: NREM, REM, and Synaptic Downscaling. (arXiv:2209.05245v1 [cs.NE])
    Learning new tasks and skills in succession without losing prior learning (i.e., catastrophic forgetting) is a computational challenge for both artificial and biological neural networks, yet artificial systems struggle to achieve parity with their biological analogues. Mammalian brains employ numerous neural operations in support of continual learning during sleep. These are ripe for artificial adaptation. Here, we investigate how modeling three distinct components of mammalian sleep together affects continual learning in artificial neural networks: (1) a veridical memory replay process observed during non-rapid eye movement (NREM) sleep; (2) a generative memory replay process linked to REM sleep; and (3) a synaptic downscaling process which has been proposed to tune signal-to-noise ratios and support neural upkeep. We find benefits from the inclusion of all three sleep components when evaluating performance on a continual learning CIFAR-100 image classification benchmark. Maximum accuracy improved during training and catastrophic forgetting was reduced during later tasks. While some catastrophic forgetting persisted over the course of network training, higher levels of synaptic downscaling led to better retention of early tasks and further facilitated the recovery of early task accuracy during subsequent training. One key takeaway is that there is a trade-off when considering the level of synaptic downscaling to use: more aggressive downscaling better protects early tasks, but less downscaling enhances the ability to learn new tasks. Intermediate levels can strike a balance, with the highest overall accuracies during training. Overall, our results both provide insight into how to adapt sleep components to enhance artificial continual learning systems and highlight areas for future neuroscientific sleep research to further such systems.  ( 3 min )
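    The downscaling component admits a very small sketch, read as one plausible implementation of uniform weight shrinking between tasks rather than the paper's exact procedure; the factor realizes the retention/plasticity trade-off described above.

        import torch

        @torch.no_grad()
        def synaptic_downscale(model, factor=0.9):
            # Applied between tasks; smaller factors protect early tasks more.
            for p in model.parameters():
                p.mul_(factor)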
    Exploring Autoencoder-based Error-bounded Compression for Scientific Data. (arXiv:2105.11730v6 [cs.LG] UPDATED)
    Error-bounded lossy compression is becoming an indispensable technique for the success of today's scientific projects with vast volumes of data produced during the simulations or instrument data acquisitions. Not only can it significantly reduce data size, but it also can control the compression errors based on user-specified error bounds. Autoencoder (AE) models have been widely used in image compression, but few AE-based compression approaches support error-bounding features, which are highly required by scientific applications. To address this issue, we explore using convolutional autoencoders to improve error-bounded lossy compression for scientific data, with the following three key contributions. (1) We provide an in-depth investigation of the characteristics of various autoencoder models and develop an error-bounded autoencoder-based framework in terms of the SZ model. (2) We optimize the compression quality for the main stages in our designed AE-based error-bounded compression framework, fine-tuning the block sizes and latent sizes and also optimizing the compression efficiency of latent vectors. (3) We evaluate our proposed solution on five real-world scientific datasets, comparing it with six other related works. Experiments show that our solution exhibits very competitive compression quality among all the compressors in our tests. In absolute terms, it can obtain much better compression quality (100% ~ 800% improvement in compression ratio with the same data distortion) compared with SZ2.1 and ZFP in cases with a high compression ratio.  ( 3 min )
    ProductAE: Towards Training Larger Channel Codes based on Neural Product Codes. (arXiv:2110.04466v2 [cs.IT] UPDATED)
    There have been significant research activities in recent years to automate the design of channel encoders and decoders via deep learning. Due to the dimensionality challenge in channel coding, it is prohibitively complex to design and train relatively large neural channel codes via deep learning techniques. Consequently, most of the results in the literature are limited to relatively short codes having less than 100 information bits. In this paper, we construct ProductAEs, a computationally efficient family of deep-learning driven (encoder, decoder) pairs, that aim at enabling the training of relatively large channel codes (both encoders and decoders) with a manageable training complexity. We build upon the ideas from classical product codes, and propose constructing large neural codes using smaller code components. More specifically, instead of directly training the encoder and decoder for a large neural code of dimension $k$ and blocklength $n$, we provide a framework that requires training neural encoders and decoders for the code parameters $(n_1,k_1)$ and $(n_2,k_2)$ such that $n_1 n_2=n$ and $k_1 k_2=k$. Our training results show significant gains, over all ranges of signal-to-noise ratio (SNR), for a code of parameters $(225,100)$ and a moderate-length code of parameters $(441,196)$, over polar codes under successive cancellation (SC) decoding. Moreover, our results demonstrate meaningful gains over Turbo Autoencoder (TurboAE) and state-of-the-art classical codes. This is the first work to design product autoencoders and a pioneering work on training large channel codes.  ( 3 min )
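    The product construction can be sketched in a few lines: a k1*k2-bit message is arranged as a k1 x k2 array, rows pass through one component encoder and columns through another, yielding an n1 x n2 codeword. The encoders below are stand-ins (e.g., small neural maps from k2 to n2 and from k1 to n1).

        import torch

        def product_encode(msg, enc_rows, enc_cols, k1, k2):
            grid = msg.view(k1, k2)
            rows = enc_rows(grid)        # (k1, k2) -> (k1, n2), row-wise
            cols = enc_cols(rows.T).T    # (n2, k1) -> (n2, n1), column-wise
            return cols                  # (n1, n2) codeword

        # Usage with linear stand-ins for the neural component encoders:
        # enc_rows = torch.nn.Linear(k2, n2); enc_cols = torch.nn.Linear(k1, n1)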
    Solving non-linear Kolmogorov equations in large dimensions by using deep learning: a numerical comparison of discretization schemes. (arXiv:2012.07747v3 [math.NA] UPDATED)
    Non-linear partial differential Kolmogorov equations are successfully used to describe a wide range of time-dependent phenomena in the natural sciences, engineering and even finance. For example, in physical systems, the Allen-Cahn equation describes pattern formation associated with phase transitions. In finance, instead, the Black-Scholes equation describes the evolution of the price of derivative investment instruments. Such modern applications often require solving these equations in high-dimensional regimes in which classical approaches are ineffective. Recently, an interesting new approach based on deep learning has been introduced by E, Han, and Jentzen [1][2]. The main idea is to construct a deep network which is trained on samples of the discrete stochastic differential equations underlying Kolmogorov's equation. The network is able to approximate, numerically at least, the solutions of the Kolmogorov equation with polynomial complexity in whole spatial domains. In this contribution we study variants of the deep networks by using different discretization schemes for the stochastic differential equation. We compare the performance of the associated networks on benchmark examples and show that, for some discretization schemes, improvements in accuracy are possible without affecting the observed computational complexity.  ( 3 min )
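    The simplest of the discretization schemes in question is Euler-Maruyama; a minimal sketch for the SDE dX = mu(X) dt + sigma(X) dW, with user-supplied drift and diffusion, follows (scalar or elementwise diffusion assumed for brevity).

        import numpy as np

        def euler_maruyama(x0, mu, sigma, T=1.0, n_steps=100, rng=None):
            rng = rng or np.random.default_rng()
            dt = T / n_steps
            x = np.array(x0, dtype=float)
            for _ in range(n_steps):
                dW = rng.normal(scale=np.sqrt(dt), size=x.shape)
                x = x + mu(x) * dt + sigma(x) * dW
            return x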
    Monitoring of functional profiles combining the notion of Fréchet mean and the framework of deformation models with application in ambient air pollution surveillance. (arXiv:2010.02968v2 [stat.ME] UPDATED)
    A framework suitable for monitoring functional profiles, combining the notion of the Fréchet mean and the concept of deformation models, is developed and proposed. The generalized sense of mean that the notion of the Fréchet mean offers is employed to capture the typical functional shape of the data, while the concept of deformation models allows for interpretable parameterizations of a profile's deviations from the typical shape. Functional EWMA-type control charts are built and proposed based on shape characteristics of the functional data, allowing for (a) identifying shifts from the in-control behaviour and (b) providing causal relationships between the potential shifts and significant deviances of certain qualitative characteristics (e.g., amplitude or phase deformations). The functional monitoring scheme is implemented to assess ambient air pollution. In particular, the method is applied to a synthetic data example to assess its performance under various conditions, and to a real-world example using sensor data from an area in the city of Athens, where air pollutant profiles and their characteristics are successfully analyzed and out-of-control behaviours are identified.  ( 3 min )
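    For readers unfamiliar with EWMA-type charts, a scalar sketch follows: the statistic is an exponentially weighted average of a monitored shape characteristic, flagged when it leaves the standard time-varying control limits. In-control parameters are estimated from an initial window here purely for illustration.

        import numpy as np

        def ewma_chart(x, lam=0.2, L=3.0, n_incontrol=50):
            mu0 = x[:n_incontrol].mean()     # in-control mean estimate
            sigma = x[:n_incontrol].std()    # in-control std estimate
            z, alarms = mu0, []
            for t, xt in enumerate(x):
                z = lam * xt + (1 - lam) * z
                width = L * sigma * np.sqrt(
                    lam / (2 - lam) * (1 - (1 - lam) ** (2 * (t + 1))))
                if abs(z - mu0) > width:
                    alarms.append(t)
            return alarms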
    Memorization and Generalization in Neural Code Intelligence Models. (arXiv:2106.08704v3 [cs.LG] UPDATED)
    Deep Neural Networks (DNNs) are increasingly being used in software engineering and code intelligence tasks. These are powerful tools that are capable of learning highly generalizable patterns from large datasets through millions of parameters. At the same time, their large capacity can render them prone to memorizing data points. Recent work suggests that the memorization risk manifests especially strongly when the training dataset is noisy, involving many ambiguous or questionable samples, and memorization is the only recourse. The goal of this paper is to evaluate and compare the extent of memorization and generalization in neural code intelligence models. It aims to provide insights on how memorization may impact the learning behavior of neural models in code intelligence systems. To observe the extent of memorization in models, we add random noise to the original training dataset and use various metrics to quantify the impact of noise on various aspects of training and testing. We evaluate several state-of-the-art neural code intelligence models and benchmarks based on Java, Python, and Ruby codebases. Our results highlight important risks: millions of trainable parameters allow the neural networks to memorize anything, including noisy data, and provide a false sense of generalization. We observed all models manifest some forms of memorization. This can be potentially troublesome in most code intelligence tasks where they rely on rather noise-prone and repetitive data sources, such as code from GitHub. To the best of our knowledge, we provide the first study to quantify memorization effects in the domain of software engineering and code intelligence systems. This work raises awareness and provides new insights into important issues of training neural models in code intelligence systems that are usually overlooked by software engineering researchers.  ( 3 min )
    On the Hyperparameters in Stochastic Gradient Descent with Momentum. (arXiv:2108.03947v2 [cs.LG] UPDATED)
    Following the same routine as [SSJ20], we continue to present a theoretical analysis of stochastic gradient descent with momentum (SGD with momentum) in this paper. Differently from plain SGD, for SGD with momentum we demonstrate that it is the two hyperparameters together, the learning rate and the momentum coefficient, that play the significant role for the linear rate of convergence in non-convex optimization. Our analysis is based on the use of a hyperparameters-dependent stochastic differential equation (hp-dependent SDE) that serves as a continuous surrogate for SGD with momentum. Similarly, we establish the linear convergence of the continuous-time formulation of SGD with momentum and obtain an explicit expression for the optimal linear rate by analyzing the spectrum of the Kramers-Fokker-Planck operator. By comparison with SGD, whose optimal rate and final gap depend only on the learning rate, we demonstrate how the optimal linear rate of convergence and the final gap vary as the momentum coefficient increases from zero to one. We then propose a mathematical interpretation of why SGD with momentum converges faster and is more robust to the learning rate than standard SGD in practice. Finally, we show that, in the presence of noise, Nesterov momentum has no essential difference from standard momentum.  ( 3 min )
    A Deep Learning Approach To Estimation Using Measurements Received Over a Network. (arXiv:2201.08020v2 [cs.LG] UPDATED)
    We propose a novel deep neural network (DNN) based approximation architecture to learn estimates of measurements. We detail an algorithm that enables training of the DNN. The DNN estimator only uses measurements, if and when they are received over a communication network. The measurements are communicated over the network as packets, at a rate unknown to the estimator. Packets may suffer drops and need retransmission, and may suffer waiting delays as they traverse a network path. Works on estimation often assume knowledge of the dynamic model of the measured system, which may not be available in practice. The DNN estimator doesn't assume knowledge of the dynamic system model or the communication network, and it doesn't require a history of measurements, often used by other works. The DNN estimator results in significantly smaller average estimation error than the commonly used Time-varying Kalman Filter and the Unscented Kalman Filter, in simulations of linear and nonlinear dynamic systems. The DNN need not be trained separately for different communication network settings. It is robust to errors in the estimation of network delays that occur due to imperfect time synchronization between the measurement source and the estimator. Last but not least, our simulations shed light on the rate of updates that results in low estimation error.  ( 3 min )
    Exploration and Incentives in Reinforcement Learning. (arXiv:2103.00360v3 [cs.LG] UPDATED)
    How do you incentivize self-interested agents to $\textit{explore}$ when they prefer to $\textit{exploit}$? We consider complex exploration problems, where each agent faces the same (but unknown) MDP. In contrast with traditional formulations of reinforcement learning, agents control the choice of policies, whereas an algorithm can only issue recommendations. However, the algorithm controls the flow of information, and can incentivize the agents to explore via information asymmetry. We design an algorithm which explores all reachable states in the MDP. We achieve provable guarantees similar to those for incentivizing exploration in static, stateless exploration problems studied previously. To the best of our knowledge, this is the first work to consider mechanism design in a stateful, reinforcement learning setting.  ( 2 min )
    Forecasting Daily COVID-19 Related Calls in VA Health Care System: Predictive Model Development. (arXiv:2111.13980v3 [cs.LG] UPDATED)
    Background: COVID-19 has become a challenge worldwide, and proper planning of medical resources is the key to combating it. In the US Veterans Affairs Health Care System (VA), many of the enrollees are susceptible to COVID-19. Predicting COVID-19 demand in order to allocate medical resources promptly has therefore become a critical issue. When VA enrollees have COVID-19 symptoms, it is recommended that their first step should be to call the VA Call Center. For confirmed COVID-19 patients, the median time from the first symptom to hospital admission was seven days. By predicting the number of COVID-19 related calls, we can predict imminent surges in healthcare use and plan medical resources ahead. Objective: The study aims to develop a method to forecast the daily number of COVID-19 related calls for each of the 110 VA medical centers. Methods: In the proposed method, we pre-trained a model using a cluster of medical centers and fine-tuned it for individual medical centers. At the cluster level, we performed feature selection to select significant features and an automatic hyper-parameter search to select optimal hyper-parameter value combinations for the model. Conclusions: This study proposed an accurate method to forecast the daily number of COVID-19 related calls for VA medical centers. The proposed method was able to overcome modeling challenges by grouping similar medical centers into clusters to enlarge the dataset for training models, and by using hyper-parameter search to automatically find optimal hyper-parameter value combinations for the models. With the proposed method, surges in health care use can be predicted ahead of time. This allows health care practitioners to better plan medical resources and combat COVID-19.  ( 3 min )
    SeanNet: Semantic Understanding Network for Localization Under Object Dynamics. (arXiv:2110.02276v2 [cs.RO] UPDATED)
    We aim for domestic robots to perform long-term indoor service. Under the object-level scene dynamics induced by daily human activities, a robot needs to robustly localize itself in the environment subject to scene uncertainties. Previous works have addressed visual-based localization in static environments, yet the object-level scene dynamics challenge existing methods for long-term deployment of the robot. This paper proposes a SEmantic understANding Network (SeanNet) architecture that enables an effective learning process with coupled visual and semantic inputs. With a dataset that contains object dynamics, we propose a cascaded contrastive learning scheme to train the SeanNet for learning a vector scene embedding. Subsequently, we can measure the similarity between the currently observed scene and the target scene, which enables robust localization under object-level dynamics. In our experiments, we benchmark SeanNet against state-of-the-art image-encoding networks (baselines) on scene similarity measures. The SeanNet architecture with the proposed training method achieves an 85.02% accuracy, which is higher than the baselines. We further integrate the SeanNet and the other networks as localizers into a visual navigation application. We demonstrate that SeanNet achieves higher success rates compared to the baselines.  ( 3 min )
    Spotting Virus from Satellites: Modeling the Circulation of West Nile Virus Through Graph Neural Networks. (arXiv:2209.05251v1 [cs.CV])
    The occurrence of West Nile Virus (WNV) represents one of the most common mosquito-borne zoonosis viral infections. Its circulation is usually associated with climatic and environmental conditions suitable for vector proliferation and virus replication. On top of that, several statistical models have been developed to shape and forecast WNV circulation: in particular, the recent massive availability of Earth Observation (EO) data, coupled with the continuous advances in the field of Artificial Intelligence, offer valuable opportunities. In this paper, we seek to predict WNV circulation by feeding Deep Neural Networks (DNNs) with satellite images, which have been extensively shown to hold environmental and climatic features. Notably, while previous approaches analyze each geographical site independently, we propose a spatial-aware approach that considers also the characteristics of close sites. Specifically, we build upon Graph Neural Networks (GNN) to aggregate features from neighbouring places, and further extend these modules to consider multiple relations, such as the difference in temperature and soil moisture between two sites, as well as the geographical distance. Moreover, we inject time-related information directly into the model to take into account the seasonality of virus spread. We design an experimental setting that combines satellite images - from Landsat and Sentinel missions - with ground truth observations of WNV circulation in Italy. We show that our proposed Multi-Adjacency Graph Attention Network (MAGAT) consistently leads to higher performance when paired with an appropriate pre-training stage. Finally, we assess the importance of each component of MAGAT in our ablation studies.  ( 3 min )
    Weight Expansion: A New Perspective on Dropout and Generalization. (arXiv:2201.09209v2 [cs.LG] UPDATED)
    While dropout is known to be a successful regularization technique, insights into the mechanisms that lead to this success are still lacking. We introduce the concept of weight expansion, an increase in the signed volume of a parallelotope spanned by the column or row vectors of the weight covariance matrix, and show that weight expansion is an effective means of increasing the generalization in a PAC-Bayesian setting. We provide a theoretical argument that dropout leads to weight expansion and extensive empirical support for the correlation between dropout and weight expansion. To support our hypothesis that weight expansion can be regarded as an indicator of the enhanced generalization capability endowed by dropout, and not just as a mere by-product, we have studied other methods that achieve weight expansion (resp. contraction), and found that they generally lead to an increased (resp. decreased) generalization ability. This suggests that dropout is an attractive regularizer, because it is a computationally cheap method for obtaining weight expansion. This insight justifies the role of dropout as a regularizer, while paving the way for identifying regularizers that promise improved generalization through weight expansion.  ( 3 min )
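    For intuition, a rough numpy sketch of a weight-expansion-style quantity, the log signed volume (log-determinant) of a layer's weight covariance matrix, follows; the function name and normalization are illustrative rather than the paper's exact definition.

        import numpy as np

        def weight_expansion(W, eps=1e-8):
            # Log signed volume of the parallelotope spanned by the vectors of
            # the weight covariance matrix; larger values = more "expanded".
            Wc = W - W.mean(axis=0, keepdims=True)    # center the weight vectors
            cov = Wc.T @ Wc / max(W.shape[0] - 1, 1)  # weight covariance matrix
            sign, logdet = np.linalg.slogdet(cov + eps * np.eye(cov.shape[0]))
            return sign * logdet

        W = np.random.randn(256, 64)                  # e.g. one dense layer
        print(weight_expansion(W), weight_expansion(1.1 * W))  # scaling expands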
    On the Stability of Nonlinear Receding Horizon Control: A Geometric Perspective. (arXiv:2103.15010v2 [math.OC] UPDATED)
    The widespread adoption of nonlinear Receding Horizon Control (RHC) strategies by industry has led to more than 30 years of intense research efforts to provide stability guarantees for these methods. However, current theoretical guarantees require that each (generally nonconvex) planning problem can be solved to (approximate) global optimality, which is an unrealistic requirement for the derivative-based local optimization methods generally used in practical implementations of RHC. This paper takes the first step towards understanding stability guarantees for nonlinear RHC when the inner planning problem is solved to first-order stationary points, but not necessarily global optima. Special attention is given to feedback linearizable systems, and a mixture of positive and negative results are provided. We establish that, under certain strong conditions, first-order solutions to RHC exponentially stabilize linearizable systems. Crucially, this guarantee requires that state costs applied to the planning problems are in a certain sense `compatible' with the global geometry of the system, and a simple counter-example demonstrates the necessity of this condition. These results highlight the need to rethink the role of global geometry in the context of optimization-based control.  ( 3 min )
    Gradient-Free Methods for Saddle-Point Problem. (arXiv:2005.05913v4 [math.OC] UPDATED)
    In this paper, we generalize the approach of Gasnikov et al. (2017), which allows one to solve (stochastic) convex optimization problems with an inexact gradient-free oracle, to the convex-concave saddle-point problem. The proposed approach performs at least as well as the best existing approaches. But for a special set-up (simplex-type constraints and closeness of Lipschitz constants in the 1- and 2-norms), our approach reduces the required number of oracle calls (function evaluations) by a factor of $\frac{n}{\log n}$. Our method uses a stochastic approximation of the gradient via finite differences. In this case, the function must be specified not only on the optimization set itself, but in a certain neighbourhood of it. In the second part of the paper, we analyze the case when such an assumption cannot be made: we propose a general way to modify the method to solve this problem, and we apply this approach to particular cases of some classical sets.  ( 3 min )
    Subquadratic Kronecker Regression with Applications to Tensor Decomposition. (arXiv:2209.04876v1 [cs.DS])
    Kronecker regression is a highly-structured least squares problem $\min_{\mathbf{x}} \lVert \mathbf{K}\mathbf{x} - \mathbf{b} \rVert_{2}^2$, where the design matrix $\mathbf{K} = \mathbf{A}^{(1)} \otimes \cdots \otimes \mathbf{A}^{(N)}$ is a Kronecker product of factor matrices. This regression problem arises in each step of the widely-used alternating least squares (ALS) algorithm for computing the Tucker decomposition of a tensor. We present the first subquadratic-time algorithm for solving Kronecker regression to a $(1+\varepsilon)$-approximation that avoids the exponential term $O(\varepsilon^{-N})$ in the running time. Our techniques combine leverage score sampling and iterative methods. By extending our approach to block-design matrices where one block is a Kronecker product, we also achieve subquadratic-time algorithms for (1) Kronecker ridge regression and (2) updating the factor matrix of a Tucker decomposition in ALS, which is not a pure Kronecker regression problem, thereby improving the running time of all steps of Tucker ALS. We demonstrate the speed and accuracy of this Kronecker regression algorithm on synthetic data and real-world image tensors.  ( 2 min )
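    The paper's subquadratic algorithm rests on leverage-score sampling and iterative methods; the toy numpy sketch below only illustrates the underlying structure those methods exploit, namely that matrix-vector products with $\mathbf{K}$ can be computed without ever materializing the Kronecker product.

        import numpy as np

        rng = np.random.default_rng(0)
        A1, A2 = rng.standard_normal((5, 3)), rng.standard_normal((4, 2))
        x = rng.standard_normal(3 * 2)

        # Naive: materialize the Kronecker product (large in memory and time).
        naive = np.kron(A1, A2) @ x

        # Structured: with row-major vec, (A1 (x) A2) vec(X) = vec(A1 X A2^T),
        # so the product costs only two small matrix multiplications.
        X = x.reshape(3, 2)
        fast = (A1 @ X @ A2.T).reshape(-1)

        assert np.allclose(naive, fast)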
    The shape and simplicity biases of adversarially robust ImageNet-trained CNNs. (arXiv:2006.09373v6 [cs.CV] UPDATED)
    Increasingly more similarities between human vision and convolutional neural networks (CNNs) have been revealed in the past few years. Yet, vanilla CNNs often fall short in generalizing to adversarial or out-of-distribution (OOD) examples, on which humans demonstrate superior performance. Adversarial training is a leading learning algorithm for improving the robustness of CNNs on adversarial and OOD data; however, little is known about the properties, specifically the shape bias and internal features, learned inside adversarially-robust (R) CNNs. In this paper, we perform a thorough, systematic study to understand the shape bias and some internal mechanisms that enable the generalizability of AlexNet, GoogLeNet, and ResNet-50 models trained via adversarial training. We find that while standard ImageNet classifiers have a strong texture bias, their R counterparts rely heavily on shapes. Remarkably, adversarial training induces three simplicity biases into hidden neurons in the process of "robustifying" CNNs. That is, each convolutional neuron in R networks often changes to detecting (1) pixel-wise smoother patterns, i.e., a mechanism that blocks high-frequency noise from passing through the network; (2) more lower-level features, i.e., textures and colors (instead of objects); and (3) fewer types of inputs. Our findings reveal the interesting mechanisms that make networks more adversarially robust and also explain some recent findings, e.g., why R networks benefit from a much larger capacity (Xie et al. 2020) and can act as a strong image prior in image synthesis (Santurkar et al. 2019).  ( 3 min )
    $\mu$DARTS: Model Uncertainty-Aware Differentiable Architecture Search. (arXiv:2107.11500v2 [cs.LG] UPDATED)
    We present a Model Uncertainty-aware Differentiable ARchiTecture Search ($\mu$DARTS) that optimizes neural networks to simultaneously achieve high accuracy and low uncertainty. We introduce concrete dropout within DARTS cells and include a Monte-Carlo regularizer within the training loss to optimize the concrete dropout probabilities. A predictive variance term is introduced in the validation loss to enable searching for architecture with minimal model uncertainty. The experiments on CIFAR10, CIFAR100, SVHN, and ImageNet verify the effectiveness of $\mu$DARTS in improving accuracy and reducing uncertainty compared to existing DARTS methods. Moreover, the final architecture obtained from $\mu$DARTS shows higher robustness to noise at the input image and model parameters compared to the architecture obtained from existing DARTS methods.  ( 2 min )
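    For background, concrete dropout replaces the hard Bernoulli mask with a smooth relaxation so that the drop probability itself receives gradients; a minimal PyTorch sketch of that relaxation (the temperature and rescaling are illustrative choices):

        import torch

        def concrete_dropout_mask(x, p_logit, temperature=0.1, eps=1e-7):
            # Differentiable (concrete) relaxation of a dropout mask, letting
            # the drop probability p be optimized by gradient descent.
            p = torch.sigmoid(p_logit)                  # learnable drop probability
            u = torch.rand_like(x)                      # uniform noise
            z = torch.sigmoid((torch.log(p + eps) - torch.log(1 - p + eps)
                               + torch.log(u + eps) - torch.log(1 - u + eps))
                              / temperature)
            mask = 1.0 - z                              # smooth stand-in for Bernoulli(1-p)
            return x * mask / (1.0 - p + eps)           # rescale to keep the mean

        p_logit = torch.zeros(1, requires_grad=True)    # p starts at 0.5
        out = concrete_dropout_mask(torch.randn(8, 16), p_logit)
        out.sum().backward()                            # gradients reach p_logit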
    A Nonparametric Contextual Bandit with Arm-level Eligibility Control for Customer Service Routing. (arXiv:2209.05278v1 [cs.LG])
    Amazon Customer Service provides real-time support for millions of customer contacts every year. While bot-resolver helps automate some traffic, we still see high demand for human agents, also called subject matter experts (SMEs). Customers reach out with questions in different domains (return policy, device troubleshooting, etc.). Depending on their training, not all SMEs are eligible to handle all contacts. Routing contacts to eligible SMEs turns out to be a non-trivial problem because SMEs' domain eligibility is subject to training quality and can change over time. To optimally recommend SMEs while simultaneously learning the true eligibility status, we propose to formulate the routing problem with a nonparametric contextual bandit algorithm (K-Boot) plus an eligibility control (EC) algorithm. K-Boot models reward with a kernel smoother on similar past samples selected by $k$-NN, and Bootstrap Thompson Sampling for exploration. EC filters arms (SMEs) by the initially system-claimed eligibility and dynamically validates the reliability of this information. The proposed K-Boot is a general bandit algorithm, and EC is applicable to other bandits. Our simulation studies show that K-Boot performs on par with state-of-the-art bandit models, and that EC boosts K-Boot performance when a stochastic eligibility signal exists.  ( 2 min )
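    A minimal sketch of the two ingredients, a k-NN kernel smoother over past samples with a bootstrap resample for Thompson-style exploration, plus an eligibility filter over candidate arms; all names and details here are illustrative, not the production algorithm.

        import numpy as np

        def kboot_choose(contexts, rewards, arms_hist, x, eligible, k=25, rng=None):
            rng = rng or np.random.default_rng()
            scores = {}
            for arm in eligible:
                idx = np.flatnonzero(arms_hist == arm)
                if idx.size == 0:
                    scores[arm] = np.inf            # force exploring unseen arms
                    continue
                d = np.linalg.norm(contexts[idx] - x, axis=1)
                nn = idx[np.argsort(d)[:k]]         # k nearest past contexts
                boot = rng.choice(nn, size=nn.size, replace=True)  # bootstrap
                w = np.exp(-np.linalg.norm(contexts[boot] - x, axis=1))
                scores[arm] = np.average(rewards[boot], weights=w)  # kernel smoother
            return max(scores, key=scores.get)

        rng = np.random.default_rng(0)
        ctx, rew = rng.standard_normal((100, 3)), rng.random(100)
        arms = rng.integers(0, 4, 100)
        print(kboot_choose(ctx, rew, arms, ctx[0], eligible=[0, 2, 3], rng=rng))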
    Detecting Network-based Internet Censorship via Latent Feature Representation Learning. (arXiv:2209.05152v1 [cs.LG])
    Internet censorship is a phenomenon of societal importance and attracts investigation from multiple disciplines. Several research groups, such as Censored Planet, have deployed large scale Internet measurement platforms to collect network reachability data. However, existing studies generally rely on manually designed rules (i.e., using censorship fingerprints) to detect network-based Internet censorship from the data. While this rule-based approach yields a high true positive detection rate, it suffers from several challenges: it requires human expertise, is laborious, and cannot detect any censorship not captured by the rules. Seeking to overcome these challenges, we design and evaluate a classification model based on latent feature representation learning and an image-based classification model to detect network-based Internet censorship. To infer latent feature representations from network reachability data, we propose a sequence-to-sequence autoencoder to capture the structure and the order of data elements in the data. To estimate the probability of censorship events from the inferred latent features, we rely on a densely connected multi-layer neural network model. Our image-based classification model encodes a network reachability data record as a gray-scale image and classifies the image as censored or not using a dense convolutional neural network. We compare and evaluate both approaches using data sets from Censored Planet via a hold-out evaluation. Both classification models are capable of detecting network-based Internet censorship as we were able to identify instances of censorship not detected by the known fingerprints. Latent feature representations likely encode more nuances in the data since the latent feature learning approach discovers a greater quantity, and a more diverse set, of new censorship instances.  ( 3 min )
    Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization. (arXiv:2105.05612v3 [cs.LG] UPDATED)
    Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features and can ignore complex, equally-predictive ones. This simplicity bias can explain their lack of robustness out of distribution (OOD). The more complex the task to learn, the more likely it is that statistical artifacts (i.e. selection biases, spurious correlations) are simpler than the mechanisms to learn. We demonstrate that the simplicity bias can be mitigated and OOD generalization improved. We train a set of similar models to fit the data in different ways using a penalty on the alignment of their input gradients. We show theoretically and empirically that this induces the learning of more complex predictive patterns. OOD generalization fundamentally requires information beyond i.i.d. examples, such as multiple training environments, counterfactual examples, or other side information. Our approach shows that we can defer this requirement to an independent model selection stage. We obtain SOTA results in visual recognition on biased data and generalization across visual domains. The method - the first to evade the simplicity bias - highlights the need for a better understanding and control of inductive biases in deep learning.  ( 3 min )
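    The diversity penalty on input gradients can be sketched in a few lines of PyTorch: train several models jointly and penalize the pairwise cosine similarity of their input gradients (the paper's exact alignment penalty may differ).

        import torch
        import torch.nn.functional as F

        def gradient_alignment_penalty(models, x, y):
            # Penalize agreement between the models' input gradients so they
            # are pushed to fit the data in different ways.
            grads = []
            for m in models:
                xi = x.clone().requires_grad_(True)
                loss = F.cross_entropy(m(xi), y)
                g, = torch.autograd.grad(loss, xi, create_graph=True)
                grads.append(g.flatten(1))
            penalty = 0.0
            for i in range(len(grads)):
                for j in range(i + 1, len(grads)):
                    penalty = penalty + F.cosine_similarity(grads[i], grads[j],
                                                            dim=1).mean()
            return penalty  # add lambda * penalty to the sum of task losses

        models = [torch.nn.Linear(10, 3) for _ in range(2)]
        x, y = torch.randn(8, 10), torch.randint(0, 3, (8,))
        print(gradient_alignment_penalty(models, x, y))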
    Stochastic Compositional Gradient Descent under Compositional Constraints. (arXiv:2012.09400v4 [cs.LG] UPDATED)
    This work studies constrained stochastic optimization problems where the objective and constraint functions are convex and expressed as compositions of stochastic functions. The problem arises in the context of fair classification, fair regression, and the design of queuing systems. Of particular interest is the large-scale setting where an oracle provides the stochastic gradients of the constituent functions, and the goal is to solve the problem with a minimal number of calls to the oracle. Owing to the compositional form, the stochastic gradients provided by the oracle do not yield unbiased estimates of the objective or constraint gradients. Instead, we construct approximate gradients by tracking the inner function evaluations, resulting in a quasi-gradient saddle point algorithm. We prove that the proposed algorithm is guaranteed to find the optimal and feasible solution almost surely. We further establish that the proposed algorithm requires $\mathcal{O}(1/\epsilon^4)$ data samples in order to obtain an $\epsilon$-approximate optimal point while also ensuring zero constraint violation. The result matches the sample complexity of the stochastic compositional gradient descent method for unconstrained problems and improves upon the best-known sample complexity results for the constrained settings. The efficacy of the proposed algorithm is tested on both fair classification and fair regression problems. The numerical results show that the proposed algorithm outperforms the state-of-the-art algorithms in terms of the convergence rate.  ( 3 min )
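    The key mechanism, tracking the inner function's evaluations so that a biased-but-usable quasi-gradient can be formed, fits in a short numpy sketch; the toy objective and step sizes are illustrative.

        import numpy as np

        # Toy compositional objective f(E[g(x)]) with f(u) = 0.5 * ||u||^2 and
        # a stochastic linear inner map g. The running average y tracks
        # E[g(x)], so J_g(x)^T grad_f(y) serves as a quasi-gradient.
        rng = np.random.default_rng(0)
        A = rng.standard_normal((5, 5))

        def g_sample(x):                      # stochastic inner function
            return A @ x + 0.1 * rng.standard_normal(5)

        x, y, beta = np.ones(5), np.zeros(5), 0.1
        for _ in range(2000):
            y = (1 - beta) * y + beta * g_sample(x)   # track inner expectation
            x -= 0.01 * (A.T @ y)                     # quasi-gradient step
        print(np.linalg.norm(A @ x))          # much smaller than at the start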
    Ordinal Graph Gamma Belief Network for Social Recommender Systems. (arXiv:2209.05106v1 [cs.IR])
    To build recommender systems that not only consider user-item interactions represented as ordinal variables, but also exploit the social network describing the relationships between the users, we develop a hierarchical Bayesian model termed ordinal graph factor analysis (OGFA), which jointly models user-item and user-user interactions. OGFA not only achieves good recommendation performance, but also extracts interpretable latent factors corresponding to representative user preferences. We further extend OGFA to ordinal graph gamma belief network, which is a multi-stochastic-layer deep probabilistic model that captures the user preferences and social communities at multiple semantic levels. For efficient inference, we develop a parallel hybrid Gibbs-EM algorithm, which exploits the sparsity of the graphs and is scalable to large datasets. Our experimental results show that the proposed models not only outperform recent baselines on recommendation datasets with explicit or implicit feedback, but also provide interpretable latent representations.  ( 2 min )
    Optimal mesh generation for a blade passage using deep reinforcement learning. (arXiv:2209.05280v1 [cs.LG])
    A mesh generation method that can generate an optimal mesh for a blade passage at a single attempt is developed using deep reinforcement learning (DRL). Unlike the conventional methods, where meshing parameters must be specified by the user or iteratively optimized from scratch for a newly given geometry, the developed method employs DRL-based multi-condition (MC) optimization to define meshing parameters for various geometries optimally. The method involves the following steps: (1) development of a base algorithm for structured mesh generation of a blade passage; (2) formulation of an MC optimization problem to optimize meshing parameters introduced while developing the base algorithm; and (3) development of a DRL-based mesh generation algorithm by solving the MC optimization problem using DRL. As a result, the developed algorithm is able to successfully generate optimal meshes at a single trial for various blades.  ( 2 min )
    Fairness in Forecasting of Observations of Linear Dynamical Systems. (arXiv:2209.05274v1 [cs.LG])
    In machine learning, training data often capture the behaviour of multiple subgroups of some underlying human population. When the nature of training data for subgroups is not controlled carefully, under-representation bias arises. To counter this effect we introduce two natural notions of subgroup fairness and instantaneous fairness to address such under-representation bias in time-series forecasting problems. Here we show globally convergent methods for the fairness-constrained learning problems using hierarchies of convexifications of non-commutative polynomial optimisation problems. Our empirical results on a biased data set motivated by insurance applications and the well-known COMPAS data set demonstrate the efficacy of our methods. We also show that by exploiting sparsity in the convexifications, we can reduce the run time of our methods considerably.  ( 2 min )
    Modeling Dependent Structure for Utterances in ASR Evaluation. (arXiv:2209.05281v1 [eess.AS])
    The bootstrap resampling method has been popular for performing significance analysis on word error rate (WER) in automatic speech recognition (ASR) evaluations. To deal with the issue of dependent speech data, the blockwise bootstrap approach has also been proposed: by dividing utterances into uncorrelated blocks, it resamples these blocks instead of the original data. However, it is always nontrivial to uncover the dependency structure among utterances, which could lead to subjective findings in statistical testing. In this paper, we present graphical lasso based methods to explicitly model such dependency and estimate the independent blocks of utterances in a rigorous way. The blockwise bootstrap is then applied on top of the inferred blocks. We show that the resulting variance estimator for WER is consistent under mild conditions. We also demonstrate the validity of the proposed approach on LibriSpeech data.  ( 2 min )
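    Given an inferred partition of utterances into independent blocks (e.g., from the graphical-lasso step), the blockwise bootstrap itself is short; a numpy sketch with toy data:

        import numpy as np

        def blockwise_bootstrap_wer(errors, words, blocks, n_boot=2000, seed=0):
            # Resample whole blocks of utterances (not single utterances) and
            # return the bootstrap distribution of WER.
            rng = np.random.default_rng(seed)
            members = [np.flatnonzero(blocks == b) for b in np.unique(blocks)]
            wers = np.empty(n_boot)
            for t in range(n_boot):
                pick = rng.integers(0, len(members), size=len(members))
                idx = np.concatenate([members[p] for p in pick])
                wers[t] = errors[idx].sum() / words[idx].sum()
            return wers

        errors = np.array([2, 0, 1, 3, 1, 0])    # word errors per utterance
        words = np.array([10, 8, 12, 9, 11, 7])  # reference word counts
        blocks = np.array([0, 0, 1, 1, 2, 2])    # toy dependency structure
        print(blockwise_bootstrap_wer(errors, words, blocks).std())  # WER std. error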
    A Differentiable Loss Function for Learning Heuristics in A*. (arXiv:2209.05206v1 [cs.LG])
    Optimization of heuristic functions for the A* algorithm, realized by deep neural networks, is usually done by minimizing a squared-error loss on the estimated cost-to-goal values. This paper argues that this does not necessarily lead to a faster A* search, since its execution relies on relative values instead of absolute ones. As a mitigation, we propose an L* loss, which upper-bounds the number of excessively expanded states inside the A* search. The L* loss, when used in the optimization of state-of-the-art deep neural networks for automated planning in maze domains like Sokoban and maze-with-teleports, significantly improves the fraction of solved problems and the quality of the found plans, and reduces the number of expanded states to approximately 50%.  ( 2 min )
    Amortised Inference in Structured Generative Models with Explaining Away. (arXiv:2209.05212v1 [cs.LG])
    A key goal of unsupervised learning is to go beyond density estimation and sample generation to reveal the structure inherent within observed data. Such structure can be expressed in the pattern of interactions between explanatory latent variables captured through a probabilistic graphical model. Although the learning of structured graphical models has a long history, much recent work in unsupervised modelling has instead emphasised flexible deep-network-based generation, either transforming independent latent generators to model complex data or assuming that distinct observed variables are derived from different latent nodes. Here, we extend the output of amortised variational inference to incorporate structured factors over multiple variables, able to capture the observation-induced posterior dependence between latents that results from "explaining away" and thus allow complex observations to depend on multiple nodes of a structured graph. We show that appropriately parameterised factors can be combined efficiently with variational message passing in elaborate graphical structures. We instantiate the framework based on Gaussian Process Factor Analysis models, and empirically evaluate its improvement over existing methods on synthetic data with known generative processes. We then fit the structured model to high-dimensional neural spiking time-series from the hippocampus of freely moving rodents, demonstrating that the model identifies latent signals that correlate with behavioural covariates.  ( 2 min )
    An Improved Algorithm For Online Reranking. (arXiv:2209.04870v1 [cs.DS])
    We study a fundamental model of online preference aggregation, where an algorithm maintains an ordered list of $n$ elements. An input is a stream of preferred sets $R_1, R_2, \dots, R_t, \dots$. Upon seeing $R_t$ and without knowledge of any future sets, an algorithm has to rerank elements (change the list ordering), so that at least one element of $R_t$ is found near the list front. The incurred cost is a sum of the list update costs (the number of swaps of neighboring list elements) and access costs (position of the first element of $R_t$ on the list). This scenario occurs naturally in applications such as ordering items in an online shop using aggregated preferences of shop customers. The theoretical underpinning of this problem is known as Min-Sum Set Cover. Unlike previous work (Fotakis et al., ICALP 2020, NIPS 2020) that mostly studied the performance of an online algorithm ALG against the static optimal solution (a single optimal list ordering), in this paper, we study an arguably harder variant where the benchmark is the provably stronger optimal dynamic solution OPT (that may also modify the list ordering). In terms of an online shop, this means that the aggregated preferences of its user base evolve with time. We construct a computationally efficient randomized algorithm whose competitive ratio (ALG-to-OPT cost ratio) is $O(r^2)$ and prove the existence of a deterministic $O(r^4)$-competitive algorithm. Here, $r$ is the maximum cardinality of sets $R_t$. This is the first algorithm whose ratio does not depend on $n$: the previously best algorithm for this problem was $O(r^{3/2} \cdot \sqrt{n})$-competitive and $\Omega(r)$ is a lower bound on the performance of any deterministic online algorithm.  ( 3 min )
    Exploring Simple and Transferable Recognition-Aware Image Processing. (arXiv:1910.09185v4 [cs.CV] UPDATED)
    Recent progress in image recognition has stimulated the deployment of vision systems at an unprecedented scale. As a result, visual data are now often consumed not only by humans but also by machines. Existing image processing methods only optimize for better human perception, yet the resulting images may not be accurately recognized by machines. This can be undesirable, e.g., the images can be improperly handled by search engines or recommendation systems. In this work, we examine simple approaches to improve machine recognition of processed images: optimizing the recognition loss directly on the image processing network or through an intermediate input transformation model. Interestingly, the processing model's ability to enhance recognition quality can transfer when evaluated on models of different architectures, recognized categories, tasks and training datasets. This makes the methods applicable even when we do not have the knowledge of future recognition models, e.g., when uploading processed images to the Internet. We conduct experiments on multiple image processing tasks paired with ImageNet classification and PASCAL VOC detection as recognition tasks. With these simple yet effective methods, substantial accuracy gain can be achieved with strong transferability and minimal image quality loss. Through a user study we further show that the accuracy gain can transfer to a black-box cloud model. Finally, we try to explain this transferability phenomenon by demonstrating the similarities of different models' decision boundaries. Code is available at https://github.com/liuzhuang13/Transferable_RA .  ( 3 min )
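    The first approach, optimizing the recognition loss directly on the image processing network, amounts to adding a frozen recognizer's loss on the processed output; a hedged PyTorch sketch (the L1 pixel loss and weighting are illustrative):

        import torch
        import torch.nn.functional as F

        def recognition_aware_loss(processor, recognizer, x, target_img, labels,
                                   lam=0.1):
            # Freeze the recognizer's parameters; gradients still flow through
            # its activations back into the processed image and the processor.
            for p in recognizer.parameters():
                p.requires_grad_(False)
            out = processor(x)
            pixel_loss = F.l1_loss(out, target_img)        # human perception
            rec_loss = F.cross_entropy(recognizer(out), labels)  # machine recognition
            return pixel_loss + lam * rec_loss

        processor = torch.nn.Conv2d(3, 3, 3, padding=1)
        recognizer = torch.nn.Sequential(torch.nn.Flatten(),
                                         torch.nn.Linear(3 * 8 * 8, 10))
        x = torch.randn(4, 3, 8, 8)
        recognition_aware_loss(processor, recognizer, x, x,
                               torch.tensor([0, 1, 2, 3])).backward()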
    Reproducibility in machine learning for medical imaging. (arXiv:2209.05097v1 [cs.CV])
    Reproducibility is a cornerstone of science, as the replication of findings is the process through which they become knowledge. It is widely considered that many fields of science are undergoing a reproducibility crisis. This has led to the publication of various guidelines aiming to improve research reproducibility. This didactic chapter is intended as an introduction to reproducibility for researchers in the field of machine learning for medical imaging. We first distinguish between different types of reproducibility. For each of them, we define it, describe the requirements to achieve it, and discuss its utility. The chapter ends with a discussion of the benefits of reproducibility and with a plea for a non-dogmatic approach to this concept and its implementation in research practice.  ( 2 min )
    SmartKex: Machine Learning Assisted SSH Keys Extraction From The Heap Dump. (arXiv:2209.05243v1 [cs.CR])
    Digital forensics is the process of extracting, preserving, and documenting evidence in digital devices. A commonly used method in digital forensics is to extract data from the main memory of a digital device. However, the main challenge is identifying the important data to be extracted. Several pieces of crucial information reside in the main memory, like usernames, passwords, and cryptographic keys such as SSH session keys. In this paper, we propose SmartKex, a machine-learning assisted method to extract session keys from heap memory snapshots of an OpenSSH process. In addition, we release an openly available dataset and the corresponding toolchain for creating additional data. Finally, we compare SmartKex with naive brute-force methods and empirically show that SmartKex can extract the session keys with high accuracy and high throughput. With the provided resources, we intend to strengthen the research on the intersection between digital forensics, cybersecurity, and machine learning.  ( 2 min )
    Adaptive Perturbation-Based Gradient Estimation for Discrete Latent Variable Models. (arXiv:2209.04862v1 [cs.LG])
    The integration of discrete algorithmic components in deep learning architectures has numerous applications. Recently, Implicit Maximum Likelihood Estimation (IMLE, Niepert, Minervini, and Franceschi 2021), a class of gradient estimators for discrete exponential family distributions, was proposed by combining implicit differentiation through perturbation with the path-wise gradient estimator. However, due to the finite difference approximation of the gradients, it is especially sensitive to the choice of the finite difference step size, which needs to be specified by the user. In this work, we present Adaptive IMLE (AIMLE), the first adaptive gradient estimator for complex discrete distributions: it adaptively identifies the target distribution for IMLE by trading off the density of gradient information with the degree of bias in the gradient estimates. We empirically evaluate our estimator on synthetic examples, as well as on Learning to Explain, Discrete Variational Auto-Encoders, and Neural Relational Inference tasks. In our experiments, we show that our adaptive gradient estimator can produce faithful estimates while requiring orders of magnitude fewer samples than other gradient estimators.  ( 2 min )
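    For intuition, a rough numpy sketch of the IMLE-style finite-difference estimator that AIMLE builds on, here for a one-hot argmax over categorical logits; the actual estimator, perturbation scheme, and the adaptive rule for the step size `lam` differ in detail.

        import numpy as np

        def gumbel(shape, rng):
            return -np.log(-np.log(rng.random(shape)))

        def imle_grad(theta, downstream_grad, lam=1.0, rng=None):
            # Compare the MAP state under perturbed logits with the MAP state
            # under logits nudged against the downstream gradient; the scaled
            # difference is the finite-difference gradient estimate.
            rng = rng or np.random.default_rng()
            eps = gumbel(theta.shape, rng)               # shared perturbation
            z = np.eye(theta.size)[np.argmax(theta + eps)]
            z_t = np.eye(theta.size)[np.argmax(theta + eps - lam * downstream_grad)]
            return (z - z_t) / lam

        theta = np.zeros(4)
        g = np.array([0.0, -1.0, 0.0, 0.0])              # pretend dL/dz favors state 1
        print(imle_grad(theta, g, rng=np.random.default_rng(0)))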
    Efficient Approximate Kernel Based Spike Sequence Classification. (arXiv:2209.04952v1 [cs.LG])
    Machine learning (ML) models, such as SVM, for tasks like classification and clustering of sequences, require a definition of distance/similarity between pairs of sequences. Several methods have been proposed to compute the similarity between sequences, such as the exact approach that counts the number of matches between $k$-mers (sub-sequences of length $k$) and an approximate approach that estimates pairwise similarity scores. Although exact methods yield better classification performance, they pose high computational costs, limiting their applicability to a small number of sequences. The approximate algorithms are proven to be more scalable and perform comparably to (sometimes better than) the exact methods -- they are designed in a "general" way to deal with different types of sequences (e.g., music, protein, etc.). Although general applicability is a desired property of an algorithm, it is not the case in all scenarios. For example, in the current COVID-19 (coronavirus) pandemic, there is a need for an approach that can deal specifically with the coronavirus. To this end, we propose a series of ways to improve the performance of the approximate kernel (using minimizers and information gain) in order to enhance its predictive performance on coronavirus sequences. More specifically, we improve the quality of the approximate kernel using domain knowledge (computed using information gain) and efficient preprocessing (using minimizers computation) to classify coronavirus spike protein sequences corresponding to different variants (e.g., Alpha, Beta, Gamma). We report results using different classification and clustering algorithms and evaluate their performance using multiple evaluation metrics. Using two datasets, we show that our proposed method helps improve the kernel's performance compared to the baseline and state-of-the-art approaches in the healthcare domain.  ( 3 min )
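    Minimizer computation, one of the two preprocessing ingredients, is simple to sketch: keep only the smallest k-mer (here lexicographically) in each window of w consecutive k-mers.

        def minimizers(seq, k=3, w=5):
            # The minimizer of a window is its smallest k-mer; keeping one
            # representative per window subsamples the sequence cheaply.
            kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
            return [min(kmers[i:i + w]) for i in range(len(kmers) - w + 1)]

        print(minimizers("MFVFLVLLPLVSSQ"))  # toy spike-protein prefix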
    HandMime: Sign Language Fingerspelling Acquisition via Imitation Learning. (arXiv:2209.05135v1 [cs.RO])
    Learning fine-grained movements is among the most challenging topics in robotics. This holds true especially for robotic hands. Robotic sign language acquisition or, more specifically, fingerspelling sign language acquisition in robots can be considered a specific instance of such a challenge. In this paper, we propose an approach for learning dexterous motor imitation from video examples, without the use of any additional information. We build a URDF model of a robotic hand with a single actuator for each joint. By leveraging pre-trained deep vision models, we extract the 3D pose of the hand from RGB videos. Then, using state-of-the-art reinforcement learning algorithms for motion imitation (namely, proximal policy optimisation), we train a policy to reproduce the movement extracted from the demonstrations. We identify the best set of hyperparameters to perform imitation based on a reference motion. Additionally, we demonstrate the ability of our approach to generalise over 6 different fingerspelled letters.  ( 2 min )
    Personalized Federated Learning with Communication Compression. (arXiv:2209.05148v1 [cs.LG])
    In contrast to training traditional machine learning (ML) models in data centers, federated learning (FL) trains ML models over local datasets contained on resource-constrained heterogeneous edge devices. Existing FL algorithms aim to learn a single global model for all participating devices, which may not be helpful to all devices participating in the training due to the heterogeneity of the data across the devices. Recently, Hanzely and Richtárik (2020) proposed a new formulation for training personalized FL models aimed at balancing the trade-off between the traditional global model and the local models that could be trained by individual devices using their private data only. They derived a new algorithm, called Loopless Gradient Descent (L2GD), to solve it and showed that this algorithm leads to improved communication complexity guarantees in regimes where more personalization is required. In this paper, we equip their L2GD algorithm with a bidirectional compression mechanism to further reduce the communication bottleneck between the local devices and the server. Unlike other compression-based algorithms used in the FL setting, our compressed L2GD algorithm operates on a probabilistic communication protocol, where communication does not happen on a fixed schedule. Moreover, our compressed L2GD algorithm maintains a convergence rate similar to vanilla SGD without compression. To empirically validate the efficiency of our algorithm, we perform diverse numerical experiments on both convex and non-convex problems and using various compression techniques.  ( 3 min )
    Dimensionality Reduction using Elastic Measures. (arXiv:2209.04933v1 [cs.LG])
    With the recent surge in big data analytics for hyper-dimensional data there is a renewed interest in dimensionality reduction techniques for machine learning applications. In order for these methods to improve performance gains and understanding of the underlying data, a proper metric needs to be identified. This step is often overlooked and metrics are typically chosen without consideration of the underlying geometry of the data. In this paper, we present a method for incorporating elastic metrics into the t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP). We apply our method to functional data, which is uniquely characterized by rotations, parameterization, and scale. If these properties are ignored, they can lead to incorrect analysis and poor classification performance. Through our method we demonstrate improved performance on shape identification tasks for three benchmark data sets (MPEG-7, Car data set, and Plane data set of Thankoor), where we achieve F1 scores of 0.77, 0.95, and 1.00, respectively.  ( 2 min )
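    One way to plug an elastic-style metric into t-SNE is through a precomputed distance matrix. The sketch below uses the square-root velocity transform without the re-warping optimization of the full elastic metric, so it is only a simplified surrogate for the distances used in the paper.

        import numpy as np
        from sklearn.manifold import TSNE

        def srvf(f):
            # Square-root velocity transform of a 1-D function on a grid.
            df = np.gradient(f)
            return np.sign(df) * np.sqrt(np.abs(df))

        def elastic_like_distances(curves):
            q = np.array([srvf(c) for c in curves])
            return np.linalg.norm(q[:, None] - q[None, :], axis=-1)

        t = np.linspace(0, 2 * np.pi, 100)
        curves = np.sin(t[None, :] * np.arange(1, 31)[:, None])  # 30 toy curves
        D = elastic_like_distances(curves)
        emb = TSNE(metric="precomputed", init="random",
                   perplexity=5).fit_transform(D)
        print(emb.shape)  # (30, 2)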
    Bias Challenges in Counterfactual Data Augmentation. (arXiv:2209.05104v1 [cs.LG])
    Deep learning models tend not to be out-of-distribution robust primarily due to their reliance on spurious features to solve the task. Counterfactual data augmentations provide a general way of (approximately) achieving representations that are counterfactual-invariant to spurious features, a requirement for out-of-distribution (OOD) robustness. In this work, we show that counterfactual data augmentations may not achieve the desired counterfactual-invariance if the augmentation is performed by a {\em context-guessing machine}, an abstract machine that guesses the most-likely context of a given input. We theoretically analyze the invariance imposed by such counterfactual data augmentations and describe an exemplar NLP task where counterfactual data augmentation by a context-guessing machine does not lead to robust OOD classifiers.  ( 2 min )
    A Comparative Study of Classical and Quantum Machine Learning Models for Sentimental Analysis. (arXiv:2209.05142v1 [quant-ph])
    We analyse and classify the sentiments of a text dataset constructed from movie reviews. For this, we use the kernel-based approach from quantum machine learning algorithms. In order to compose a quantum kernel, we use a circuit constructed using a combination of different Pauli rotational gates, where the rotational parameter is a classical non-linear function of data points obtained from the text data. For analysing the performance of the proposed model, we analyse the quantum model using a decision tree, a gradient boosting classifier, and classical and quantum support vector machines. Our results show that the quantum kernel model, i.e., the quantum support vector machine, outperforms all other algorithms used for analysis in terms of all evaluation metrics. In comparison to a classical support vector machine, the quantum support vector machine leads to significantly better results even with an increased number of features or dimensions. The results clearly demonstrate a 9.4% increase in precision score using a quantum support vector machine as against a classical support vector machine when the number of features is 15.  ( 2 min )
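    A classically simulated toy version of a quantum kernel can be plugged straight into a support vector machine: encoding each feature with a single-qubit rotation makes the state overlap a product of cosines, and the kernel is the squared overlap (fidelity). The feature map here is illustrative, not the paper's Pauli-rotation circuit.

        import numpy as np
        from sklearn.svm import SVC

        def fidelity_kernel(X, Y):
            # Overlap of product states cos(x_j)|0> + sin(x_j)|1> is
            # prod_j cos(x_j - y_j); the measured fidelity is its square.
            diff = X[:, None, :] - Y[None, :, :]
            return np.prod(np.cos(diff), axis=-1) ** 2

        rng = np.random.default_rng(0)
        X = rng.uniform(0, np.pi, size=(60, 15))          # 15 features
        y = (X.sum(axis=1) > X.sum(axis=1).mean()).astype(int)
        clf = SVC(kernel=fidelity_kernel).fit(X, y)       # SVC accepts a callable
        print(clf.score(X, y))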
    CARE: Certifiably Robust Learning with Reasoning via Variational Inference. (arXiv:2209.05055v1 [cs.LG])
    Despite great recent advances achieved by deep neural networks (DNNs), they are often vulnerable to adversarial attacks. Intensive research efforts have been made to improve the robustness of DNNs; however, most empirical defenses can be adaptively attacked again, and the theoretically certified robustness is limited, especially on large-scale datasets. One potential root cause of such vulnerabilities for DNNs is that although they have demonstrated powerful expressiveness, they lack the reasoning ability to make robust and reliable predictions. In this paper, we aim to integrate domain knowledge to enable robust learning with the reasoning paradigm. In particular, we propose a certifiably robust learning with reasoning pipeline (CARE), which consists of a learning component and a reasoning component. Concretely, we use a set of standard DNNs to serve as the learning component to make semantic predictions, and we leverage the probabilistic graphical models, such as Markov logic networks (MLN), to serve as the reasoning component to enable knowledge/logic reasoning. However, it is known that the exact inference of MLN (reasoning) is #P-complete, which limits the scalability of the pipeline. To this end, we propose to approximate the MLN inference via variational inference based on an efficient expectation maximization algorithm. In particular, we leverage graph convolutional networks (GCNs) to encode the posterior distribution during variational inference and update the parameters of GCNs (E-step) and the weights of knowledge rules in MLN (M-step) iteratively. We conduct extensive experiments on different datasets and show that CARE achieves significantly higher certified robustness compared with the state-of-the-art baselines. We additionally conducted different ablation studies to demonstrate the empirical robustness of CARE and the effectiveness of different knowledge integration.  ( 3 min )
    Gradient-Free Methods for Deterministic and Stochastic Nonsmooth Nonconvex Optimization. (arXiv:2209.05045v1 [math.OC])
    Nonsmooth nonconvex optimization problems broadly emerge in machine learning and business decision making, whereas two core challenges impede the development of efficient solution methods with finite-time convergence guarantees: the lack of a computationally tractable optimality criterion and the lack of computationally powerful oracles. The contributions of this paper are two-fold. First, we establish the relationship between the celebrated Goldstein subdifferential (Goldstein, 1977) and uniform smoothing, thereby providing the basis and intuition for the design of gradient-free methods that guarantee finite-time convergence to a set of Goldstein stationary points. Second, we propose the gradient-free method (GFM) and stochastic GFM (SGFM) for solving a class of nonsmooth nonconvex optimization problems and prove that both of them can return a $(\delta,\epsilon)$-Goldstein stationary point of a Lipschitz function $f$ at an expected convergence rate of $O(d^{3/2}\delta^{-1}\epsilon^{-4})$, where $d$ is the problem dimension. Two-phase versions of GFM and SGFM are also proposed and proven to achieve improved large-deviation results. Finally, we demonstrate the effectiveness of 2-SGFM on training ReLU neural networks with the MNIST dataset.  ( 2 min )
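    The gradient-free methods are built on randomized two-point estimates of a uniformly smoothed surrogate; a minimal numpy sketch of such an estimator and a GFM-style loop (the step sizes and test function are illustrative):

        import numpy as np

        def two_point_grad(f, x, delta=1e-2, rng=None):
            # Sample a uniform direction on the sphere and difference f: this
            # gives a gradient estimate of the uniformly smoothed surrogate,
            # using only function evaluations.
            rng = rng or np.random.default_rng()
            u = rng.standard_normal(x.shape)
            u /= np.linalg.norm(u)
            return x.size * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

        f = lambda z: np.abs(z).sum()       # nonsmooth test function
        x = np.ones(10)
        for _ in range(500):
            x -= 0.01 * two_point_grad(f, x)
        print(f(x))                         # far below the initial value of 10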
    SELTO: Sample-Efficient Learned Topology Optimization. (arXiv:2209.05098v1 [cs.LG])
    We present a sample-efficient deep learning strategy for topology optimization. Our end-to-end approach is supervised and includes physics-based preprocessing and equivariant networks. We analyze how different components of our deep learning pipeline influence the number of required training samples via a large-scale comparison. The results demonstrate that including physical concepts not only drastically improves the sample efficiency but also the predictions' physical correctness. Finally, we publish two topology optimization datasets containing problems and corresponding ground truth solutions. We are confident that these datasets will improve comparability and future progress in the field.  ( 2 min )
    TMSS: An End-to-End Transformer-based Multimodal Network for Segmentation and Survival Prediction. (arXiv:2209.05036v1 [eess.IV])
    When oncologists estimate cancer patient survival, they rely on multimodal data. Even though some multimodal deep learning methods have been proposed in the literature, the majority rely on having two or more independent networks that share knowledge at a later stage in the overall model. On the other hand, oncologists do not do this in their analysis but rather fuse the information in their brain from multiple sources such as medical images and patient history. This work proposes a deep learning method that mimics oncologists' analytical behavior when quantifying cancer and estimating patient survival. We propose TMSS, an end-to-end Transformer based Multimodal network for Segmentation and Survival prediction that leverages the superiority of transformers, which lies in their ability to handle different modalities. The model was trained and validated for segmentation and prognosis tasks on the training dataset from the HEad & NeCK TumOR segmentation and the outcome prediction in PET/CT images challenge (HECKTOR). We show that the proposed prognostic model significantly outperforms state-of-the-art methods with a concordance index of 0.763+/-0.14 while achieving a comparable dice score of 0.772+/-0.030 to a standalone segmentation model. The code is publicly available.  ( 2 min )
    "Calibeating": Beating Forecasters at Their Own Game. (arXiv:2209.04892v1 [econ.TH])
    In order to identify expertise, forecasters should not be tested by their calibration score, which can always be made arbitrarily small, but rather by their Brier score. The Brier score is the sum of the calibration score and the refinement score; the latter measures how good the sorting into bins with the same forecast is, and thus attests to "expertise." This raises the question of whether one can gain calibration without losing expertise, which we refer to as "calibeating." We provide an easy way to calibeat any forecast, by a deterministic online procedure. We moreover show that calibeating can be achieved by a stochastic procedure that is itself calibrated, and then extend the results to simultaneously calibeating multiple procedures, and to deterministic procedures that are continuously calibrated.  ( 2 min )
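    The decomposition referred to above is easy to compute when forecasts are grouped by their distinct values; a numpy sketch verifying Brier = calibration + refinement on toy data:

        import numpy as np

        def brier_decomposition(forecasts, outcomes):
            # Calibration: gaps between forecasts and bin frequencies.
            # Refinement: how well sorting into bins separates the outcomes.
            forecasts = np.asarray(forecasts)
            outcomes = np.asarray(outcomes, dtype=float)
            brier = np.mean((forecasts - outcomes) ** 2)
            cal = ref = 0.0
            for f in np.unique(forecasts):
                mask = forecasts == f
                obar = outcomes[mask].mean()   # empirical frequency in the bin
                cal += mask.mean() * (f - obar) ** 2
                ref += mask.mean() * obar * (1 - obar)
            return brier, cal, ref

        b, c, r = brier_decomposition([0.8, 0.8, 0.3, 0.3, 0.3], [1, 1, 0, 1, 0])
        assert np.isclose(b, c + r)          # Brier = calibration + refinement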
    A novel learning-based robust model predictive control energy management strategy for fuel cell electric vehicles. (arXiv:2209.04995v1 [cs.SY])
    The multi-source electromechanical coupling makes the energy management of fuel cell electric vehicles (FCEVs) highly nonlinear and complex, especially for four-wheel-drive (4WD) FCEVs. Accurate state observation of such a complicated nonlinear system is the basis for effective energy management. Aiming to release the energy-saving potential of FCEVs, a novel learning-based robust model predictive control (LRMPC) strategy is proposed for a 4WD FCEV, contributing to suitable power distribution among multiple energy sources. The well-designed strategy based on machine learning (ML) translates knowledge of the nonlinear system into an explicit control scheme with superior robust performance. First, ML methods with high regression accuracy and superior generalization ability are trained offline to establish a precise state observer for the state of charge (SOC). Then, explicit data tables for SOC generated by the state observer are used to capture accurate state changes, with input features including the vehicle status and the states of vehicle components. Specifically, vehicle velocity estimation, which provides the future speed reference, is constructed with a deep forest model. Next, these components, the explicit data tables and the velocity estimator, are combined with model predictive control (MPC) to deliver state-of-the-art energy-saving performance for the multi-degree-of-freedom system in FCEVs; the resulting strategy is named LRMPC. Finally, a detailed assessment is performed in simulation to validate the performance of LRMPC. The results highlight its energy-saving potential and strong real-time applicability.  ( 3 min )
    Symbolic Knowledge Extraction from Opaque Predictors Applied to Cosmic-Ray Data Gathered with LISA Pathfinder. (arXiv:2209.04697v1 [astro-ph.HE])
    Machine learning models are nowadays ubiquitous in space missions, performing a wide variety of tasks, ranging from the prediction of multivariate time series to the detection of specific patterns in the input data. Adopted models are usually deep neural networks or other complex machine learning algorithms providing predictions that are opaque, i.e., human users are not allowed to understand the rationale behind the provided predictions. Several techniques exist in the literature to combine the impressive predictive performance of opaque machine learning models with human-intelligible prediction explanations, for instance the application of symbolic knowledge extraction procedures. This paper reports the results of different knowledge extractors applied to an ensemble predictor capable of reproducing cosmic-ray data gathered on board the LISA Pathfinder space mission. A discussion about the readability/fidelity trade-off of the extracted knowledge is also presented.  ( 2 min )
    Vision Transformer with Convolutional Encoder-Decoder for Hand Gesture Recognition using 24 GHz Doppler Radar. (arXiv:2209.05032v1 [eess.SP])
    Transformers combined with convolutional encoders have been recently used for hand gesture recognition (HGR) using micro-Doppler signatures. We propose a vision-transformer-based architecture for HGR with multi-antenna continuous-wave Doppler radar receivers. The proposed architecture consists of three modules: a convolutional encoder-decoder, an attention module with three transformer layers, and a multi-layer perceptron. The novel convolutional decoder helps to feed patches with larger sizes to the attention module for improved feature extraction. Experimental results obtained with a dataset corresponding to a two-antenna continuous-wave Doppler radar receiver operating at 24 GHz (published by Skaria et al.) confirm that the proposed architecture achieves an accuracy of 98.3% which substantially surpasses the state-of-the-art on the used dataset.  ( 2 min )
    Learning When to Say "I Don't Know". (arXiv:2209.04944v1 [cs.CV])
    We propose a new Reject Option Classification technique to identify and remove regions of uncertainty in the decision space for a given neural classifier and dataset. Such existing formulations employ a learned rejection (remove)/selection (keep) function and require either a known cost for rejecting examples or strong constraints on the accuracy or coverage of the selected examples. We consider an alternative formulation by instead analyzing the complementary reject region and employing a validation set to learn per-class softmax thresholds. The goal is to maximize the accuracy of the selected examples subject to a natural randomness allowance on the rejected examples (rejecting more incorrect than correct predictions). We provide results showing the benefits of the proposed method over naïvely thresholding calibrated/uncalibrated softmax scores with 2-D points, imagery, and text classification datasets using state-of-the-art pretrained models. Source code is available at https://github.com/osu-cvl/learning-idk.  ( 2 min )
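    A simplified sketch of the recipe: on a validation set, choose a per-class softmax threshold so that the kept examples predicted as each class reach a target accuracy, then abstain at test time whenever the top probability falls below the predicted class's threshold. The threshold-selection rule here is illustrative; the paper's randomness-allowance criterion differs.

        import numpy as np

        def fit_class_thresholds(probs, labels, target_acc=0.95):
            # Per class: sort validation examples predicted as that class by
            # confidence; keep the largest prefix still meeting target_acc.
            preds, conf = probs.argmax(1), probs.max(1)
            thresholds = np.ones(probs.shape[1])   # unseen classes: abstain
            for c in range(probs.shape[1]):
                m = preds == c
                if not m.any():
                    continue
                order = np.argsort(-conf[m])
                correct = (labels[m] == c)[order]
                acc = np.cumsum(correct) / np.arange(1, correct.size + 1)
                keep = np.flatnonzero(acc >= target_acc)
                if keep.size:
                    thresholds[c] = conf[m][order][keep[-1]]
            return thresholds

        def predict_or_reject(probs, thresholds):
            preds, conf = probs.argmax(1), probs.max(1)
            return np.where(conf >= thresholds[preds], preds, -1)  # -1 = "I don't know"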
    A Complex Network based Graph Embedding Method for Link Prediction. (arXiv:2209.04884v1 [cs.LG])
    Graph embedding methods aim at finding useful graph representations by mapping nodes to a low-dimensional vector space. It is a task with important downstream applications, such as link prediction, graph reconstruction, data visualization, node classification, and language modeling. In recent years, the field of graph embedding has witnessed a shift from linear algebraic approaches towards local, gradient-based optimization methods combined with random walks and deep neural networks to tackle the problem of embedding large graphs. However, despite this improvement in the optimization tools, graph embedding methods are still generically designed in a way that is oblivious to the particularities of real-life networks. Indeed, there has been significant progress in understanding and modeling complex real-life networks in recent years. However, the obtained results have had a minor influence on the development of graph embedding algorithms. This paper aims to remedy this by designing a graph embedding method that takes advantage of recent valuable insights from the field of network science. More precisely, we present a novel graph embedding approach based on the popularity-similarity and local attraction paradigms. We evaluate the performance of the proposed approach on the link prediction task on a large number of real-life networks. We show, using extensive experimental analysis, that the proposed method outperforms state-of-the-art graph embedding algorithms. We also demonstrate its robustness to data scarcity and the choice of embedding dimensionality.  ( 3 min )
    An Investigation of Smart Contract for Collaborative Machine Learning Model Training. (arXiv:2209.05017v1 [cs.LG])
    Machine learning (ML) has penetrated various fields in the era of big data. The advantage of collaborative machine learning (CML) over most conventional ML lies in the joint effort of decentralized nodes or agents that results in better model performance and generalization. As the training of ML models requires a massive amount of good quality data, it is necessary to eliminate concerns about data privacy and ensure high-quality data. To solve this problem, we turn to the integration of CML and smart contracts. Based on blockchain, smart contracts enable automatic execution of data preserving and validation, as well as the continuity of CML model training. In our simulation experiments, we define incentive mechanisms on the smart contract, investigate important factors such as the number of features in the dataset (num_words), the size of the training data, and the cost for the data holders to submit data, and conclude how these factors impact the performance metrics of the model: the accuracy of the trained model, the gap between the accuracies of the model before and after simulation, and the time taken to use up the balance of a bad agent. For instance, the increase of the value of num_words leads to higher model accuracy and eliminates the negative influence of malicious agents in a shorter time, from our observation of the experiment results. Statistical analyses show that with the help of smart contracts, the influence of invalid data is efficiently diminished and model robustness is maintained. We also discuss the gap in existing research and put forward possible directions for future work.  ( 3 min )
    On topological data analysis for structural dynamics: an introduction to persistent homology. (arXiv:2209.05134v1 [stat.ML])
    Topological methods can provide a way of proposing new metrics and methods of scrutinising data, that otherwise may be overlooked. In this work, a method of quantifying the shape of data, via topological data analysis (TDA), will be introduced. The main tool within TDA is persistent homology. Persistent homology is a method of quantifying the shape of data over a range of length scales. The required background and a method of computing persistent homology are briefly discussed in this work. Ideas from topological data analysis are then used in nonlinear dynamics to analyse some common attractors, by calculating their embedding dimension and then assessing their general topologies. A method will also be proposed that uses TDA to determine the optimal delay for a time-delay embedding. TDA will also be applied to a Z24 Bridge case study in structural health monitoring, where it will be used to scrutinise different data partitions, classified by the conditions under which the data were collected. A metric from TDA is used to compare data between the partitions. The results presented demonstrate that the presence of damage alters the manifold shape more significantly than the effects present from temperature.  ( 3 min )
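    As a concrete illustration of persistent homology, the sketch below (assuming the `ripser` Python package is installed) computes persistence diagrams for points sampled from a noisy circle, whose H1 diagram should contain one long-lived loop:

        import numpy as np
        from ripser import ripser  # assumes the ripser package is available

        rng = np.random.default_rng(0)
        theta = rng.uniform(0, 2 * np.pi, 200)
        X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((200, 2))

        dgms = ripser(X, maxdim=1)["dgms"]   # persistence diagrams for H0 and H1
        births, deaths = dgms[1].T           # 1-cycles (loops)
        print("most persistent loop lifetime:", (deaths - births).max())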
    Bounding The Rademacher Complexity of Fourier Neural Operator. (arXiv:2209.05150v1 [cs.LG])
    A Fourier neural operator (FNO) is one of the physics-inspired machine learning methods. In particular, it is a neural operator. In recent times, several types of neural operators have been developed, e.g., deep operator networks, GNO, and MWTO. Compared with other models, the FNO is computationally efficient and can learn nonlinear operators between function spaces independent of a certain finite basis. In this study, we investigate bounds on the Rademacher complexity of the FNO based on specific group norms. Using capacities based on these norms, we bound the generalization error of the FNO model. In addition, we investigate the correlation between the empirical generalization error and the proposed capacity of the FNO. Based on this investigation, we gain insight into the impact of the model architecture on the generalization error and estimate the amount of information about FNO models stored in various types of capacities.  ( 2 min )
    Graph Polynomial Convolution Models for Node Classification of Non-Homophilous Graphs. (arXiv:2209.05020v1 [cs.LG])
    We investigate efficient learning from higher-order graph convolution and learning directly from adjacency matrices for node classification. We revisit the scaled graph residual network, remove ReLU activation from residual layers, and apply a single weight matrix at each residual layer. We show that the resulting model leads to new graph convolution models as a polynomial of the normalized adjacency matrix, the residual weight matrix, and the residual scaling parameter. Additionally, we propose adaptively combining graph polynomial convolution models with direct learning from the adjacency matrix. Furthermore, we propose fully adaptive models to learn scaling parameters at each residual layer. We show that generalization bounds of the proposed methods are bounded as a polynomial of the eigenvalue spectrum, scaling parameters, and upper bounds of residual weights. By theoretical analysis, we argue that the proposed models can obtain improved generalization bounds by limiting the higher orders of convolutions and learning directly from the adjacency matrix. Using a wide set of real data, we demonstrate that the proposed methods obtain improved accuracy for node classification of non-homophilous graphs.  ( 2 min )
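    A numpy sketch of the resulting model family: node features as a polynomial in the normalized adjacency matrix, with a per-order weight matrix and a residual scaling parameter (the exact coefficients and the adaptive variant in the paper differ):

        import numpy as np

        def graph_polynomial_features(A, X, weights, alpha=0.5):
            # sum_k alpha^k * A_hat^k X W_k, with no ReLU between layers.
            A_hat = A + np.eye(A.shape[0])                 # add self-loops
            d = A_hat.sum(1)
            A_hat = A_hat / np.sqrt(np.outer(d, d))        # D^-1/2 (A+I) D^-1/2
            H, P = 0.0, X
            for k, W in enumerate(weights):
                H = H + (alpha ** k) * (P @ W)             # k-th order term
                P = A_hat @ P                              # next power of A_hat
            return H

        A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
        X = np.random.default_rng(1).standard_normal((3, 4))
        weights = [np.random.default_rng(k).standard_normal((4, 2)) for k in range(3)]
        print(graph_polynomial_features(A, X, weights).shape)  # (3, 2)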
    Clifford Neural Layers for PDE Modeling. (arXiv:2209.04934v1 [cs.LG])
    Partial differential equations (PDEs) see widespread use in sciences and engineering to describe simulation of physical processes as scalar and vector fields interacting and coevolving over time. Due to the computationally expensive nature of their standard solution methods, neural PDE surrogates have become an active research topic to accelerate these simulations. However, current methods do not explicitly take into account the relationship between different fields and their internal components, which are often correlated. Viewing the time evolution of such correlated fields through the lens of multivector fields allows us to overcome these limitations. Multivector fields consist of scalar, vector, as well as higher-order components, such as bivectors and trivectors. Their algebraic properties, such as multiplication, addition and other arithmetic operations can be described by Clifford algebras. To our knowledge, this paper presents the first usage of such multivector representations together with Clifford convolutions and Clifford Fourier transforms in the context of deep learning. The resulting Clifford neural layers are universally applicable and will find direct use in the areas of fluid dynamics, weather forecasting, and the modeling of physical systems in general. We empirically evaluate the benefit of Clifford neural layers by replacing convolution and Fourier operations in common neural PDE surrogates by their Clifford counterparts on two-dimensional Navier-Stokes and weather modeling tasks, as well as three-dimensional Maxwell equations. Clifford neural layers consistently improve generalization capabilities of the tested neural PDE surrogates.  ( 3 min )
  • Open

    SocialInteractionGAN: Multi-person Interaction Sequence Generation. (arXiv:2103.05916v2 [cs.NE] UPDATED)
    Prediction of human actions in social interactions has important applications in the design of social robots or artificial avatars. In this paper, we focus on a unimodal representation of interactions and propose to tackle interaction generation in a data-driven fashion. In particular, we model human interaction generation as a discrete multi-sequence generation problem and present SocialInteractionGAN, a novel adversarial architecture for conditional interaction generation. Our model builds on a recurrent encoder-decoder generator network and a dual-stream discriminator that jointly evaluates the realism of interactions and individual action sequences and operates at different time scales. Crucially, contextual information on interacting participants is shared among agents and reinjected into both the generation and the discriminator evaluation processes. Experiments show that, albeit dealing with low-dimensional data, SocialInteractionGAN succeeds in producing highly realistic action sequences of interacting people, comparing favorably to a variety of recurrent and convolutional discriminator baselines, and we argue that this work constitutes a first step towards higher-dimensional and multimodal interaction generation. Evaluations are conducted using classical GAN metrics, which we specifically adapt for discrete sequential data. Our model is shown to properly learn the dynamics of interaction sequences, while exploiting the full range of available actions.
    Advanced Manufacturing Configuration by Sample-efficient Batch Bayesian Optimization. (arXiv:2205.11827v2 [cs.LG] UPDATED)
    We propose a framework for the configuration and operation of expensive-to-evaluate advanced manufacturing methods, based on Bayesian optimization. The framework unifies a tailored acquisition function, a parallel acquisition procedure, and the integration of process information providing context to the optimization procedure. The novel acquisition function is demonstrated, analyzed and compared on state-of-the-art benchmarking problems. We apply the optimization approach to atmospheric plasma spraying and fused deposition modeling. Our results demonstrate that the proposed framework can efficiently find input parameters that produce the desired outcome and minimize the process cost.
    Statistical Learning Theory for Control: A Finite Sample Perspective. (arXiv:2209.05423v1 [eess.SY])
    This tutorial survey provides an overview of recent non-asymptotic advances in statistical learning theory as relevant to control and system identification. While there has been substantial progress across all areas of control, the theory is most well-developed when it comes to linear system identification and learning for the linear quadratic regulator, which are the focus of this manuscript. From a theoretical perspective, much of the labor underlying these advances has been in adapting tools from modern high-dimensional statistics and learning theory. While highly relevant to control theorists interested in integrating tools from machine learning, the foundational material has not always been easily accessible. To remedy this, we provide a self-contained presentation of the relevant material, outlining all the key ideas and the technical machinery that underpin recent results. We also present a number of open problems and future directions.
    Model interpretation using improved local regression with variable importance. (arXiv:2209.05371v1 [stat.ML])
    A fundamental question on the use of ML models concerns the explanation of their predictions for increasing transparency in decision-making. Although several interpretability methods have emerged, some gaps regarding the reliability of their explanations have been identified. For instance, most methods are unstable (meaning that they give very different explanations with small changes in the data) and do not cope well with irrelevant features (that is, features not related to the label). This article introduces two new interpretability methods, namely VarImp and SupClus, that overcome these issues by using local regression fits with a weighted distance that takes variable importance into account. Whereas VarImp generates explanations for each instance and can be applied to datasets with more complex relationships, SupClus interprets clusters of instances with similar explanations and can be applied to simpler datasets where clusters can be found. We compare our methods with state-of-the-art approaches and show that they yield better explanations according to several metrics, particularly in high-dimensional problems with irrelevant features, as well as when the relationship between features and target is non-linear.
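    A rough sketch of the core idea — a local regression whose locality is shaped by an importance-weighted distance — might look as follows; the importance weights and the function name are hypothetical stand-ins, not the authors' actual VarImp implementation.

        import numpy as np

        def local_explanation(X, y, x0, importances, bandwidth=1.0):
            # Distances are weighted by per-feature importances, so irrelevant
            # features contribute little to the locality of the fit
            d2 = ((X - x0) ** 2 * importances).sum(axis=1)
            w = np.exp(-d2 / (2.0 * bandwidth ** 2))
            # Kernel-weighted least squares around x0 (intercept + local slopes)
            Xb = np.hstack([np.ones((len(X), 1)), X - x0])
            sw = np.sqrt(w)
            coef, *_ = np.linalg.lstsq(sw[:, None] * Xb, sw * y, rcond=None)
            return coef[1:]  # local slopes act as feature attributions at x0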
    Fast Bayesian Variable Selection in Binomial and Negative Binomial Regression. (arXiv:2106.14981v2 [stat.ME] UPDATED)
    Bayesian variable selection is a powerful tool for data analysis, as it offers a principled method for variable selection that accounts for prior information and uncertainty. However, wider adoption of Bayesian variable selection has been hampered by computational challenges, especially in difficult regimes with a large number of covariates or non-conjugate likelihoods. Generalized linear models for count data, which are prevalent in biology, ecology, economics, and beyond, represent an important special case. Here we introduce an efficient MCMC scheme for variable selection in binomial and negative binomial regression that exploits Tempered Gibbs Sampling (Zanella and Roberts, 2019) and that includes logistic regression as a special case. In experiments we demonstrate the effectiveness of our approach, including on cancer data with seventeen thousand covariates.
    Decision Tree-Based Predictive Models for Academic Achievement Using College Students' Support Networks. (arXiv:2108.13947v2 [stat.ML] UPDATED)
    In this study, we examine a set of primary data collected from 484 students enrolled in a large public university in the Mid-Atlantic United States region during the early stages of the COVID-19 pandemic. The data, called the Ties data, included students' demographic and support network information. The support network data comprised information on the type of support (i.e., emotional or educational; routine or intense). Using this data set, models for predicting students' academic achievement, quantified by their self-reported GPA, were created using Chi-Square Automatic Interaction Detection (CHAID), a decision tree algorithm, and cforest, a random forest algorithm that uses conditional inference trees. We compare the methods' accuracy and the variation in the set of important variables suggested by each algorithm. Each algorithm found different variables important for different student demographics, with some overlap. For White students, different types of educational support were important in predicting academic achievement, while for non-White students, different types of emotional support were important. The presence of differing types of routine support was important in predicting academic achievement for cisgender women, while differing types of intense support were important for cisgender men.  ( 3 min )
    Bias Challenges in Counterfactual Data Augmentation. (arXiv:2209.05104v1 [cs.LG])
    Deep learning models tend not to be out-of-distribution robust primarily due to their reliance on spurious features to solve the task. Counterfactual data augmentations provide a general way of (approximately) achieving representations that are counterfactual-invariant to spurious features, a requirement for out-of-distribution (OOD) robustness. In this work, we show that counterfactual data augmentations may not achieve the desired counterfactual-invariance if the augmentation is performed by a {\em context-guessing machine}, an abstract machine that guesses the most-likely context of a given input. We theoretically analyze the invariance imposed by such counterfactual data augmentations and describe an exemplar NLP task where counterfactual data augmentation by a context-guessing machine does not lead to robust OOD classifiers.  ( 2 min )
    Wasserstein Distributional Learning. (arXiv:2209.04991v1 [stat.ME])
    Learning conditional densities and identifying factors that influence the entire distribution are vital tasks in data-driven applications. Conventional approaches work mostly with summary statistics, and are hence inadequate for a comprehensive investigation. Recently, there have been developments on functional regression methods to model density curves as functional outcomes. A major challenge for developing such models lies in the inherent constraint of non-negativity and unit integral for the functional space of density outcomes. To overcome this fundamental issue, we propose Wasserstein Distributional Learning (WDL), a flexible density-on-scalar regression modeling framework that starts with the Wasserstein distance $W_2$ as a proper metric for the space of density outcomes. We then introduce a heterogeneous and flexible class of Semi-parametric Conditional Gaussian Mixture Models (SCGMM) as the model class $\mathfrak{F} \otimes \mathcal{T}$. The resulting metric space $(\mathfrak{F} \otimes \mathcal{T}, W_2)$ satisfies the required constraints and offers a dense and closed functional subspace. For fitting the proposed model, we further develop an efficient algorithm based on Majorization-Minimization optimization with boosted trees. Compared with methods in the previous literature, WDL better characterizes and uncovers the nonlinear dependence of the conditional densities, and their derived summary statistics. We demonstrate the effectiveness of the WDL framework through simulations and real-world applications.  ( 2 min )
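    As background for the metric the framework is built on, the 2-Wasserstein distance between one-dimensional distributions has a closed quantile form, sketched below; this is standard background illustration only, not the paper's SCGMM machinery.

        import numpy as np

        def w2_empirical_1d(a, b, n_grid=1000):
            # W2(F, G)^2 = integral over u in (0, 1) of (F^{-1}(u) - G^{-1}(u))^2 du,
            # approximated on an evenly spaced quantile grid
            u = (np.arange(n_grid) + 0.5) / n_grid
            return np.sqrt(np.mean((np.quantile(a, u) - np.quantile(b, u)) ** 2))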
    Statistical Estimation of Confounded Linear MDPs: An Instrumental Variable Approach. (arXiv:2209.05186v1 [stat.ML])
    In a Markov decision process (MDP), unobservable confounders may exist and affect the data-generating process, so that the classic off-policy evaluation (OPE) estimators may fail to identify the true value function of the target policy. In this paper, we study the statistical properties of OPE in confounded MDPs with observable instrumental variables. Specifically, we propose a two-stage estimator based on the instrumental variables and establish its statistical properties in confounded MDPs with a linear structure. For the non-asymptotic analysis, we prove an $\mathcal{O}(n^{-1/2})$ error bound, where $n$ is the number of samples. For the asymptotic analysis, we prove that the two-stage estimator is asymptotically normal with the typical rate of $n^{1/2}$. To the best of our knowledge, we are the first to show such statistical results for the two-stage estimator in confounded linear MDPs via instrumental variables.  ( 2 min )
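    For readers less familiar with instrumental variables, the classical two-stage least squares recipe that such two-stage estimators generalize looks roughly like this; a plain-regression sketch, not the paper's MDP estimator.

        import numpy as np

        def two_stage_least_squares(Z, X, Y):
            # Stage 1: project the endogenous regressors X onto the instruments Z
            beta1, *_ = np.linalg.lstsq(Z, X, rcond=None)
            X_hat = Z @ beta1
            # Stage 2: regress the outcome Y on the projected regressors
            beta2, *_ = np.linalg.lstsq(X_hat, Y, rcond=None)
            return beta2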
    Kernel Learning for Explainable Climate Science. (arXiv:2209.04947v1 [cs.LG])
    The Upper Indus Basin, Himalayas provides water for 270 million people and countless ecosystems. However, precipitation, a key component of hydrological modelling, is poorly understood in this area. A key challenge surrounding this uncertainty comes from the complex spatial-temporal distribution of precipitation across the basin. In this work we propose Gaussian processes with structured non-stationary kernels to model precipitation patterns in the UIB. Previous attempts to quantify or model precipitation in the Hindu Kush Karakoram Himalayan region have often been qualitative or have included crude assumptions and simplifications which cannot be resolved at lower resolutions. This body of research also provides little to no treatment of error propagation. We account for the spatial variation in precipitation with a non-stationary Gibbs kernel parameterised with an input-dependent lengthscale. This allows the posterior function samples to adapt to the varying precipitation patterns inherent in the distinct underlying topography of the Indus region. The input-dependent lengthscale is governed by a latent Gaussian process with a stationary squared-exponential kernel, allowing the function-level hyperparameters to vary smoothly. In ablation experiments we motivate each component of the proposed kernel by demonstrating its ability to model the spatial covariance, temporal structure and joint spatio-temporal reconstruction. We benchmark our model against a stationary Gaussian process and a deep Gaussian process.
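    The Gibbs kernel itself has a standard closed form; a one-dimensional sketch is below, where the input-dependent lengthscale is a fixed illustrative function rather than the latent Gaussian process the paper uses.

        import numpy as np

        def gibbs_kernel(x1, x2, lengthscale_fn):
            # Non-stationary Gibbs kernel; reduces to the squared-exponential
            # kernel when lengthscale_fn is constant
            l1 = lengthscale_fn(x1)[:, None]
            l2 = lengthscale_fn(x2)[None, :]
            sq_sum = l1 ** 2 + l2 ** 2
            prefactor = np.sqrt(2.0 * l1 * l2 / sq_sum)
            sq_dist = (x1[:, None] - x2[None, :]) ** 2
            return prefactor * np.exp(-sq_dist / sq_sum)

        x = np.linspace(0.0, 1.0, 50)
        K = gibbs_kernel(x, x, lambda t: 0.1 + 0.5 * t)  # lengthscale grows with t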
    On the Hyperparameters in Stochastic Gradient Descent with Momentum. (arXiv:2108.03947v2 [cs.LG] UPDATED)
    Following the same routine as [SSJ20], we continue the theoretical analysis of stochastic gradient descent with momentum (SGD with momentum) in this paper. In contrast to plain SGD, we demonstrate that for SGD with momentum it is the two hyperparameters together, the learning rate and the momentum coefficient, that play the significant role in the linear rate of convergence in non-convex optimization. Our analysis is based on a hyperparameter-dependent stochastic differential equation (hp-dependent SDE) that serves as a continuous surrogate for SGD with momentum. Similarly, we establish linear convergence for the continuous-time formulation of SGD with momentum and obtain an explicit expression for the optimal linear rate by analyzing the spectrum of the Kramers-Fokker-Planck operator. By comparison, we demonstrate how the optimal linear rate of convergence and the final gap, which for SGD depend only on the learning rate, vary as the momentum coefficient increases from zero to one. We then propose a mathematical interpretation of why SGD with momentum converges faster, and is more robust to the learning rate, than standard SGD in practice. Finally, we show that in the presence of noise, Nesterov momentum is essentially no different from standard momentum.
    Understanding the Behavior of Belief Propagation. (arXiv:2209.05464v1 [cs.AI])
    Probabilistic graphical models are a powerful concept for modeling high-dimensional distributions. Besides modeling distributions, probabilistic graphical models also provide an elegant framework for performing statistical inference; because of the high-dimensional nature, however, one must often use approximate methods for this purpose. Belief propagation performs approximate inference, is efficient, and has a long record of success. Yet, in most cases, belief propagation lacks any performance or convergence guarantees. Many realistic problems are represented by graphical models with loops, in which case belief propagation is neither guaranteed to provide accurate estimates nor guaranteed to converge at all. This thesis investigates how the model parameters influence the performance of belief propagation. We are particularly interested in their influence on (i) the number of fixed points, (ii) the convergence properties, and (iii) the approximation quality.  ( 2 min )
    A framework for bilevel optimization that enables stochastic and global variance reduction algorithms. (arXiv:2201.13409v2 [stat.ML] UPDATED)
    Bilevel optimization, the problem of minimizing a value function which involves the arg-minimum of another function, appears in many areas of machine learning. In a large scale empirical risk minimization setting where the number of samples is huge, it is crucial to develop stochastic methods, which only use a few samples at a time to progress. However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates. To overcome this problem we introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. These directions are written as a sum, making it straightforward to derive unbiased estimates. The simplicity of our approach allows us to develop global variance reduction algorithms, where the dynamics of all variables are subject to variance reduction. We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has an $O(\frac{1}{T})$ convergence rate, and that it achieves linear convergence under the Polyak-Łojasiewicz assumption. This is the first stochastic algorithm for bilevel optimization that verifies either of these properties. Numerical experiments validate the usefulness of our method.
    A Note on the Efficient Evaluation of PAC-Bayes Bounds. (arXiv:2209.05188v1 [cs.LG])
    When utilising PAC-Bayes theory for risk certification, it is usually necessary to estimate and bound the Gibbs risk of the PAC-Bayes posterior. Many works in the literature employ a method for this which requires a large number of passes of the dataset, incurring high computational cost. This manuscript presents a very general alternative which makes computational savings on the order of the dataset size.
    On topological data analysis for structural dynamics: an introduction to persistent homology. (arXiv:2209.05134v1 [stat.ML])
    Topological methods can provide a way of proposing new metrics and methods of scrutinising data that may otherwise be overlooked. In this work, a method of quantifying the shape of data, via a topic called topological data analysis (TDA), is introduced. The main tool within TDA is persistent homology, a method of quantifying the shape of data over a range of length scales; the required background and a method of computing persistent homology are briefly discussed in this work. Ideas from TDA are then applied to nonlinear dynamics to analyse some common attractors, by calculating their embedding dimension and then assessing their general topologies. A method is also proposed that uses TDA to determine the optimal delay for a time-delay embedding. TDA is also applied to a Z24 Bridge case study in structural health monitoring, where it is used to scrutinise different data partitions, classified by the conditions at which the data were collected. A metric from TDA is used to compare data between the partitions. The results presented demonstrate that the presence of damage alters the manifold shape more significantly than the effects present from temperature.
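    The proposed use of TDA for choosing a delay sits on top of the standard time-delay (Takens) embedding, which can be sketched as follows; the persistent homology computation itself would then be handed to a library such as ripser or giotto-tda.

        import numpy as np

        def delay_embedding(x, dim, tau):
            # Each point becomes [x(t), x(t + tau), ..., x(t + (dim - 1) * tau)]
            n = len(x) - (dim - 1) * tau
            return np.stack([x[i * tau : i * tau + n] for i in range(dim)], axis=1)

        # A sine wave embeds to a loop, whose single persistent 1-cycle is
        # exactly the kind of structure persistent homology would detect
        cloud = delay_embedding(np.sin(np.linspace(0, 20 * np.pi, 2000)), dim=2, tau=25)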
    Fully-automated patient-level malaria assessment on field-prepared thin blood film microscopy images, including Supplementary Information. (arXiv:1908.01901v2 [cs.LG] UPDATED)
    Malaria is a life-threatening disease affecting millions. Microscopy-based assessment of thin blood films is a standard method to (i) determine malaria species and (ii) quantitate high-parasitemia infections. Full automation of malaria microscopy by machine learning (ML) is a challenging task because field-prepared slides vary widely in quality and presentation, and artifacts often heavily outnumber relatively rare parasites. In this work, we describe a complete, fully-automated framework for thin film malaria analysis that applies ML methods, including convolutional neural nets (CNNs), trained on a large and diverse dataset of field-prepared thin blood films. Quantitation and species identification results are close to sufficiently accurate for the concrete needs of drug resistance monitoring and clinical use-cases on field-prepared samples. We focus our methods and our performance metrics on the field use-case requirements. We discuss key issues and important metrics for the application of ML methods to malaria microscopy.
    Granger Causal Chain Discovery for Sepsis-Associated Derangements via Multivariate Hawkes Processes. (arXiv:2209.04480v1 [stat.AP])
    Modern health care systems are conducting continuous, automated surveillance of the electronic medical record (EMR) to identify adverse events with increasing frequency; however, many events such as sepsis do not have clearly elucidated prodromes (i.e., event chains) that can be used to identify and intercept the adverse event early in its course. Currently there does not exist a reliable framework for discovering or describing causal chains that precede adverse hospital events. Clinically relevant and interpretable results require a framework that can (1) infer temporal interactions across multiple patient features found in EMR data (e.g., labs, vital signs, etc.) and (2) identify pattern(s) which precede and are specific to an impending adverse event (e.g., sepsis). In this work, we propose a linear multivariate Hawkes process model, coupled with a $g(x) = x^+$ link function to allow potential inhibition effects, in order to recover a Granger Causal (GC) graph. We develop a two-phase gradient-based scheme to maximize a surrogate of the likelihood to estimate the problem parameters. This two-phase algorithm is scalable and shown to be effective via our numerical simulation. It is subsequently extended to a data set of patients admitted to the Grady hospital system in Atlanta, GA, where the fitted Granger Causal graph identifies several highly interpretable chains that precede sepsis.  ( 3 min )
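    A minimal sketch of the model's conditional intensity, assuming exponential kernels (the paper's exact parameterization may differ):

        import numpy as np

        def hawkes_intensity(t, events, mu, alpha, beta):
            # lambda_i(t) = max(0, mu_i + sum_j alpha[i, j] * sum_{t_k < t} exp(-beta * (t - t_k)))
            # The x^+ link lets alpha[i, j] < 0 encode inhibition between features
            lam = mu.astype(float)
            for j, ev in enumerate(events):
                past = ev[ev < t]
                lam = lam + alpha[:, j] * np.exp(-beta * (t - past)).sum()
            return np.maximum(lam, 0.0)

        # Toy example with two features, where the second inhibits the first
        mu = np.array([0.2, 0.1])
        alpha = np.array([[0.5, -0.3], [0.0, 0.4]])
        events = [np.array([1.0, 2.5]), np.array([2.0])]
        print(hawkes_intensity(3.0, events, mu, alpha, beta=1.0))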
    Robust Geometric Metric Learning. (arXiv:2202.11550v2 [stat.ML] UPDATED)
    This paper proposes new algorithms for the metric learning problem. We start by noticing that several classical metric learning formulations from the literature can be viewed as modified covariance matrix estimation problems. Leveraging this point of view, a general approach, called Robust Geometric Metric Learning (RGML), is then studied. This method aims at simultaneously estimating the covariance matrix of each class while shrinking them towards their (unknown) barycenter. We focus on two specific cost functions: one associated with the Gaussian likelihood (RGML Gaussian), and one with Tyler's M-estimator (RGML Tyler). In both, the barycenter is defined with the Riemannian distance, which enjoys nice properties of geodesic convexity and affine invariance. The optimization is performed using the Riemannian geometry of symmetric positive definite matrices and its submanifold of unit determinant. Finally, the performance of RGML is assessed on real datasets. Strong performance is exhibited while being robust to mislabeled data.  ( 2 min )
    Monitoring of functional profiles combining the notion of Fréchet mean and the framework of deformation models with application in ambient air pollution surveillance. (arXiv:2010.02968v2 [stat.ME] UPDATED)
    A framework suitable for monitoring functional profiles combining the notion of Fréchet mean and the concept of deformation models is developed and proposed. The generalized sense of mean that the notion of the Fréchet mean offers is employed to capture the typical functional shape of the data, while the concept of deformation models allows for interpretable parameterizations of a profile's deviations from the typical shape. Functional EWMA-type control charts are built and proposed based on shape characteristics of the functional data, allowing for (a) identifying shifts from the in-control behaviour and (b) providing causal relationships between the potential shifts and significant deviations of certain qualitative characteristics (e.g., amplitude or phase deformations). The functional monitoring scheme is implemented to assess ambient air pollution. In particular, the method is applied to a synthetic data example to assess its performance under various conditions, and to a real-world example using sensor data from an area in the city of Athens, where air pollutant profiles and their characteristics are successfully analyzed and out-of-control behaviours are identified.  ( 3 min )
    Revisiting Active Sets for Gaussian Process Decoders. (arXiv:2209.04636v1 [stat.ML])
    Decoders built on Gaussian processes (GPs) are enticing due to the marginalisation over the non-linear function space. Such models (also known as GP-LVMs) are often expensive and notoriously difficult to train in practice, but can be scaled using variational inference and inducing points. In this paper, we revisit active set approximations. We develop a new stochastic estimate of the log-marginal likelihood based on recently discovered links to cross-validation, and propose a computationally efficient approximation thereof. We demonstrate that the resulting stochastic active sets (SAS) approximation significantly improves the robustness of GP decoder training while reducing computational cost. The SAS-GP obtains more structure in the latent space, scales to many datapoints and learns better representations than variational autoencoders, which is rarely the case for GP decoders.  ( 2 min )
    Learning Consumer Preferences from Bundle Sales Data. (arXiv:2209.04942v1 [stat.ML])
    Product bundling is a common selling mechanism used in online retailing. To set profitable bundle prices, the seller needs to learn consumer preferences from the transaction data. When customers purchase bundles or multiple products, classical methods such as discrete choice models cannot be used to estimate customers' valuations. In this paper, we propose an approach to learn the distribution of consumers' valuations toward the products using bundle sales data. The approach reduces it to an estimation problem where the samples are censored by polyhedral regions. Using the EM algorithm and Monte Carlo simulation, our approach can recover the distribution of consumers' valuations. The framework allows for unobserved no-purchases and clustered market segments. We provide theoretical results on the identifiability of the probability model and the convergence of the EM algorithm. The performance of the approach is also demonstrated numerically.  ( 2 min )
    Modeling Dependent Structure for Utterances in ASR Evaluation. (arXiv:2209.05281v1 [eess.AS])
    The bootstrap resampling method has been popular for performing significance analysis on word error rate (WER) in automatic speech recognition (ASR) evaluations. To deal with the issue of dependent speech data, the blockwise bootstrap approach has also been proposed: by dividing utterances into uncorrelated blocks, it resamples these blocks instead of the original data. However, it is always nontrivial to uncover the dependence structure among utterances, which can lead to subjective findings in statistical testing. In this paper, we present graphical-lasso-based methods to explicitly model such dependency and estimate the independent blocks of utterances in a rigorous way. The blockwise bootstrap is then applied on top of the inferred blocks. We show that the resulting variance estimator for WER is consistent under mild conditions. We also demonstrate the validity of the proposed approach on LibriSpeech data.  ( 2 min )
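    Once the independent blocks are inferred, the blockwise bootstrap itself is straightforward; a sketch, with hypothetical per-utterance error and word counts:

        import numpy as np

        def blockwise_bootstrap_wer(errors, words, blocks, n_boot=1000, seed=0):
            # Resample whole blocks of utterances (with replacement), then
            # recompute WER = total errors / total words on each resample
            rng = np.random.default_rng(seed)
            wers = np.empty(n_boot)
            for b in range(n_boot):
                chosen = rng.integers(0, len(blocks), size=len(blocks))
                idx = np.concatenate([blocks[c] for c in chosen])
                wers[b] = errors[idx].sum() / words[idx].sum()
            return wers.mean(), wers.std()  # point estimate and bootstrap std error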
    Time-uniform central limit theory, asymptotic confidence sequences, and anytime-valid causal inference. (arXiv:2103.06476v4 [math.ST] UPDATED)
    Confidence intervals based on the central limit theorem (CLT) are a cornerstone of classical statistics. Despite being only asymptotically valid, they are ubiquitous because they permit statistical inference under very weak assumptions, and can often be applied to problems even when nonasymptotic inference is impossible. This paper introduces time-uniform analogues of such asymptotic confidence intervals. To elaborate, our methods take the form of confidence sequences (CS) -- sequences of confidence intervals that are uniformly valid over time. CSs provide valid inference at arbitrary stopping times, incurring no penalties for "peeking" at the data, unlike classical confidence intervals which require the sample size to be fixed in advance. Existing CSs in the literature are nonasymptotic, and hence do not enjoy the aforementioned broad applicability of asymptotic confidence intervals. Our work bridges the gap by giving a definition for "asymptotic CSs", and deriving a universal asymptotic CS that requires only weak CLT-like assumptions. While the CLT approximates the distribution of a sample average by that of a Gaussian at a fixed sample size, we use strong invariance principles (stemming from the seminal 1970s work of Komlos, Major, and Tusnady) to uniformly approximate the entire sample average process by an implicit Gaussian process. We demonstrate their utility by deriving nonparametric asymptotic CSs for the average treatment effect based on doubly robust estimators in observational studies, for which no nonasymptotic methods can exist even in the fixed-time regime (due to confounding bias). These enable doubly robust causal inference that can be continuously monitored and adaptively stopped.  ( 3 min )
    Centroids Matching: an efficient Continual Learning approach operating in the embedding space. (arXiv:2208.02048v2 [cs.LG] UPDATED)
    Catastrophic forgetting (CF) occurs when a neural network loses the information previously learned while training on a set of samples from a different distribution, i.e., a new task. Existing approaches have achieved remarkable results in mitigating CF, especially in a scenario called task incremental learning. However, this scenario is not realistic, and limited work has been done to achieve good results in more realistic scenarios. In this paper, we propose a novel regularization method called Centroids Matching that, inspired by meta-learning approaches, fights CF by operating in the feature space produced by the neural network, achieving good results while requiring a small memory footprint. Specifically, the approach classifies samples directly using the feature vectors produced by the neural network, by matching those vectors with the centroids representing the classes from the current task, or from all the tasks up to that point. Centroids Matching is faster than competing baselines, and it can be exploited to efficiently mitigate CF by preserving the distances between the embedding space produced by the model when past tasks ended and the one currently produced, leading to a method that achieves high accuracy on all the tasks, without using an external memory in easy scenarios, or using a small one in more realistic ones. Extensive experiments demonstrate that Centroids Matching achieves accuracy gains on multiple datasets and scenarios.  ( 3 min )
    Deep Learning with Non-Linear Factor Models: Adaptability and Avoidance of Curse of Dimensionality. (arXiv:2209.04512v1 [stat.ML])
    In this paper, we connect the deep learning literature with non-linear factor models and show that deep learning estimation substantially improves on the non-linear additive factor model literature. We provide bounds on the expected risk and show that these upper bounds are uniform over a set of multiple response variables by extending the theorems of Schmidt-Hieber (2020). We show that our risk bound does not depend on the number of factors. To construct a covariance matrix estimator for asset returns, we develop a novel data-dependent estimator of the error covariance matrix in deep neural networks. The estimator uses a flexible adaptive thresholding technique that is robust to outliers in the innovations. We prove that the estimator is consistent in spectral norm. Using that result, we then show consistency and rates of convergence for the covariance matrix and precision matrix estimators for asset returns. The rates of convergence in both results do not depend on the number of factors; this is a new result in the factor model literature, since the number of factors is otherwise an impediment to better estimation and prediction. Except for the precision matrix result, all our results hold even when the number of assets is larger than the time span, with both quantities growing. Various Monte Carlo simulations confirm our large sample findings and reveal the superior accuracy of the DNN-FM in estimating the true underlying functional form that connects the factors and observable variables, as well as the covariance and precision matrices, compared to competing approaches. Moreover, in an out-of-sample portfolio forecasting application, it outperforms alternative portfolio strategies in most cases in terms of out-of-sample portfolio standard deviation and Sharpe ratio.  ( 3 min )
    Batch Bayesian Optimization via Particle Gradient Flows. (arXiv:2209.04722v1 [stat.ML])
    Bayesian Optimisation (BO) methods seek to find global optima of objective functions which are only available as a black box or are expensive to evaluate. Such methods construct a surrogate model for the objective function, quantifying the uncertainty in that surrogate through Bayesian inference. Objective evaluations are sequentially determined by maximising an acquisition function at each step. However, this ancillary optimisation problem can be highly non-trivial to solve, due to the non-convexity of the acquisition function, particularly in the case of batch Bayesian optimisation, where multiple points are selected at every step. In this work we reformulate batch BO as an optimisation problem over the space of probability measures. We construct a new acquisition function based on multipoint expected improvement which is convex over the space of probability measures. Practical schemes for solving this `inner' optimisation problem arise naturally as gradient flows of this objective function. We demonstrate the efficacy of this new method on different benchmark functions and compare with state-of-the-art batch BO methods.  ( 2 min )
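    For orientation, the single-point expected improvement that the paper's multipoint, measure-valued acquisition builds on can be written in a few lines (minimisation convention; this is the textbook formula, not the paper's multipoint objective):

        import numpy as np
        from scipy.stats import norm

        def expected_improvement(mu, sigma, f_best, xi=0.01):
            # EI(x) = (f_best - mu - xi) * Phi(z) + sigma * phi(z),
            # with z = (f_best - mu - xi) / sigma, under a Gaussian surrogate
            sigma = np.maximum(sigma, 1e-12)
            z = (f_best - mu - xi) / sigma
            return (f_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)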
    Weight Expansion: A New Perspective on Dropout and Generalization. (arXiv:2201.09209v2 [cs.LG] UPDATED)
    While dropout is known to be a successful regularization technique, insights into the mechanisms that lead to this success are still lacking. We introduce the concept of \emph{weight expansion}, an increase in the signed volume of a parallelotope spanned by the column or row vectors of the weight covariance matrix, and show that weight expansion is an effective means of increasing the generalization in a PAC-Bayesian setting. We provide a theoretical argument that dropout leads to weight expansion and extensive empirical support for the correlation between dropout and weight expansion. To support our hypothesis that weight expansion can be regarded as an \emph{indicator} of the enhanced generalization capability endowed by dropout, and not just as a mere by-product, we have studied other methods that achieve weight expansion (resp.\ contraction), and found that they generally lead to an increased (resp.\ decreased) generalization ability. This suggests that dropout is an attractive regularizer, because it is a computationally cheap method for obtaining weight expansion. This insight justifies the role of dropout as a regularizer, while paving the way for identifying regularizers that promise improved generalization through weight expansion.  ( 3 min )
    Convergence of Batch Stochastic Gradient Descent Methods with Approximate Gradients and/or Noisy Measurements: Theory and Computational Results. (arXiv:2209.05372v1 [math.OC])
    In this paper, we study convex optimization using a very general formulation called BSGD (Block Stochastic Gradient Descent). At each iteration, some but not necessarily all components of the argument are updated. The direction of the update can be one of two possibilities: (i) a noise-corrupted measurement of the true gradient, or (ii) an approximate gradient computed using a first-order approximation, using function values that might themselves be corrupted by noise. This formulation embraces most of the currently used stochastic gradient methods. We establish conditions for BSGD to converge to the global minimum, based on stochastic approximation theory. We then verify the predicted convergence through numerical experiments. Our results show that when approximate gradients are used, BSGD converges while momentum-based methods can diverge. However, not just our BSGD, but also standard (full-update) gradient descent, and various momentum-based methods, all converge, even with noisy gradients.  ( 2 min )
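    An illustrative toy version of the two update directions, with block updates and noise-corrupted function values; the names and the finite-difference scheme below are our own, chosen to mirror the formulation, not the paper's code.

        import numpy as np

        def noisy_fd_gradient(f, x, h=1e-4, noise_std=1e-3, rng=None):
            # Forward-difference gradient where every function evaluation is
            # itself corrupted by additive Gaussian noise
            rng = rng or np.random.default_rng()
            g = np.zeros_like(x)
            f0 = f(x) + rng.normal(0.0, noise_std)
            for i in range(len(x)):
                e = np.zeros_like(x)
                e[i] = h
                g[i] = (f(x + e) + rng.normal(0.0, noise_std) - f0) / h
            return g

        def bsgd_step(f, x, lr, block_size, rng):
            # Update only a randomly chosen block of coordinates per iteration
            idx = rng.choice(len(x), size=block_size, replace=False)
            g = noisy_fd_gradient(f, x, rng=rng)
            x = x.copy()
            x[idx] -= lr * g[idx]
            return x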
    Support Recovery in Mixture Models with Sparse Parameters. (arXiv:2202.11940v2 [cs.LG] UPDATED)
    Mixture models are widely used to fit complex and multimodal datasets. In this paper we study mixtures with high-dimensional sparse latent parameter vectors and consider the problem of support recovery of those vectors. While parameter learning in mixture models is well studied, the sparsity constraint remains relatively unexplored. Sparsity of parameter vectors is a natural constraint in a variety of settings, and support recovery is a major step towards parameter estimation. We provide efficient algorithms for support recovery that have a logarithmic sample complexity dependence on the dimensionality of the latent space. Our algorithms are quite general, namely they are applicable to 1) mixtures of many different canonical distributions, including uniform, Poisson, Laplace, Gaussian, etc., and 2) mixtures of linear regressions and linear classifiers with Gaussian covariates under different assumptions on the unknown parameters. In most of these settings, our results are the first guarantees on the problem, while in the rest, our results provide improvements on existing work.  ( 2 min )
    A Deterministic Approximation to Neural SDEs. (arXiv:2006.08973v6 [cs.LG] UPDATED)
    Neural Stochastic Differential Equations (NSDEs) model the drift and diffusion functions of a stochastic process as neural networks. While NSDEs are known to make accurate predictions, their uncertainty quantification properties have remained unexplored so far. We report the empirical finding that obtaining well-calibrated uncertainty estimates from NSDEs is computationally prohibitive. As a remedy, we develop a computationally affordable deterministic scheme which accurately approximates the transition kernel when the dynamics are governed by an NSDE. Our method introduces a bidimensional moment matching algorithm: vertical along the neural net layers and horizontal along the time direction, which benefits from an original combination of effective approximations. Our deterministic approximation of the transition kernel is applicable to both training and prediction. We observe in multiple experiments that the uncertainty calibration quality of our method can be matched by Monte Carlo sampling only at high computational cost. Thanks to the numerical stability of deterministic training, our method also improves prediction accuracy.  ( 3 min )
    If Influence Functions are the Answer, Then What is the Question?. (arXiv:2209.05364v1 [cs.LG])
    Influence functions efficiently estimate the effect of removing a single training data point on a model's learned parameters. While influence estimates align well with leave-one-out retraining for linear models, recent works have shown this alignment is often poor in neural networks. In this work, we investigate the specific factors that cause this discrepancy by decomposing it into five separate terms. We study the contributions of each term on a variety of architectures and datasets and how they vary with factors such as network width and training time. While practical influence function estimates may be a poor match to leave-one-out retraining for nonlinear networks, we show they are often a good approximation to a different object we term the proximal Bregman response function (PBRF). Since the PBRF can still be used to answer many of the questions motivating influence functions, such as identifying influential or mislabeled examples, our results suggest that current algorithms for influence function estimation give more informative results than previous error analyses would suggest.  ( 2 min )
    On the Nash equilibrium of moment-matching GANs for stationary Gaussian processes. (arXiv:2203.07136v3 [stat.ML] UPDATED)
    Generative Adversarial Networks (GANs) learn an implicit generative model from data samples through a two-player game. In this paper, we study the existence of Nash equilibrium of the game which is consistent as the number of data samples grows to infinity. In a realizable setting where the goal is to estimate the ground-truth generator of a stationary Gaussian process, we show that the existence of consistent Nash equilibrium depends crucially on the choice of the discriminator family. The discriminator defined from second-order statistical moments can result in non-existence of Nash equilibrium, existence of consistent non-Nash equilibrium, or existence and uniqueness of consistent Nash equilibrium, depending on whether symmetry properties of the generator family are respected. We further study empirically the local stability and global convergence of gradient descent-ascent methods towards consistent equilibrium.  ( 2 min )
    Bilevel Optimization with a Lower-level Contraction: Optimal Sample Complexity without Warm-Start. (arXiv:2202.03397v2 [stat.ML] UPDATED)
    We analyze a general class of bilevel problems, in which the upper-level problem consists in the minimization of a smooth objective function and the lower-level problem is to find the fixed point of a smooth contraction map. This class of problems includes instances of meta-learning, equilibrium models, hyperparameter optimization and data poisoning adversarial attacks. Several recent works have proposed algorithms which warm-start the lower-level problem, i.e. they use the previous lower-level approximate solution as a starting point for the lower-level solver. This warm-start procedure allows one to improve the sample complexity in both the stochastic and deterministic settings, achieving in some cases the order-wise optimal sample complexity. However, there are situations, e.g., meta-learning and equilibrium models, in which the warm-start procedure is not well-suited or ineffective. In this work we show that without warm-start, it is still possible to achieve order-wise optimal or near-optimal sample complexity. In particular, we propose a simple method which uses stochastic fixed point iterations at the lower level and projected inexact gradient descent at the upper level, that reaches an $\epsilon$-stationary point using $O(\epsilon^{-2})$ and $\tilde{O}(\epsilon^{-1})$ samples for the stochastic and the deterministic setting, respectively. Finally, compared to methods using warm-start, our approach yields a simpler analysis that does not need to study the coupled interactions between the upper-level and lower-level iterates.  ( 3 min )
    The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks. (arXiv:2108.11489v3 [stat.ML] UPDATED)
    The recent success of neural network models has shone light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data. Understanding this phenomenon of $\textit{benign overfitting}$ has attracted intense theoretical and empirical study. In this paper, we consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk when the covariates satisfy sub-Gaussianity and anti-concentration properties, and the noise is independent and sub-Gaussian. By leveraging recent results that characterize the implicit bias of this estimator, our bounds emphasize the role of both the quality of the initialization as well as the properties of the data covariance matrix in achieving low excess risk.  ( 2 min )
    Data Augmentation by Selecting Mixed Classes Considering Distance Between Classes. (arXiv:2209.05122v1 [cs.CV])
    Data augmentation is an essential technique for improving recognition accuracy in object recognition using deep learning. Methods that generate mixed data from multiple samples, such as mixup, can acquire new diversity that is not included in the training data, and thus contribute significantly to accuracy improvement. However, since the data selected for mixing are randomly sampled throughout the training process, there are cases where appropriate classes or data are not selected. In this study, we propose a data augmentation method that calculates the distance between classes based on class probabilities and can select data from suitable classes to be mixed during training. The mixed data are dynamically adjusted according to the training trend of each class to facilitate training. The proposed method is applied in combination with conventional methods for generating mixed data. Evaluation experiments show that the proposed method improves recognition performance on general and long-tailed image recognition datasets.  ( 2 min )
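    The baseline operation being improved here is plain mixup, which convexly combines two samples and their one-hot labels with a Beta-distributed weight; the proposed method additionally restricts which class pairs may be mixed. A sketch of the baseline:

        import numpy as np

        def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
            # y1, y2 are one-hot label vectors; lam ~ Beta(alpha, alpha)
            rng = rng or np.random.default_rng()
            lam = rng.beta(alpha, alpha)
            return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2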
    Reproducibility in machine learning for medical imaging. (arXiv:2209.05097v1 [cs.CV])
    Reproducibility is a cornerstone of science, as the replication of findings is the process through which they become knowledge. It is widely considered that many fields of science are undergoing a reproducibility crisis. This has led to the publications of various guidelines in order to improve research reproducibility. This didactic chapter intends at being an introduction to reproducibility for researchers in the field of machine learning for medical imaging. We first distinguish between different types of reproducibility. For each of them, we aim at defining it, at describing the requirements to achieve it and at discussing its utility. The chapter ends with a discussion on the benefits of reproducibility and with a plea for a non-dogmatic approach to this concept and its implementation in research practice.  ( 2 min )
  • Open

    [P] Mirage's Stable Diffusion Platform
    Hi! Our team at Mirage created a tool using NVIDIA's textual-inversion to personalize StabilityAI's diffusion model using 3-5 example images. Check it out at: https://www.app.mirageml.com Demo Video: https://www.loom.com/embed/3d20c1c93bf44915974b4a81d536966c I believe this post does not violate the community's rules. Mods, please let me know otherwise. submitted by /u/mirageml [link] [comments]  ( 88 min )
    [D] Job Market in 2022?
    Wondering how people who're currently applying are faring. I'm a Carnegie Mellon masters student in a vision program with several years of production experience in vision, and aside from return offers and the odd startup, the pickings seem incredibly slim for both myself and really brilliant classmates. Between hiring freezes, many companies downsizing or stopping expansion of ML-related departments, the startup bubble bursting (VCs no longer throwing cash at every “AI” company), and increased competition from everyone and their mother thinking ML is the next “big thing”, it seems like an already competitive field has gotten even more challenging in the last year. I might be overly pessimistic, but I never recommend undergrads pursue ML anymore unless they have a serious passion or interest for it, and aren't just hyped about the latest OpenAI release. Just about any other CS-related specialization offers significantly more security, less competition, and, for the degree of education and experience necessary, frequently more pay. I could have a pretty narrow perspective from where I'm at in my career though, so I'd love to hear everyone's thoughts on the future in industry and academia. submitted by /u/RepsNRobots [link] [comments]  ( 89 min )

  • Open

    Zombie Apocalypse Post-apocalyptic punk cities ravaged
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 87 min )
    Prompt: 31 AI Generated Poems
    submitted by /u/perfectlmao [link] [comments]  ( 87 min )
    Question about AI and urban planning!
    Hey everybody! I am a student who is going to go into an urban planning masters soon and I am curious about the effect you think AI will have on planning. With tools like stable diffusion and language models getting better by the day, do you think urban planning has a future? Or will it be handed over to AI systems. There has been some past discussion about this over on the planning subreddit and they seem pretty skeptical, but I wanted to see what people more familiar with AI's capabilities think! :) Thank you so much! :) submitted by /u/Jumpy_Ad830 [link] [comments]  ( 87 min )
    Win 1k Credits on Pixelz AI - See below
    submitted by /u/mdfnb [link] [comments]  ( 90 min )
    AI Dream 1hour EPIC 1000 Subscribers Celebration!
    submitted by /u/LordPewPew777 [link] [comments]  ( 86 min )
    Meta AI Open Sources Flashlight: Fast and Flexible Machine Learning Toolkit in C++
    While deep learning and machine learning (ML) frameworks perform well, customizing their underlying components has always been challenging. Low-level internals can be mistakenly obfuscated, closed-source, or hand-tuned for specific purposes, making it difficult and time-consuming to find the proper code to alter. To fuel ground-breaking research, FAIR developed Flashlight, a new open-source machine learning (ML) toolkit based in C++ that allows teams to quickly and efficiently change deep and ML frameworks to better suit their needs. Flashlight was built from the ground up to be fully adjustable by the user. It’s easy to use because it includes the fundamental elements of a study environment. Because of its basic design and lack of language bindings, rebuilding the whole Flashlight library and its training pipelines takes only a few seconds whenever its essential components are modified. Continue reading | Check out the paper and github link submitted by /u/ai-lover [link] [comments]  ( 87 min )
    AI image of moonshine Bill Hader
    submitted by /u/mandeheks [link] [comments]  ( 86 min )
    Diffusers-Interpret v0.4.0 is out! Explainability for Stable Diffusion
    submitted by /u/JClub [link] [comments]  ( 87 min )
    AI Dream 81 - Wild new Project! Part 6
    submitted by /u/LordPewPew777 [link] [comments]  ( 87 min )
    The Easiest Way to Use Stable Diffusion Right Now
    submitted by /u/pwillia7 [link] [comments]  ( 89 min )
    Stable Diffusion AI Art Deforum Notebook now with Cadence Mode Faster th...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
    PyTorch: Meta transitions AI framework to PyTorch Foundation
    submitted by /u/Zirius_Sadfaces [link] [comments]  ( 87 min )
    "Google’s New AI: Fly INTO Photos! 🐦"
    submitted by /u/the_anonymizer [link] [comments]  ( 87 min )
    Liquid Robotica
    submitted by /u/widgia [link] [comments]  ( 87 min )
    God Save The Queen by Sex Pistols, Lyrics Illustrated by Midjourney
    submitted by /u/Swisheater [link] [comments]  ( 101 min )
    9 Best Artificial Intelligence books for beginners to expert to read in 2022 -
    submitted by /u/Lakshmireddys [link] [comments]  ( 87 min )
    Best image upscaler?
    Hi, I am looking to increase the resolution of an image as I plan to print it on a poster (60cmx40cm) but the original image (749x1114) is a bit small for this project. Therefore, I am looking for a way to increase the resolution while maintaining the best possible quality. What do you recommend? I already tried with Topaz Gigapixel AI, but the letters at the bottom (the rules) don't look quite right when upscaled. I attach the original image. ​ https://preview.redd.it/i2f7nx6qhcn91.jpg?width=749&format=pjpg&auto=webp&s=a0ac977640c40b61383b6da892b2475bde617eae submitted by /u/eStwart [link] [comments]  ( 87 min )
    Google earth scanning AI
    Does anyone know if an AI exists that effectively scans google earth / maps in a given radius and identifies certain cars? To pretty much find rare cars. If not, how would one go about in making / training this ? (Never done this before) Thank You ! submitted by /u/Few_Sample7624 [link] [comments]  ( 87 min )
    Anyone know where to access ALL of the kim-jung-gi anatomy clips i see everywhere on youtube? Thanks :-)
    submitted by /u/Aggravating-Door80 [link] [comments]  ( 87 min )
    [P] envd: A command-line tool to create development environments for AI/ML
    GitHub: https://github.com/tensorchord/envd What is envd? envd (ɪnˈvdɪ) is a command-line tool that helps you create a container-based development environment for AI/ML. Development environments are full of Python and system dependencies, CUDA, BASH scripts, Dockerfiles, SSH configurations, Kubernetes YAMLs, and many other clunky things that are always breaking. envd solves this problem: declare the list of dependencies (CUDA, Python packages, your favorite IDE, and so on) in build.envd, then simply run envd up and develop in an isolated environment. https://i.redd.it/jynmoy30vbn91.gif Why use envd? Environments built with envd provide the following features out-of-the-box: ❤️ Knowledge reuse in your team envd build functions can be reused. Use include function to import any gi…  ( 89 min )
  • Open

    Automation of Ports – Global Supply Chains
    International trade grew to a record $28.5 trillion in 2021, according to an estimate by UNCTAD - a 25 percent increase from 2020. Recently, Covid-19 stop-and-go maneuvers have highlighted the difficulties that modern ports face. These include handling an overwhelming amount of goods and their corresponding paperwork, other administrative tasks, congestion, delays, a lack of coordination between terminal operators, and, most importantly, the dependency on manpower. In addition to these, environmental issues have taken the front seat. The post Automation of Ports – Global Supply Chains appeared first on Data Science Central.  ( 20 min )
    How AI/ML will Impact iOS App Development in 2023
    Artificial intelligence has gained huge popularity in the last few years, with its application surging across every business sector. AI has impressively gained massive acceptance in the mobile tech world by bringing diverse facilities to our fingertips. The post How AI/ML will Impact iOS App Development in 2023 appeared first on Data Science Central.  ( 21 min )
    Art and AI: The Line Blurs Further
    Let's look at this story in more detail because in my view, it shows something much more fundamental which is often overlooked. In my view, the future of all jobs will be in collaborating with AI ! The post Art and AI: The Line Blurs Further appeared first on Data Science Central.  ( 20 min )
  • Open

    Save the date: Join AWS at NVIDIA GTC, September 19–22
    Register free for NVIDIA GTC to learn from experts on how AI and the evolution of the 3D internet are profoundly impacting industries—and society as a whole. We have prepared several AWS sessions to give you guidance on how to use AWS services powered by NVIDIA technology to meet your goals. Amazon Elastic Compute Cloud […]  ( 4 min )
    How Medidata used Amazon SageMaker asynchronous inference to accelerate ML inference predictions up to 30 times faster
    This post is co-written with Rajnish Jain, Priyanka Kulkarni and Daniel Johnson from Medidata. Medidata is leading the digital transformation of life sciences, creating hope for millions of patients. Medidata helps generate the evidence and insights to help pharmaceutical, biotech, medical devices, and diagnostics companies as well as academic researchers with accelerating value, minimizing risk, […]  ( 11 min )
  • Open

    [P] Introducing VDP: open-source visual data ETL
    Hello folks! Just want to share that we announced the release of our project VDP in Introducing VDP: open-source visual data ETL last month 🚀. The goal of VDP is to streamline the end-to-end visual data processing pipeline: Extract unstructured visual data from pre-built data sources such as cloud/on-prem storage, or IoT devices Transform it into analysable structured data by Vision AI models imported from various ML platforms Load the transformed data into warehouses, applications, or other destinations ⭐️ VDP on GitHub 👉 VDP documentation submitted by /u/SnooDogs5688 [link] [comments]  ( 88 min )
    [D] How do I add minimum size constraints to k-means clustering?
    Hi everyone, I'm trying to add cluster size constraints to my k-means clustering algorithm. How do I add minimum cluster size constraints? I was able to enforce a maximum cluster size using the snippet below:

        while cluster_size[assignment[j]] == max_size:
            j += 1
        labels[i] = assignment[j]
        cluster_size[assignment[j]] += 1

    I'm trying to modify it to add minimum size constraints but couldn't. Can anyone please help me with this? Thanks in advance! submitted by /u/tinkerpal [link] [comments]  ( 90 min )
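    One possible post-processing heuristic for the minimum-size side (a sketch of my own, not a canonical algorithm; enforce_min_size is a hypothetical helper): run plain k-means first, then pull the points nearest to an undersized cluster's center out of clusters that can spare them.

        import numpy as np
        from sklearn.cluster import KMeans

        def enforce_min_size(X, labels, centers, min_size):
            # While a cluster is below min_size, steal the point closest to its
            # center from any cluster that can spare one without itself
            # dropping below min_size.
            labels = labels.copy()
            k = centers.shape[0]
            for c in range(k):
                while np.sum(labels == c) < min_size:
                    counts = np.bincount(labels, minlength=k)
                    movable = (labels != c) & (counts[labels] > min_size)
                    if not movable.any():
                        break  # no donor cluster left
                    dists = np.linalg.norm(X - centers[c], axis=1)
                    dists[~movable] = np.inf
                    labels[np.argmin(dists)] = c
            return labels

        X = np.random.randn(200, 2)
        km = KMeans(n_clusters=5, n_init=10).fit(X)
        labels = enforce_min_size(X, km.labels_, km.cluster_centers_, min_size=20)

    Note this only repairs the assignment after the fact; constrained k-means formulations solve an assignment problem inside each iteration instead.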
    [D] PyTorch Distributed Data Parallelism: Under The Hood
    https://lambdalabs.com/blog/multi-node-pytorch-distributed-training-guide/ This is a step-by-step guide that: walks you through how to scale your PyTorch training across multiple nodes; provides examples that showcase the boilerplate of PyTorch DDP training code; and shows you how to launch applications using PyTorch's distributed.launch and torchrun methods, as well as Open MPI's mpirun method. submitted by /u/mippie_moe [link] [comments]  ( 88 min )
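    As a taste of the boilerplate the guide covers, a minimal single-file DDP script looks roughly like this (a sketch assuming one GPU per process and launching via torchrun):

        import os
        import torch
        import torch.distributed as dist
        from torch.nn.parallel import DistributedDataParallel as DDP

        def main():
            # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every process
            dist.init_process_group(backend="nccl")
            local_rank = int(os.environ["LOCAL_RANK"])
            torch.cuda.set_device(local_rank)

            model = torch.nn.Linear(10, 10).cuda(local_rank)
            model = DDP(model, device_ids=[local_rank])
            opt = torch.optim.SGD(model.parameters(), lr=0.1)

            x = torch.randn(32, 10).cuda(local_rank)
            loss = model(x).sum()
            loss.backward()   # gradients are all-reduced across processes here
            opt.step()
            dist.destroy_process_group()

        if __name__ == "__main__":
            main()

    Something like torchrun --nproc_per_node=4 train.py would launch four such processes on one node, with PyTorch handling the gradient all-reduce.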
    [D] Taichi VS Triton ?
    How do they compare? Which has better optimization? Which is easier to learn? Potential tasks where I intend to use them: matrix multiplication for SA, augmentation for vision problems, edge deployment for IoT or browser (voice recognition, TTS, translation, etc.) submitted by /u/tororo-in [link] [comments]  ( 89 min )
    [D] PyTorch is moving to the Linux Foundation
    https://pytorch.org/blog/PyTorchfoundation/ I wonder if this will lead to a lot of departures at Meta. submitted by /u/m___ke [link] [comments]  ( 94 min )
    [D] How to understand the bias term in language model head (when we tie the word embeddings)?
    I was reading the masked language modeling codebase in Huggingface Transformers. Just a question to understand the language model head, at the final linear layer where we project the hidden size to the vocab size (https://github.com/huggingface/transformers/blob/f2fbe4475386bfcfb3b83d0a3223ba216a3c3a91/src/transformers/models/bert/modeling_bert.py#L685-L702):

        self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
        self.bias = nn.Parameter(torch.zeros(config.vocab_size))
        self.decoder.bias = self.bias

    We initialize the bias term to zero. Later, when we initialize the weights, we tie the weight of the linear layer to the word embedding, but we don't do such a thing for the bias term. I wonder how we can understand that and why we want to initialize the …  ( 89 min )
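    To make the asymmetry concrete, here is a minimal sketch of the tying (my own illustration, not the Transformers source): the output projection shares its weight tensor with the input embedding, while the bias has no counterpart in the embedding and so remains a free per-token parameter.

        import torch
        import torch.nn as nn

        vocab_size, hidden = 30522, 768
        embedding = nn.Embedding(vocab_size, hidden)
        decoder = nn.Linear(hidden, vocab_size, bias=False)

        # Tying: the decoder reuses the embedding matrix (one shared tensor);
        # shapes match because nn.Linear stores weight as (out, in).
        decoder.weight = embedding.weight

        # The bias has nothing to tie to, so it stays an independent
        # learnable vector: one scalar offset per vocabulary token.
        bias = nn.Parameter(torch.zeros(vocab_size))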
    [P] (code release) Fine-tune your own stable-diffusion vae decoder and dalle-mini decoder
    A few weeks ago, before stable-diffusion was officially released, I found that fine-tuning Dalle-mini's VQGAN decoder can improve the performance on anime images. See: https://preview.redd.it/eekf9hjt3gn91.png?width=1280&format=png&auto=webp&s=25938a4ad284e6cfff958ad0d69968cd2c01ed18 And with a few lines of code change, I was able to train the stable-diffusion VAE decoder. See: https://preview.redd.it/45xogflo5gn91.png?width=1129&format=png&auto=webp&s=43f98e863b918bba9d7471a0cfa7de4dcc8df98c You can find the exact training code used in this repo: https://github.com/cccntu/fine-tune-models/ More details about the models are also in the repo. And you can play with the former model at https://github.com/cccntu/anim_e submitted by /u/cccntu [link] [comments]  ( 89 min )
    [Project] Machine learning and dependent types, with Idris and XLA
    I've built a library to explore machine learning with functional programming and dependent types, including statically-verified shapes like transpose : Tensor [m, n] dtype -> Tensor [n, m] dtype It compiles with XLA, and shares a similar approach to JAX and Dex. I've made significant progress since my post last December: I've implemented much of the linear algebra API, and it now runs on GPU. I next want autodiff, gradient descent and vectorized map (a la JAX's vmap). I've started work on these but there's still much to do. submitted by /u/tmp-1379 [link] [comments]  ( 99 min )
    [Discussion] Embedding based on binary tests
    Hi everyone, I would like to build an embedding of people's tastes based on their preferences in binary choices. In other words: we have many people; we have many products; and we have asked people many times which product they prefer between two randomly given ones. We want to build an embedding of people's tastes. Does anyone have any idea what kind of mathematical problem this is and what kinds of algorithms exist? Thanks submitted by /u/marcollo63 [link] [comments]  ( 90 min )
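    For what it's worth, this setup resembles preference learning from pairwise comparisons; one classical formulation is a Bradley-Terry-style model with latent factors. A hedged sketch of that idea (all names and sizes below are hypothetical):

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        n_people, n_products, dim = 1000, 500, 32
        people = nn.Embedding(n_people, dim)
        products = nn.Embedding(n_products, dim)
        opt = torch.optim.Adam(
            list(people.parameters()) + list(products.parameters()), lr=1e-2)

        def step(p, a, b):
            # Score = dot product of person and product embeddings; the observed
            # choice "p prefers a over b" gets a logistic (Bradley-Terry) likelihood.
            u = people(p)
            s_a = (u * products(a)).sum(-1)
            s_b = (u * products(b)).sum(-1)
            loss = -F.logsigmoid(s_a - s_b).mean()
            opt.zero_grad(); loss.backward(); opt.step()
            return loss.item()

        # e.g. person 0 preferred product 3 over product 7:
        step(torch.tensor([0]), torch.tensor([3]), torch.tensor([7]))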
    [Project] Has anyone used ML to categorize Facebook group posts data?
    I’m imagining a way to catalog all of the crucial “tribal knowledge” that’s currently just lost in the sands of the feed, or at least “prep” the post data with keyword metadata of some kind that could be manually sorted out. submitted by /u/mike_the_seventh [link] [comments]  ( 88 min )
    AlexaTM 20B Large Language Model [R]
    Alexa Teacher Models (AlexaTM) are transformer-based sequence-to-sequence large-scale multilingual language models. Given only a few examples of a task in a new language, AlexaTM can transfer what it knows to the new language with no extra human supervision. It achieves state-of-the-art (SOTA) performance on 1-shot summarization tasks, outperforming a much larger 540B PaLM decoder model. AlexaTM 20B also achieves SOTA in 1-shot machine translation, especially for low-resource languages, across almost all language pairs supported by the model (Arabic, English, French, German, Hindi, Italian, Japanese, Marathi, Portuguese, Spanish, Tamil, and Telugu) on the Flores-101 dataset. In the zero-shot setting, AlexaTM 20B outperforms GPT-3 (175B) on the SuperGLUE and SQuADv2 datasets and provides SOTA performance on multilingual tasks such as XNLI, XCOPA, Paws-X, and XWinograd. And its carbon footprint during training is only one-fifth of GPT-3's. I have made a video on AlexaTM 20B; do check it out. https://youtu.be/Wd8amyHlrA4 submitted by /u/Sea-Photo5230 [link] [comments]  ( 100 min )
    [Project] Create a ML model to classify spectrograms
    Hi all, I have a large sample of spectrograms of short audio samples (say 10 different categories). How do I build a machine learning model that classifies new spectrograms into these 10 categories plus 1 extra class? It's part of a small informal project I have picked up; if possible, kindly recommend multiple methods so I can evaluate which works best. TIA! Apologies if this seems very trivial, but I'm kind of stuck on how to even start. (Not an ML student, but my mentor insists on using ML for this task.) submitted by /u/geeksid2k [link] [comments]  ( 92 min )
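    One simple starting point would be a small CNN over the spectrograms treated as single-channel images (a sketch under that assumption, with the 11th output acting as the extra class):

        import torch
        import torch.nn as nn

        class SpecCNN(nn.Module):
            def __init__(self, n_classes=11):   # 10 categories + 1 extra class
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.AdaptiveAvgPool2d(1),     # global pooling handles any input size
                )
                self.head = nn.Linear(32, n_classes)

            def forward(self, x):                # x: (batch, 1, freq_bins, time)
                return self.head(self.features(x).flatten(1))

        logits = SpecCNN()(torch.randn(8, 1, 128, 256))   # 8 random "spectrograms"

    Train it with cross-entropy on labeled spectrograms; a classical baseline (e.g. MFCC features plus an SVM) is worth comparing against for a project of this size.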
    C++ cross platform library for conv-net inference only, recommendations? [D]
    Anyone got a recommendation for a library to do convolutional neural net inference (actually, image-to-image transformation), running on as many platforms as possible, with the priorities being: Windows, Linux, macOS/Apple Silicon; Nvidia GPUs, AMD APUs, and Apple Silicon as the most important targets. NOT CUDA, because of the latter devices. It seems the ML world revolves around Python for training, but I'm guessing an appetite for C++ libs must exist for embedded/"edge" use cases. I don't want to involve Python in this at all. How safe is OpenCL these days? I implemented CNN training myself many years ago in OpenCL, but I'm suspicious about its future given Apple's attitude these days; rather than rolling my own, something well supported that has a chance of using all the accelerators would be better. Ideally I want to integrate this in an SDL2 project. I did a test running convolutions in GL shaders (WebGL subset), but I think this gets limited by going through texture lookups instead of general-purpose compute reads; that's my baseline if I can't find something better. submitted by /u/dobkeratops [link] [comments]  ( 90 min )
    [P] Choosing Edge Board for neural network inference in 2022
    Over the last few months, I published a few videos about the most popular edge boards for computer vision (and other NN models). I tried to compare cheap boards (<150 USD), and not only in terms of speed: I compared the platforms by their "usability" as well, i.e. how easy it is to export networks, how good the support is, and how easy they are to work with. Here is a summarising article - https://medium.com/@zlodeibaal/choosing-computer-vision-board-in-2022-b27eb4ca7a7c And here are the benchmark results - https://docs.google.com/spreadsheets/d/1BMj8WImysOSuiT-6O3g15gqHnYF-pUGUhi8VmhhAat4/edit submitted by /u/Wormkeeper [link] [comments]  ( 89 min )
    [R] Maritime Computer Vision Workshop (MaCVi)
    Some time ago, I asked for some advice regarding organizing a workshop as part of WACV 2023. Some people gave feedback, which I highly appreciated. Now, the workshop titled "1st Workshop on Maritime Computer Vision (MaCVi)" will take place in January in Waikoloa, Hawaii as part of WACV 2023. Just wanted to drop the news in case anyone is interested. We will: accept paper submissions (to be published in the WACV Workshop Proceedings and included in IEEE Xplore); offer 4 challenges (UAV-based Object Detection & Tracking, and USV-based (autonomous boats) Obstacle Segmentation and Detection); and provide keynote talks, mostly on industry-related maritime CV (whaleseeker, searchwing, ...). Key dates are the end of October for paper submission and challenge participation, and January for the actual workshop. Find all dates at https://seadronessee.cs.uni-tuebingen.de/wacv23. Teaser here ;) https://www.youtube.com/watch?v=qjJP80Q9Xo4 Let me know if you have any questions. Cheers! submitted by /u/SP4ETZUENDER [link] [comments]  ( 89 min )
    [D] How to understand the frequency of Convolution kernel in Graph Convolutional Network?
    I read this paper recently. It argues that GCNs favor low-frequency features, and that combining features of different frequencies improves results. I don't know how to interpret "frequency" in a GCN; does anyone have advice? Sincere thanks. submitted by /u/waa007 [link] [comments]  ( 101 min )
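    As a sketch of where the term comes from (my own illustration, not from the paper): graph "frequency" usually refers to the eigenvalues of the graph Laplacian. Eigenvectors with small eigenvalues vary slowly across edges (low frequency), those with large eigenvalues vary quickly, and plain GCN aggregation acts roughly as a low-pass filter over this basis.

        import numpy as np

        # "Frequency" on a graph: eigendecomposition of the Laplacian L = D - A.
        A = np.array([[0, 1, 1, 0],
                      [1, 0, 1, 0],
                      [1, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
        L = np.diag(A.sum(1)) - A
        eigvals, eigvecs = np.linalg.eigh(L)   # sorted low -> high frequency

        # Any graph signal decomposes into these modes; a GCN layer mostly
        # keeps the smooth (low-eigenvalue) components and attenuates the rest.
        x = np.random.randn(4)
        spectrum = eigvecs.T @ x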
    [D] Masters Programmes or PhD (taught?) with a strong focus on Applied Mathematics and Statistics
    So I'm currently working at a research institute in applied research. My background is interdisciplinary and I did my masters in CS. I really enjoy the research, and I'm hoping to do a PhD once I've sorted out my financial situation a little, my contract here ends, and I have published a paper. My CS masters was a bit of a downer and I certainly didn't learn as much as I had hoped. I would say my ML knowledge is solid: I have an understanding of advanced topics and decent programming skills from industry and research experience. I do notice, though, that I am lacking some foundational math and statistics skills. My undergraduate degree only involved some math, and my masters close to none. I really notice this; I have a strong urge to dive deep into understanding the maths, especially conceptually. I'm comfortable with the basics like linear algebra and vector calculus; Bayesian statistics and applied math are a little foreign to me. I hope to get a really solid understanding of maths and applied maths because I want to do applied ML research. I'm hoping to learn the skills so that when I have a problem and know what I would like to optimize for, I know how to represent it mathematically, which formulas to use, etc. And I really just want to dig deep into the math. Anyway, I'm looking for a programme in the EU where I would learn exactly that. I tried teaching myself, but my work projects suck up too much time. The next question would be whether it's necessary to do another masters or whether I can jump right into a PhD. In any case, I'd appreciate recommendations on mathematically rigorous programmes with a focus on statistics and applied math! submitted by /u/ameli__c [link] [comments]  ( 90 min )
    [R] LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
    https://arxiv.org/abs/2208.07339 submitted by /u/That_Violinist_18 [link] [comments]  ( 101 min )
    [D] What would be the best way to match an image to a database of existing images?
    I run a business where we print customer artwork on tees and hoodies, and we want to build a system where we can take a photo of a printed garment and associate it with an existing image in our database of customer logos, to see which order this particular item belongs to. Are there any starting points for me to look into for building such a system? EDIT: Thanks for all the great answers! To add a bit more context, we only have a few hundred current and recent artworks we need to associate a printed tee shirt with. Some are complex full-colour logos, others are simple words or short sentences, or something in between. All are stored in our database as 300 DPI transparent PNG files. The reason we need to associate the t-shirt with an order is that this association determines which FedEx or other courier label we print. We currently use a QR code sticker on each item, but these often fall off and take a bit of time to scan. I was hoping to put a camera on the table where the t-shirts are folded so the system can identify which customer a t-shirt belongs to and print the correct courier label. submitted by /u/onthedoleagain [link] [comments]  ( 97 min )
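    One hedged starting point (a sketch, not a production recipe): embed every logo once with a pretrained CNN, then match a garment photo to the nearest logo by cosine similarity. A metric-learning model fine-tuned on your own garment photos would likely do better, but this baseline is a few lines:

        import torch
        import torchvision.models as models
        import torchvision.transforms as T
        from PIL import Image

        # Pretrained CNN as a fixed feature extractor.
        model = models.resnet18(pretrained=True)
        model.fc = torch.nn.Identity()          # keep the 512-d pooled features
        model.eval()

        prep = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(),
                          T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

        @torch.no_grad()
        def embed(path):
            v = model(prep(Image.open(path).convert("RGB")).unsqueeze(0)).squeeze(0)
            return v / v.norm()                 # unit vector -> dot = cosine sim

        def match(photo_path, db):
            # db: {order_id: embedding}; returns the most similar order
            q = embed(photo_path)
            return max(db, key=lambda k: float(q @ db[k]))

    With only a few hundred artworks, a brute-force comparison like this is fast enough without any index.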
    [R] Learning with Differentiable Algorithms
    submitted by /u/hardmaru [link] [comments]  ( 103 min )
  • Open

    How do you compute rewards when you are using parallel environments?
    I am looking at this repo: https://github.com/marlbenchmark/on-policy/blob/af4dc22aaf05b281d9e2e4f43c9ebb9eca48137e/onpolicy/runner/separated/mpe_runner.py#L67 And I am wondering: how do you compute rewards when you have multiple parallel environments? Do you take an average? I don't see any reference here to multiple rollout threads, so what happens when you do have some? Thanks! submitted by /u/No_Possibility_7588 [link] [comments]  ( 89 min )
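    For intuition, a minimal sketch with gym's vectorized API (assuming the classic pre-0.26 gym step signature; the linked repo may handle this differently): each step returns one reward per environment copy, and logging typically averages completed episode returns across copies rather than summing raw per-step rewards.

        import numpy as np
        import gym

        # Four copies of the same env; step() returns one reward per copy.
        envs = gym.vector.make("CartPole-v1", num_envs=4)
        obs = envs.reset()
        returns = np.zeros(4)
        for _ in range(100):
            actions = envs.action_space.sample()
            obs, rewards, dones, infos = envs.step(actions)
            returns += rewards       # per-copy running episode return
            returns[dones] = 0.0     # that copy's episode ended and auto-reset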
    What modifications can maximize the efficacy of the REINFORCE algorithm for a policy gradient task?
    I am straying out of my domain knowledge to attempt a basic reinforcement learning task in a toy environment and have become fairly familiar with the REINFORCE algorithm for policy gradient agents, especially PyTorch's implementation (found here). It is clear to me now that there are superior methods to train RL agents (PPO, for instance), but as I read, these feel beyond my current intellectual or time resources. So I'd like to eke out as much power through modifications of REINFORCE as possible before deciding how to move on. Are there modifications to the REINFORCE training algorithm that might yield benefits without straying far into new-algorithm territory? Or perhaps, what is the "SOTA" version of REINFORCE? For instance, perhaps a simple gradient clip in some way approximates some of PPO's benefits? Or maybe setting a baseline reward based on a rolling reward set of previous episodes? ----------------------------- If specific context is useful, I'm applying this to a self-made simple grid environment where agents receive points mainly for moving closer to and acquiring "targets." There are other rewards of lesser significance, but the key point is that the environment is set up for a sort of "continuous play," such that agents are very frequently receiving rewards (due to closeness) and occasionally receiving reward spikes (due to getting a target), but there is no true episode definition other than an arbitrary timestep length. I am not using batches (perhaps that would be useful?), as I have found that gradually stepping up the episode length gives the agents quicker access to simpler rewards and acts as a sort of scaffolding toward more complex behavior. Agents might reasonably gather the "large" rewards in as few as 10-25 steps. Generally things are working fine, and I am most interested in how to extract as much value from the agent updating mechanism (REINFORCE) as possible. submitted by /u/jshkk [link] [comments]  ( 90 min )
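    Of the ideas mentioned, a rolling-average baseline plus gradient clipping are probably the cheapest wins. A sketch of both bolted onto a standard REINFORCE update (hypothetical names; assumes log-probabilities are collected during the rollout):

        import torch

        baseline = 0.0   # rolling average of returns
        BETA = 0.99      # baseline smoothing factor

        def reinforce_update(policy, optimizer, log_probs, rewards, gamma=0.99):
            """One REINFORCE update with a rolling baseline and grad clipping."""
            global baseline
            # discounted returns, computed backwards over the episode
            returns, R = [], 0.0
            for r in reversed(rewards):
                R = r + gamma * R
                returns.insert(0, R)
            returns = torch.tensor(returns)
            baseline = BETA * baseline + (1 - BETA) * returns.mean().item()
            advantage = returns - baseline   # variance reduction
            loss = -(torch.stack(log_probs) * advantage).sum()
            optimizer.zero_grad()
            loss.backward()
            torch.nn.utils.clip_grad_norm_(policy.parameters(), max_norm=1.0)
            optimizer.step()
            return loss.item()

    Subtracting the baseline does not change the expected gradient, only its variance, so it is a safe modification.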
    Learning continuous actions from images.
    Hi everyone, I'm working on a robotics project where I'm learning a task from raw RGB images. The issue I'm having is that when I use a discrete action space it works very well, whereas with a continuous action space the model doesn't converge and the reward fluctuates a lot. Does anyone have an explanation? submitted by /u/Many_Reception_4921 [link] [comments]  ( 87 min )
    How to view image a 3d image using custom callback
    Hi guys, I have a custom environment with a 3D numpy image, and I am doing path planning in this 3D array/image. What I want to do is create a custom callback for real-time tracking of the path formed, i.e. I don't want to plot the image every time, just the path on the image. How can I do that? submitted by /u/Historical-Stock-750 [link] [comments]  ( 88 min )
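    Assuming this is stable-baselines3, one hedged sketch is a custom BaseCallback that records the agent's position each step (PathTrackingCallback is a hypothetical name, and the env is assumed to expose the position through the info dict), so the path can be drawn without re-rendering the whole volume:

        from stable_baselines3.common.callbacks import BaseCallback

        class PathTrackingCallback(BaseCallback):
            """Collect the agent's position every step so the path can be
            drawn over the 3D volume without re-plotting the whole image."""

            def __init__(self, verbose=0):
                super().__init__(verbose)
                self.path = []

            def _on_step(self) -> bool:
                # assumes your env puts the current position into the info dict
                infos = self.locals.get("infos", [])
                if infos and "position" in infos[0]:
                    self.path.append(infos[0]["position"])
                return True  # returning False would abort training

        # usage: model.learn(total_timesteps=10000, callback=PathTrackingCallback())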
    How to convert timestep based learning to episodic learning
    I am using a custom environment to do path planning with the DDPG algorithm:

        model = DDPG("MlpPolicy", env, action_noise=action_noise, verbose=1)
        model.learn(total_timesteps=10000, log_interval=1)
        model.save("sb3_ddpg_model")

    Here model.learn is used for timestep-based learning, but I want to convert it to something like 3000 steps per episode with multiple episodes. How can I achieve that? submitted by /u/Historical-Stock-750 [link] [comments]  ( 99 min )
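    A sketch of one way to get there (assuming a gym-style env and stable-baselines3, with env and action_noise as in the snippet above): episode length is a property of the environment, not of model.learn, so wrapping the env in TimeLimit caps each episode, and total_timesteps then controls how many episodes fit.

        from gym.wrappers import TimeLimit
        from stable_baselines3 import DDPG

        # Cap every episode at 3000 steps; reset() is triggered automatically
        # whenever the limit is hit, so learn() sees episodic rollouts.
        env = TimeLimit(env, max_episode_steps=3000)

        model = DDPG("MlpPolicy", env, action_noise=action_noise, verbose=1)
        model.learn(total_timesteps=3000 * 10, log_interval=1)  # ~10 episodes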
  • Open

    Graphing Japanese Prefectures
    The two previous posts looked at adjacency networks. The first used examples of US states and Texas counties. The second post made suggestions for using these networks in a classroom. This post is a continuation of the previous post using examples from Japan. Japan is divided into 8 regions and 47 prefectures. Here is a […] Graphing Japanese Prefectures first appeared on John D. Cook.  ( 5 min )
    Classroom exercise with networks
    In the previous post I looked at graphs created from representing geographic regions with nodes and connecting nodes with edges if the corresponding regions share a border. It’s an interesting exercise to recover the geographic regions from the network. For example, take a look at the graph for the continental United States. It’s easy to […] Classroom exercise with networks first appeared on John D. Cook.  ( 5 min )
    Adjacency networks
    Suppose you want to color a map with no two bordering regions having the same color. If this is a map on a plane, you can do this using only four colors, but maybe you’d like to use more. You can reduce the problem to coloring the nodes in a graph. Each node corresponds to […] Adjacency networks first appeared on John D. Cook.  ( 4 min )
  • Open

    Face skin analyzer with Fast.ai and Gradio
    Fast.ai is a revolutionary library created by Jeremy Howard, a former Kaggle No. 1 Grandmaster. He has developed the Fast.ai course…  ( 7 min )
  • Open

    A quick question about feeding the output of one neural network to another to adjust predicted values. Is it sensible or could it create/reinforce bias?
    Hello, I am working (in a very slow manner) on a project that will try to predict sequential data. Let's say we have two neural nets: (1) an LSTM to predict the next item of a given sequence, and (2) a common dense-layer network to adjust the prediction of the first net. So if the LSTM predicts that the value will be 25, but in real life the value is 24.33, is there any sense in feeding the prediction of 25 with a label of 24.33 to the second neural network to adjust the prediction? My data consists of 15,000 items per attribute. I am just asking because I am not sure whether this might give good results or whether it will just create or reinforce bias. And since creating, training, and evaluating neural nets takes a lot of time, I thought I'd better ask :) Thanks for the help in advance :) submitted by /u/skollehatti [link] [comments]  ( 89 min )
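    For what it's worth, this two-stage setup is essentially stacking, and the bias question hinges on data hygiene: the corrector must be fit on predictions the LSTM did not train on, otherwise it learns, and thus reinforces, the LSTM's in-sample bias. A minimal sketch with hypothetical numbers:

        import torch
        import torch.nn as nn

        # Stage 1 (the LSTM) has already produced predictions on HELD-OUT data;
        # the corrector learns the residual mapping prediction -> true value.
        preds = torch.randn(1000, 1) * 5 + 25                 # hypothetical LSTM outputs
        targets = preds - 0.67 + torch.randn(1000, 1) * 0.1   # hypothetical truths

        corrector = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
        opt = torch.optim.Adam(corrector.parameters(), lr=1e-3)
        for _ in range(200):
            loss = nn.functional.mse_loss(corrector(preds), targets)
            opt.zero_grad(); loss.backward(); opt.step()

    If the LSTM's errors are pure noise rather than systematic, the corrector has nothing to learn and just adds variance, so validate both stages on a third split.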
  • Open

    The need for a more human-centered approach to designing and validating transparent AI in medical image analysis -- Guidelines and Evidence from a Systematic Review. (arXiv:2112.12596v2 [cs.HC] UPDATED)
    Transparency in Machine Learning (ML) attempts to reveal the working mechanisms of complex models. Transparent ML promises to advance human factors engineering goals of human-centered AI in the target users. From a human-centered design perspective, transparency is not a property of the ML model but an affordance, i.e. a relationship between algorithm and user; as a result, iterative prototyping and evaluation with users is critical to attaining adequate solutions that afford transparency. However, following human-centered design principles in healthcare and medical image analysis is challenging due to the limited availability of and access to end users. To investigate the state of transparent ML in medical image analysis, we conducted a systematic review of the literature. Our review reveals multiple severe shortcomings in the design and validation of transparent ML for medical image analysis applications. We find that most studies to date approach transparency as a property of the model itself, similar to task performance, without considering end users during either development or evaluation. Additionally, the lack of user research and the sporadic validation of transparency claims put contemporary research on transparent ML for medical image analysis at risk of being incomprehensible to users, and thus, clinically irrelevant. To alleviate these shortcomings in forthcoming research while acknowledging the challenges of human-centered design in healthcare, we introduce the INTRPRT guideline, a systematic design directive for transparent ML systems in medical image analysis. The INTRPRT guideline suggests formative user research as the first step of transparent model design to understand user needs and domain requirements. Following this process produces evidence to support design choices, and ultimately, increases the likelihood that the algorithms afford transparency.  ( 3 min )
    Explainability Is in the Mind of the Beholder: Establishing the Foundations of Explainable Artificial Intelligence. (arXiv:2112.14466v2 [cs.AI] UPDATED)
    Explainable artificial intelligence and interpretable machine learning are research domains growing in importance. Yet, the underlying concepts remain somewhat elusive and lack generally agreed definitions. While recent inspiration from social sciences has refocused the work on needs and expectations of human recipients, the field still misses a concrete conceptualisation. We take steps towards addressing this challenge by reviewing the philosophical and social foundations of human explainability, which we then translate into the technological realm. In particular, we scrutinise the notion of algorithmic black boxes and the spectrum of understanding determined by explanatory processes and explainees' background knowledge. This approach allows us to define explainability as (logical) reasoning applied to transparent insights (into, possibly black-box, predictive systems) interpreted under background knowledge and placed within a specific context -- a process that engenders understanding in a selected group of explainees. We then employ this conceptualisation to revisit strategies for evaluating explainability as well as the much disputed trade-off between transparency and predictive power, including its implications for ante-hoc and post-hoc techniques along with fairness and accountability established by explainability. We furthermore discuss components of the machine learning workflow that may be in need of interpretability, building on a range of ideas from human-centred explainability, with a particular focus on explainees, contrastive statements and explanatory processes. Our discussion reconciles and complements current research to help better navigate open questions -- rather than attempting to address any individual issue -- thus laying a solid foundation for a grounded discussion and future progress of explainable artificial intelligence and interpretable machine learning.  ( 3 min )
    Controllable Data Generation by Deep Learning: A Review. (arXiv:2207.09542v3 [cs.LG] UPDATED)
    Designing and generating new data under targeted properties has been attracting attention in various critical applications such as molecule design, image editing and speech synthesis. Traditional hand-crafted approaches heavily rely on expert experience and intensive human effort, yet still suffer from the insufficiency of scientific knowledge and low throughput to support effective and efficient data generation. Recently, the advancement of deep learning has induced expressive methods that can learn the underlying representation and properties of data. Such capability provides new opportunities for figuring out the mutual relationship between the structural patterns and functional properties of the data and leveraging such relationships to generate structural data given the desired properties. This article provides a systematic review of this promising research area, commonly known as controllable deep data generation. Firstly, the potential challenges are raised and preliminaries are provided. Then the controllable deep data generation is formally defined, a taxonomy on various techniques is proposed and the evaluation metrics in this specific domain are summarized. After that, exciting applications of controllable deep data generation are introduced and existing works are experimentally analyzed and compared. Finally, the promising future directions of controllable deep data generation are highlighted and five potential challenges are identified.  ( 3 min )
    Exact Recovery in the General Hypergraph Stochastic Block Model. (arXiv:2105.04770v2 [cs.IT] UPDATED)
    This paper investigates fundamental limits of exact recovery in the general $d$-uniform hypergraph stochastic block model ($d$-HSBM), wherein $n$ nodes are partitioned into $k$ disjoint communities with relative sizes $(p_1, \ldots, p_k)$. Each subset of nodes with cardinality $d$ is generated independently as an order-$d$ hyperedge with a certain probability that depends on the ground-truth communities that the $d$ nodes belong to. The goal is to exactly recover the $k$ hidden communities based on the observed hypergraph. We show that there exists a sharp threshold such that exact recovery is achievable above the threshold and impossible below the threshold (apart from a small regime of parameters that will be specified precisely). This threshold is represented in terms of a quantity which we term as the generalized Chernoff-Hellinger divergence between communities. Our result for this general model recovers prior results for the standard SBM and $d$-HSBM with two symmetric communities as special cases. En route to proving our achievability results, we develop a polynomial-time two-stage algorithm that meets the threshold. The first stage adopts a certain hypergraph spectral clustering method to obtain a coarse estimate of communities, and the second stage refines each node individually via local refinement steps to ensure exact recovery.  ( 3 min )
    A PDE-Based Analysis of the Symmetric Two-Armed Bernoulli Bandit. (arXiv:2202.05767v2 [cs.LG] UPDATED)
    This work addresses a version of the two-armed Bernoulli bandit problem where the sum of the means of the arms is one (the symmetric two-armed Bernoulli bandit). In a regime where the gap between these means goes to zero and the number of prediction periods approaches infinity, we obtain the leading order terms of the expected regret and pseudoregret for this problem by associating each of them with a solution of a linear parabolic partial differential equation. Our results improve upon the previously known results; specifically, we explicitly compute the leading order term of the optimal regret and pseudoregret in three different scaling regimes for the gap. Additionally, we obtain new non-asymptotic bounds for any given time horizon.  ( 2 min )
    Extending Open Bandit Pipeline to Simulate Industry Challenges. (arXiv:2209.04147v1 [cs.LG])
    Bandit algorithms are often used in the e-commerce industry to train Machine Learning (ML) systems when pre-labeled data is unavailable. However, the industry setting poses various challenges that make implementing bandit algorithms in practice non-trivial. In this paper, we elaborate on the challenges of off-policy optimisation, delayed reward, concept drift, reward design, and business rules constraints that practitioners at Booking.com encounter when applying bandit algorithms. Our main contribution is an extension to the Open Bandit Pipeline (OBP) framework. We provide simulation components for some of the above-mentioned challenges to provide future practitioners, researchers, and educators with a resource to address challenges encountered in the e-commerce industry.  ( 2 min )
    Self-Supervised Learning of Context-Aware Pitch Prosody Representations. (arXiv:2007.09060v4 [cs.SD] CROSS LISTED)
    In music and speech, meaning is derived at multiple levels of context. Affect, for example, can be inferred both by a short sound token and by sonic patterns over a longer temporal window such as an entire recording. In this letter, we focus on inferring meaning from this dichotomy of contexts. We show how contextual representations of short sung vocal lines can be implicitly learned from fundamental frequency ($F_0$) and thus be used as a meaningful feature space for downstream Music Information Retrieval (MIR) tasks. We propose three self-supervised deep learning paradigms which leverage pseudotask learning of these two levels of context to produce latent representation spaces. We evaluate the usefulness of these representations by embedding unseen pitch contours into each space and conducting downstream classification tasks. Our results show that contextual representation can enhance downstream classification by as much as 15\% as compared to using traditional statistical contour features.  ( 2 min )
    Saliency Guided Adversarial Training for Learning Generalizable Features with Applications to Medical Imaging Classification System. (arXiv:2209.04326v1 [eess.IV])
    This work tackles a central machine learning problem of performance degradation on out-of-distribution (OOD) test sets. The problem is particularly salient in medical imaging-based diagnosis systems that appear to be accurate but fail when tested in new hospitals/datasets. Recent studies indicate that such systems might learn shortcut and non-relevant features instead of generalizable features, so-called good features. We hypothesize that adversarial training can eliminate shortcut features whereas saliency guided training can filter out non-relevant features; both are nuisance features accounting for the performance degradation on OOD test sets. With that, we formulate a novel model training scheme for the deep neural network to learn good features for classification and/or detection tasks ensuring a consistent generalization performance on OOD test sets. The experimental results qualitatively and quantitatively demonstrate the superior performance of our method using the benchmark CXR image data sets on classification tasks.  ( 2 min )
    Using Probabilistic Machine Learning to Better Model Temporal Patterns in Parameterizations: a case study with the Lorenz 96 model. (arXiv:2203.14814v3 [cs.LG] UPDATED)
    The modelling of small-scale processes is a major source of error in climate models, hindering the accuracy of low-cost models which must approximate such processes through parameterization. Red noise is essential to many operational parameterization schemes, helping model temporal correlations. We show how to build on the successes of red noise by combining the known benefits of stochasticity with machine learning. This is done using a physically-informed recurrent neural network within a probabilistic framework. Our model is competitive and often superior to both a bespoke baseline and an existing probabilistic machine learning approach (GAN) when applied to the Lorenz 96 atmospheric simulation. This is due to its superior ability to model temporal patterns compared to standard first-order autoregressive schemes. It also generalises to unseen scenarios. We evaluate across a number of metrics from the literature, and also discuss the benefits of using the probabilistic metric of hold-out likelihood.  ( 3 min )
    JR2net: A Joint Non-Linear Representation and Recovery Network for Compressive Spectral Imaging. (arXiv:2205.07770v2 [eess.IV] UPDATED)
    Deep learning models are state-of-the-art in compressive spectral imaging (CSI) recovery. These methods use a deep neural network (DNN) as an image generator to learn non-linear mapping from compressed measurements to the spectral image. For instance, the deep spectral prior approach uses a convolutional autoencoder network (CAE) in the optimization algorithm to recover the spectral image by using a non-linear representation. However, the CAE training is detached from the recovery problem, which does not guarantee optimal representation of the spectral images for the CSI problem. This work proposes a joint non-linear representation and recovery network (JR2net), linking the representation and recovery task into a single optimization problem. JR2net consists of an optimization-inspired network following an ADMM formulation that learns a non-linear low-dimensional representation and simultaneously performs the spectral image recovery, trained via the end-to-end approach. Experimental results show the superiority of the proposed method with improvements up to 2.57 dB in PSNR and performance around 2000 times faster than state-of-the-art methods.  ( 2 min )
    Differentially Private Decoding in Large Language Models. (arXiv:2205.13621v2 [cs.CL] UPDATED)
    Recent large-scale natural language processing (NLP) systems use a pre-trained Large Language Model (LLM) on massive and diverse corpora as a headstart. In practice, the pre-trained model is adapted to a wide array of tasks via fine-tuning on task-specific datasets. LLMs, while effective, have been shown to memorize instances of training data thereby potentially revealing private information processed during pre-training. The potential leakage might further propagate to the downstream tasks for which LLMs are fine-tuned. On the other hand, privacy-preserving algorithms usually involve retraining from scratch, which is prohibitively expensive for LLMs. In this work, we propose a simple, easy to interpret, and computationally lightweight perturbation mechanism to be applied to an already trained model at the decoding stage. Our perturbation mechanism is model-agnostic and can be used in conjunction with any LLM. We provide theoretical analysis showing that the proposed mechanism is differentially private, and experimental results showing a privacy-utility trade-off.  ( 2 min )
    Review of Pedestrian Trajectory Prediction Methods: Comparing Deep Learning and Knowledge-based Approaches. (arXiv:2111.06740v2 [cs.LG] UPDATED)
    In crowd scenarios, predicting trajectories of pedestrians is a complex and challenging task depending on many external factors. The topology of the scene and the interactions between the pedestrians are just some of them. Due to advancements in data science and data collection technologies, deep learning methods have recently become a research hotspot in numerous domains. Therefore, it is not surprising that more and more researchers apply these methods to predict trajectories of pedestrians. This paper compares these relatively new deep learning algorithms with classical knowledge-based models that are widely used to simulate pedestrian dynamics. It provides a comprehensive literature review of both approaches, explores technical and application oriented differences, and addresses open questions as well as future development directions. Our investigations point out that the pertinence of knowledge-based models to predict local trajectories is nowadays questionable because of the high accuracy of the deep learning algorithms. Nevertheless, the ability of deep-learning algorithms for large-scale simulation and the description of collective dynamics remains to be demonstrated. Furthermore, the comparison shows that the combination of both approaches (the hybrid approach) seems to be promising to overcome disadvantages like the missing explainability of the deep learning approach.  ( 3 min )
    Rare but Severe Neural Machine Translation Errors Induced by Minimal Deletion: An Empirical Study on Chinese and English. (arXiv:2209.02145v2 [cs.CL] UPDATED)
    We examine the inducement of rare but severe errors in English-Chinese and Chinese-English in-domain neural machine translation by minimal deletion of the source text with character-based models. By deleting a single character, we find that we can induce severe errors in the translation. We categorize these errors and compare the results of deleting single characters and single words. We also examine the effect of training data size on the number and types of pathological cases induced by these minimal perturbations, finding significant variation.  ( 2 min )
    Data-Driven Deep Learning Based Hybrid Beamforming for Aerial Massive MIMO-OFDM Systems with Implicit CSI. (arXiv:2201.06778v3 [eess.SP] UPDATED)
    In an aerial hybrid massive multiple-input multiple-output (MIMO) and orthogonal frequency division multiplexing (OFDM) system, how to design a spectral-efficient broadband multi-user hybrid beamforming with a limited pilot and feedback overhead is challenging. To this end, by modeling the key transmission modules as an end-to-end (E2E) neural network, this paper proposes a data-driven deep learning (DL)-based unified hybrid beamforming framework for both the time division duplex (TDD) and frequency division duplex (FDD) systems with implicit channel state information (CSI). For TDD systems, the proposed DL-based approach jointly models the uplink pilot combining and downlink hybrid beamforming modules as an E2E neural network. While for FDD systems, we jointly model the downlink pilot transmission, uplink CSI feedback, and downlink hybrid beamforming modules as an E2E neural network. Different from conventional approaches separately processing different modules, the proposed solution simultaneously optimizes all modules with the sum rate as the optimization object. Therefore, by perceiving the inherent property of air-to-ground massive MIMO-OFDM channel samples, the DL-based E2E neural network can establish the mapping function from the channel to the beamformer, so that the explicit channel reconstruction can be avoided with reduced pilot and feedback overhead. Besides, practical low-resolution phase shifters (PSs) introduce the quantization constraint, leading to the intractable gradient backpropagation when training the neural network. To mitigate the performance loss caused by the phase quantization error, we adopt the transfer learning strategy to further fine-tune the E2E neural network based on a pre-trained network that assumes the ideal infinite-resolution PSs. Numerical results show that our DL-based schemes have considerable advantages over state-of-the-art schemes.  ( 3 min )
    The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation. (arXiv:2102.06387v4 [cs.LG] UPDATED)
    We consider training models on private data that are distributed across user devices. To ensure privacy, we add on-device noise and use secure aggregation so that only the noisy sum is revealed to the server. We present a comprehensive end-to-end system, which appropriately discretizes the data and adds discrete Gaussian noise before performing secure aggregation. We provide a novel privacy analysis for sums of discrete Gaussians and carefully analyze the effects of data quantization and modular summation arithmetic. Our theoretical guarantees highlight the complex tension between communication, privacy, and accuracy. Our extensive experimental results demonstrate that our solution is essentially able to match the accuracy to central differential privacy with less than 16 bits of precision per value.  ( 2 min )
    EDeNN: Event Decay Neural Networks for low latency vision. (arXiv:2209.04362v1 [cs.CV])
    Despite the success of neural networks in computer vision tasks, digital 'neurons' are a very loose approximation of biological neurons. Today's learning approaches are designed to function on digital devices with digital data representations such as image frames. In contrast, biological vision systems are generally much more capable and efficient than state-of-the-art digital computer vision algorithms. Event cameras are an emerging sensor technology which imitates biological vision with asynchronously firing pixels, eschewing the concept of the image frame. To leverage modern learning techniques, many event-based algorithms are forced to accumulate events back to image frames, somewhat squandering the advantages of event cameras. We follow the opposite paradigm and develop a new type of neural network which operates closer to the original event data stream. We demonstrate state-of-the-art performance in angular velocity regression and competitive optical flow estimation, while avoiding difficulties related to training SNNs. Furthermore, the processing latency of our proposed approach is less than 1/10 that of any other implementation, while continuous inference increases this improvement by another order of magnitude.  ( 2 min )
    Random Vector Functional Link Networks for Function Approximation on Manifolds. (arXiv:2007.15776v2 [stat.ML] UPDATED)
    The learning speed of feed-forward neural networks is notoriously slow and has presented a bottleneck in deep learning applications for several decades. For instance, gradient-based learning algorithms, which are used extensively to train neural networks, tend to work slowly when all of the network parameters must be iteratively tuned. To counter this, both researchers and practitioners have tried introducing randomness to reduce the learning requirement. Based on the original construction of Igelnik and Pao, single layer neural-networks with random input-to-hidden layer weights and biases have seen success in practice, but the necessary theoretical justification is lacking. In this paper, we begin to fill this theoretical gap. We provide a (corrected) rigorous proof that the Igelnik and Pao construction is a universal approximator for continuous functions on compact domains, with approximation error decaying asymptotically like $O(1/\sqrt{n})$ for the number $n$ of network nodes. We then extend this result to the non-asymptotic setting, proving that one can achieve any desired approximation error with high probability provided $n$ is sufficiently large. We further adapt this randomized neural network architecture to approximate functions on smooth, compact submanifolds of Euclidean space, providing theoretical guarantees in both the asymptotic and non-asymptotic forms. Finally, we illustrate our results on manifolds with numerical experiments.  ( 3 min )
    Hcore-Init: Neural Network Initialization based on Graph Degeneracy. (arXiv:2004.07636v2 [cs.LG] UPDATED)
    Neural networks are the pinnacle of Artificial Intelligence, as in recent years we witnessed many novel architectures, learning and optimization techniques for deep learning. Capitalizing on the fact that neural networks inherently constitute multipartite graphs among neuron layers, we aim to analyze their structure directly to extract meaningful information that can improve the learning process. To our knowledge, graph mining techniques for enhancing learning in neural networks have not been thoroughly investigated. In this paper we propose an adapted version of the k-core structure for the complete weighted multipartite graph extracted from a deep learning architecture. As a multipartite graph is a combination of bipartite graphs, which are in turn the incidence graphs of hypergraphs, we design k-hypercore decomposition, the hypergraph analogue of k-core degeneracy. We applied k-hypercore to several neural network architectures, more specifically to convolutional neural networks and multilayer perceptrons for image recognition tasks after a very short pretraining. Then we used the information provided by the hypercore numbers of the neurons to re-initialize the weights of the neural network, thus biasing the gradient optimization scheme. Extensive experiments proved that k-hypercore outperforms the state-of-the-art initialization methods.  ( 3 min )
    Trustworthy Federated Learning via Blockchain. (arXiv:2209.04418v1 [cs.LG])
    The safety-critical scenarios of artificial intelligence (AI), such as autonomous driving, Internet of Things, smart healthcare, etc., have raised critical requirements of trustworthy AI to guarantee the privacy and security with reliable decisions. As a nascent branch for trustworthy AI, federated learning (FL) has been regarded as a promising privacy preserving framework for training a global AI model over collaborative devices. However, security challenges still exist in the FL framework, e.g., Byzantine attacks from malicious devices, and model tampering attacks from malicious server, which will degrade or destroy the accuracy of trained global AI model. In this paper, we shall propose a decentralized blockchain based FL (B-FL) architecture by using a secure global aggregation algorithm to resist malicious devices, and deploying practical Byzantine fault tolerance consensus protocol with high effectiveness and low energy consumption among multiple edge servers to prevent model tampering from the malicious server. However, to implement B-FL system at the network edge, multiple rounds of cross-validation in blockchain consensus protocol will induce long training latency. We thus formulate a network optimization problem that jointly considers bandwidth and power allocation for the minimization of long-term average training latency consisting of progressive learning rounds. We further propose to transform the network optimization problem as a Markov decision process and leverage the deep reinforcement learning based algorithm to provide high system performance with low computational complexity. Simulation results demonstrate that B-FL can resist malicious attacks from edge devices and servers, and the training latency of B-FL can be significantly reduced by deep reinforcement learning based algorithm compared with baseline algorithms.  ( 3 min )
    The Surprising Positive Knowledge Transfer in Continual 3D Object Shape Reconstruction. (arXiv:2101.07295v5 [cs.LG] UPDATED)
    Continual learning has been extensively studied for classification tasks with methods developed to primarily avoid catastrophic forgetting, a phenomenon where earlier learned concepts are forgotten at the expense of more recent samples. In this work, we present a set of continual 3D object shape reconstruction tasks, including complete 3D shape reconstruction from different input modalities, as well as visible surface (2.5D) reconstruction which, surprisingly demonstrate positive knowledge (backward and forward) transfer when training with solely standard SGD and without additional heuristics. We provide evidence that continuously updated representation learning of single-view 3D shape reconstruction improves the performance on learned and novel categories over time. We provide a novel analysis of knowledge transfer ability by looking at the output distribution shift across sequential learning tasks. Finally, we show that the robustness of these tasks leads to the potential of having a proxy representation learning task for continual classification. The codebase, dataset and pre-trained models released with this article can be found at https://github.com/rehg-lab/CLRec  ( 3 min )
    MICO: Selective Search with Mutual Information Co-training. (arXiv:2209.04378v1 [cs.IR])
    In contrast to traditional exhaustive search, selective search first clusters documents into several groups before all the documents are searched exhaustively by a query, to limit the search executed within one group or only a few groups. Selective search is designed to reduce the latency and computation in modern large-scale search systems. In this study, we propose MICO, a Mutual Information CO-training framework for selective search with minimal supervision using the search logs. After training, MICO not only clusters the documents, but also routes unseen queries to the relevant clusters for efficient retrieval. In our empirical experiments, MICO significantly improves the performance on multiple metrics of selective search and outperforms a number of existing competitive baselines.  ( 2 min )
    Differentially Private Stochastic Gradient Descent with Low-Noise. (arXiv:2209.04188v1 [stat.ML])
    In this paper, by introducing a low-noise condition, we study privacy and utility (generalization) performances of differentially private stochastic gradient descent (SGD) algorithms in a setting of stochastic convex optimization (SCO) for both pointwise and pairwise learning problems. For pointwise learning, we establish sharper excess risk bounds of order $\mathcal{O}\Big( \frac{\sqrt{d\log(1/\delta)}}{n\epsilon} \Big)$ and $\mathcal{O}\Big( {n^{- \frac{1+\alpha}{2}}}+\frac{\sqrt{d\log(1/\delta)}}{n\epsilon}\Big)$ for the $(\epsilon,\delta)$-differentially private SGD algorithm for strongly smooth and $\alpha$-H\"older smooth losses, respectively, where $n$ is the sample size and $d$ is the dimensionality. For pairwise learning, inspired by \cite{lei2020sharper,lei2021generalization}, we propose a simple private SGD algorithm based on gradient perturbation which satisfies $(\epsilon,\delta)$-differential privacy, and develop novel utility bounds for the proposed algorithm. In particular, we prove that our algorithm can achieve excess risk rates $\mathcal{O}\Big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d\log(1/\delta)}}{n\epsilon}\Big)$ with gradient complexity $\mathcal{O}(n)$ and $\mathcal{O}\big(n^{\frac{2-\alpha}{1+\alpha}}+n\big)$ for strongly smooth and $\alpha$-H\"older smooth losses, respectively. Further, faster learning rates are established in a low-noise setting for both smooth and non-smooth losses. To the best of our knowledge, this is the first utility analysis which provides excess population bounds better than $\mathcal{O}\Big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d\log(1/\delta)}}{n\epsilon}\Big)$ for privacy-preserving pairwise learning.  ( 2 min )
    BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?. (arXiv:2105.04949v4 [cs.CL] UPDATED)
    Analogies play a central role in human commonsense reasoning. The ability to recognize analogies such as "eye is to seeing what ear is to hearing", sometimes referred to as analogical proportions, shape how we structure knowledge and understand language. Surprisingly, however, the task of identifying such analogies has not yet received much attention in the language model era. In this paper, we analyze the capabilities of transformer-based language models on this unsupervised task, using benchmarks obtained from educational settings, as well as more commonly used datasets. We find that off-the-shelf language models can identify analogies to a certain extent, but struggle with abstract and complex relations, and results are highly sensitive to model architecture and hyperparameters. Overall the best results were obtained with GPT-2 and RoBERTa, while configurations using BERT were not able to outperform word embedding models. Our results raise important questions for future work about how, and to what extent, pre-trained language models capture knowledge about abstract semantic relations.  ( 3 min )
    Investigation of a Machine learning methodology for the SKA pulsar search pipeline. (arXiv:2209.04430v1 [astro-ph.IM])
    The SKA pulsar search pipeline will be used for real-time detection of pulsars. Modern radio telescopes such as the SKA will be generating petabytes of data at their full scale of operation. Hence experience-based and data-driven algorithms become indispensable for applications such as candidate detection. Here we describe our findings from testing a state-of-the-art object detection algorithm called Mask R-CNN to detect candidate signatures in the SKA pulsar search pipeline. We have trained the Mask R-CNN model to detect candidate images. A custom annotation tool was developed to mark the regions of interest in large datasets efficiently. We have successfully demonstrated this algorithm by detecting candidate signatures on a simulation dataset. The paper presents details of this work with a highlight on future prospects.  ( 2 min )
    Zydeco-Style Spike Sorting Low Power VLSI Architecture for IoT BCI Implants. (arXiv:2209.04427v1 [cs.AR])
    Brain Computer Interface (BCI) has great potential for solving many brain signal analysis limitations, mental disorder resolutions, and restoring missing limb functionality via neural-controlled implants. However, no safe implant suitable for daily-life usage exists yet. Most of the proposed implants have several implementation issues, such as infection hazards and heat dissipation, which limit their usability and make it more challenging to pass regulations and quality control in production. A wireless implant does not require a chronic wound in the skull. However, the complex clustering neuron identification algorithms currently inside the implant chip consume a lot of power and bandwidth, causing higher heat dissipation and draining the implant's battery. Spike sorting is the core unit of an invasive BCI chip, playing a significant role in power consumption, accuracy, and area. Therefore, in this study, we propose a low-power adaptive simplified VLSI architecture, "Zydeco-Style," for BCI spike sorting that is computationally less complex and achieves accuracy of up to 93.5% in the worst-case scenario. The architecture uses a low-power Bluetooth Wireless communication module with external IoT medical ICU devices. The proposed architecture was implemented and simulated in Verilog. In addition, we propose a conceptual implant design.  ( 3 min )
    Survey: Leakage and Privacy at Inference Time. (arXiv:2107.01614v2 [cs.LG] UPDATED)
    Leakage of data from publicly available Machine Learning (ML) models is an area of growing significance as commercial and government applications of ML can draw on multiple sources of data, potentially including users' and clients' sensitive data. We provide a comprehensive survey of contemporary advances on several fronts, covering involuntary data leakage which is natural to ML models, potential malevolent leakage which is caused by privacy attacks, and currently available defence mechanisms. We focus on inference-time leakage, as the most likely scenario for publicly available models. We first discuss what leakage is in the context of different data, tasks, and model architectures. We then propose a taxonomy across involuntary and malevolent leakage, available defences, followed by the currently available assessment metrics and applications. We conclude with outstanding challenges and open questions, outlining some promising directions for future research.  ( 2 min )
    Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health. (arXiv:2209.04356v1 [cs.LG])
    In this paper, we consider a risk-averse multi-armed bandit (MAB) problem where the goal is to learn a policy that minimizes the risk of low expected return, as opposed to maximizing the expected return itself, which is the objective in the usual approach to risk-neutral MAB. Specifically, we formulate this problem as a transfer learning problem between an expert and a learner agent in the presence of contexts that are only observable by the expert but not by the learner. Thus, such contexts are unobserved confounders (UCs) from the learner's perspective. Given a dataset generated by the expert that excludes the UCs, the goal for the learner is to identify the true minimum-risk arm with fewer online learning steps, while avoiding possible biased decisions due to the presence of UCs in the expert's data.  ( 2 min )
    Fast Neural Kernel Embeddings for General Activations. (arXiv:2209.04121v1 [cs.LG])
    The infinite-width limit has shed light on generalization and optimization aspects of deep learning by establishing connections between neural networks and kernel methods. Despite their importance, the utility of these kernel methods was limited in large-scale learning settings due to their (super-)quadratic runtime and memory complexities. Moreover, most prior works on neural kernels have focused on the ReLU activation, mainly due to its popularity but also due to the difficulty of computing such kernels for general activations. In this work, we overcome these difficulties by providing methods to work with general activations. First, we compile and expand the list of activation functions admitting exact dual activation expressions to compute neural kernels. When an exact computation is unknown, we present methods to approximate the kernels effectively. We propose a fast sketching method that approximates any multi-layered Neural Network Gaussian Process (NNGP) kernel and Neural Tangent Kernel (NTK) matrices for a wide range of activation functions, going beyond the commonly analyzed ReLU activation. This is done by showing how to approximate the neural kernels using the truncated Hermite expansion of any desired activation function. While most prior works require data points on the unit sphere, our methods do not suffer from such limitations and are applicable to any dataset of points in $\mathbb{R}^d$. Furthermore, we provide a subspace embedding for NNGP and NTK matrices with near input-sparsity runtime and near-optimal target dimension which applies to any \emph{homogeneous} dual activation functions with rapidly convergent Taylor expansion. Empirically, with respect to exact convolutional NTK (CNTK) computation, our method achieves $106\times$ speedup for approximate CNTK of a 5-layer Myrtle network on CIFAR-10 dataset.  ( 3 min )
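    To make the Hermite-expansion step concrete: for an activation σ and unit-variance Gaussian inputs with correlation ρ, the dual activation E[σ(u)σ(v)] equals Σ_r a_r² ρ^r, where a_r are σ's coefficients in the orthonormal probabilists' Hermite basis. Below is a minimal NumPy sketch of this truncated expansion, not the authors' released code; the truncation level, quadrature size, and the erf sanity check are our own choices.

```python
import numpy as np
from math import erf, factorial, sqrt
from numpy.polynomial.hermite import hermgauss
from numpy.polynomial.hermite_e import hermeval

def hermite_coeffs(sigma, R=30, quad_points=80):
    """a_r = E[sigma(Z) He_r(Z)] / sqrt(r!) for Z ~ N(0,1), computed with
    Gauss-Hermite quadrature (nodes/weights are for the weight e^{-x^2})."""
    x, w = hermgauss(quad_points)
    z = np.sqrt(2.0) * x                     # change of variables to N(0, 1)
    vals = sigma(z)
    a = np.empty(R + 1)
    for r in range(R + 1):
        basis = np.zeros(r + 1); basis[r] = 1.0   # selects He_r
        a[r] = np.sum(w * vals * hermeval(z, basis)) \
               / (np.sqrt(np.pi) * sqrt(factorial(r)))
    return a

def dual_activation(rho, a):
    """E[sigma(u) sigma(v)] for unit-variance inputs with correlation rho."""
    return np.sum(a ** 2 * rho ** np.arange(len(a)))

a = hermite_coeffs(np.vectorize(erf))
rho = 0.7
print(dual_activation(rho, a))               # truncated Hermite expansion
print(2 / np.pi * np.arcsin(2 * rho / 3))    # known closed form for erf
```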
    Fuzzy Attention Neural Network to Tackle Discontinuity in Airway Segmentation. (arXiv:2209.02048v2 [eess.IV] UPDATED)
    Airway segmentation is crucial for the examination, diagnosis, and prognosis of lung diseases, while its manual delineation is unduly burdensome. To alleviate this time-consuming and potentially subjective manual procedure, researchers have proposed methods to automatically segment airways from computerized tomography (CT) images. However, some small-sized airway branches (e.g., bronchus and terminal bronchioles) significantly aggravate the difficulty of automatic segmentation by machine learning models. In particular, the variance of voxel values and the severe data imbalance in airway branches make the computational module prone to discontinuous and false-negative predictions, especially for cohorts with different lung diseases. Attention mechanisms have shown the capacity to segment complex structures, while fuzzy logic can reduce the uncertainty in feature representations. Therefore, the integration of deep attention networks and fuzzy theory, given by the fuzzy attention layer, should provide an enhanced solution for better generalization and robustness. This paper presents an efficient method for airway segmentation, comprising a novel fuzzy attention neural network and a comprehensive loss function to enhance the spatial continuity of airway segmentation. The deep fuzzy set is formulated by a set of voxels in the feature map and a learnable Gaussian membership function. Different from the existing attention mechanism, the proposed channel-specific fuzzy attention addresses the issue of heterogeneous features in different channels. Furthermore, a novel evaluation metric is proposed to assess both the continuity and completeness of airway structures. The efficiency, generalization and robustness of the proposed method have been demonstrated by training on normal lung disease data while testing on datasets of lung cancer, COVID-19 and pulmonary fibrosis.
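    The paper's precise layer is not reproduced here; the following sketch only illustrates the general shape of a channel-specific fuzzy attention gate with a learnable Gaussian membership function per channel (module name, initialization, and tensor shapes are our assumptions):

```python
import torch
import torch.nn as nn

class FuzzyChannelAttention(nn.Module):
    """Gates each channel of a 3D feature map with a learnable
    Gaussian membership function (one center/spread per channel)."""
    def __init__(self, channels):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(channels))         # membership centers
        self.log_sigma = nn.Parameter(torch.zeros(channels))  # log spreads, kept positive via exp

    def forward(self, x):                      # x: (B, C, D, H, W) voxel features
        shape = (1, -1, 1, 1, 1)
        sigma = self.log_sigma.exp().view(shape)
        membership = torch.exp(-0.5 * ((x - self.mu.view(shape)) / sigma) ** 2)
        return x * membership                  # fuzzy attention gating

feat = torch.randn(2, 16, 8, 32, 32)           # a batch of CT feature maps
att = FuzzyChannelAttention(16)
print(att(feat).shape)                         # torch.Size([2, 16, 8, 32, 32])
```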
    On a Conjecture Regarding the Adam Optimizer. (arXiv:2111.08162v4 [cs.LG] UPDATED)
    Why does the Adam optimizer work so well in deep-learning applications? Adam's originators, Kingma and Ba, presented a mathematical argument that was meant to help explain its success, but Bock and colleagues have since reported that a key piece is missing from that argument: an unproven lemma which we will call Bock's conjecture. Here we show that this conjecture is false, but we prove a modified version of it, a generalization of a result of Reddi and colleagues, which can take its place in analyses of Adam.
    Grounding Hindsight Instructions in Multi-Goal Reinforcement Learning for Robotics. (arXiv:2204.04308v2 [cs.LG] UPDATED)
    This paper focuses on robotic reinforcement learning with sparse rewards for natural language goal representations. An open problem is the sample inefficiency that stems from the compositionality of natural language, and from the grounding of language in sensory data and actions. We address these issues with three contributions. We first present a mechanism for hindsight instruction replay utilizing expert feedback. Second, we propose a seq2seq model to generate linguistic hindsight instructions. Finally, we present a novel class of language-focused learning tasks. We show that hindsight instructions improve the learning performance, as expected. In addition, we also provide an unexpected result: we show that the learning performance of our agent can be improved by one third if, in a sense, the agent learns to talk to itself in a self-supervised manner. We achieve this by learning to generate linguistic instructions that would have been appropriate as a natural language goal for an originally unintended behavior. Our results indicate that the performance gain increases with task complexity.
    PhML-DyR: A Physics-Informed ML framework for Dynamic Reconfiguration in Power Systems. (arXiv:2206.06789v2 [eess.SY] UPDATED)
    A transformation of the US electricity sector is underway, with aggressive targets to achieve 100% carbon pollution-free electricity by 2035. To achieve this objective while maintaining a safe and reliable power grid, new operating paradigms are needed that enable computationally fast and accurate decision making in a dynamic and uncertain environment. We propose a novel physics-informed machine learning framework for dynamic grid reconfiguration (PhML-DyR), a key decision task in power systems. Dynamic reconfiguration (DyR) is a process by which switch states are dynamically set so as to lead to an optimal grid topology that minimizes line losses. To address the underlying NP-hard computational complexity due to the mixed nature of the decision variables, we propose the use of physics-informed ML (PhML), which integrates both operating constraints and topological and connectivity constraints into a neural network framework. Our PhML approach learns to simultaneously optimize grid topology and generator dispatch to meet loads, increase efficiency, and remain within safe operating limits. We demonstrate the effectiveness of PhML-DyR on a canonical grid, showing a reduction in electricity loss by 23% and improved voltage profiles. We also show a reduction in constraint violations by an order of magnitude, as well as a reduction in training time, using PhML-DyR.
    Newton methods based convolution neural networks using parallel processing. (arXiv:2112.01401v2 [cs.LG] UPDATED)
    Training convolutional neural networks is a high-dimensional, non-convex optimization problem. At present, it is inefficient in situations where parametric learning rates cannot be confidently set. Some past works have introduced Newton methods for training deep neural networks. Newton methods for convolutional neural networks involve complicated operations: finding the Hessian matrix in second-order methods becomes very complex, as we mainly use the finite-differences method with the image data. Prior work on Newton methods for convolutional neural networks deals with this by using sub-sampled Hessian Newton methods. In this paper, we use the complete data instead of the sub-sampled methods that only handle partial data at a time. Further, we use parallel processing instead of serial processing in mini-batch computations. The results obtained using parallel processing in this study outperform the previous approach in terms of training time.
    On Positional and Structural Node Features for Graph Neural Networks on Non-attributed Graphs. (arXiv:2107.01495v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have been widely used in various graph-related problems such as node classification and graph classification, where superior performance is mainly established when natural node features are available. However, it is not well understood how GNNs work without natural node features, especially regarding the various ways to construct artificial ones. In this paper, we point out the two types of artificial node features, i.e., positional and structural node features, and provide insights on why each of them is more appropriate for certain tasks, i.e., positional node classification, structural node classification, and graph classification. Extensive experimental results on 10 benchmark datasets validate our insights, thus leading to a practical guideline on the choices between different artificial node features for GNNs on non-attributed graphs. The code is available at https://github.com/zjzijielu/gnn-positional-structural-node-features.
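    The distinction between the two feature families is easy to demonstrate on a graph with no natural attributes. In the sketch below (our own illustration, not the paper's code), every node of a cycle gets an identical structural feature (one-hot degree) but a distinct positional feature (Laplacian eigenvectors):

```python
import numpy as np

def structural_features(adj, max_degree=10):
    """One-hot degree encoding: identical for structurally similar nodes."""
    deg = adj.sum(axis=1).astype(int).clip(max=max_degree)
    feats = np.zeros((adj.shape[0], max_degree + 1))
    feats[np.arange(adj.shape[0]), deg] = 1.0
    return feats

def positional_features(adj, dim=4):
    """Laplacian eigenvector encoding: distinguishes a node's position."""
    lap = np.diag(adj.sum(axis=1)) - adj
    _, vecs = np.linalg.eigh(lap)
    return vecs[:, 1:dim + 1]                # skip the constant eigenvector

# 6-cycle: every node is structurally identical but positionally distinct.
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
print(structural_features(adj)[0])           # same row for every node
print(positional_features(adj).round(2))     # rows differ per node
```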
    Condensing Graphs via One-Step Gradient Matching. (arXiv:2206.07746v3 [cs.LG] UPDATED)
    As training deep learning models on large datasets takes a lot of time and resources, it is desirable to construct a small synthetic dataset with which we can train deep learning models sufficiently. There are recent works that have explored solutions to condensing image datasets through complex bi-level optimization. For instance, dataset condensation (DC) matches network gradients w.r.t. the large real data and the small synthetic data, where the network weights are optimized for multiple steps at each outer iteration. However, existing approaches have their inherent limitations: (1) they are not directly applicable to graphs where the data is discrete; and (2) the condensation process is computationally expensive due to the involved nested optimization. To bridge the gap, we investigate efficient dataset condensation tailored for graph datasets where we model the discrete graph structure as a probabilistic model. We further propose a one-step gradient matching scheme, which performs gradient matching for only a single step, without training the network weights. Our theoretical analysis shows this strategy can generate synthetic graphs that lead to lower classification loss on real graphs. Extensive experiments on various graph datasets demonstrate the effectiveness and efficiency of the proposed method. In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance, and our method is significantly faster than multi-step gradient matching (e.g. 15x in CIFAR10 for synthesizing 500 graphs). Code is available at \url{https://github.com/amazon-research/DosCond}.
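    The one-step scheme is simple to sketch: sample a freshly initialized network, compute the loss gradient on real data and on the learnable synthetic data at that same initialization, and move the synthetic data to align the two gradients, never updating the network weights. A toy PyTorch version follows (a linear model and cosine distance are our simplifying assumptions; the paper's probabilistic treatment of discrete graph structure is omitted here):

```python
import torch
import torch.nn.functional as F

def grad_vector(model, x, y, create_graph=False):
    """Flattened gradient of the classification loss w.r.t. model weights."""
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()),
                                create_graph=create_graph)
    return torch.cat([g.reshape(-1) for g in grads])

torch.manual_seed(0)
x_real = torch.randn(256, 20); y_real = torch.randint(0, 2, (256,))
x_syn = torch.randn(16, 20, requires_grad=True)   # learnable synthetic data
y_syn = torch.arange(2).repeat(8)                 # fixed balanced labels
opt = torch.optim.Adam([x_syn], lr=0.01)

for step in range(200):
    model = torch.nn.Linear(20, 2)        # fresh random init; never trained
    g_real = grad_vector(model, x_real, y_real)
    g_syn = grad_vector(model, x_syn, y_syn, create_graph=True)
    loss = 1 - F.cosine_similarity(g_real, g_syn, dim=0)   # one-step match
    opt.zero_grad(); loss.backward(); opt.step()

print(float(loss))                        # gradient directions now aligned
```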
    Bit-Metric Decoding Rate in Multi-User MIMO Systems: Applications. (arXiv:2203.06273v3 [cs.IT] UPDATED)
    This is the second part of a two-part paper that focuses on link-adaptation (LA) and physical layer (PHY) abstraction for multi-user MIMO (MU-MIMO) systems with non-linear receivers. The first part proposes a new metric, called bit-metric decoding rate (BMDR) for a detector, as being the equivalent of post-equalization signal-to-interference-noise ratio (SINR) for non-linear receivers. Since this BMDR does not have a closed form expression, a machine-learning based approach to estimate it effectively is presented. In this part, the concepts developed in the first part are utilized to develop novel algorithms for LA, dynamic detector selection from a list of available detectors, and PHY abstraction in MU-MIMO systems with arbitrary receivers. Extensive simulation results that substantiate the efficacy of the proposed algorithms are presented.
    RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk. (arXiv:2209.04067v1 [cs.LG])
    Prior work on safe Reinforcement Learning (RL) has studied risk-aversion to randomness in dynamics (aleatory) and to model uncertainty (epistemic) in isolation. We propose and analyze a new framework to jointly model the risk associated with epistemic and aleatory uncertainties in finite-horizon and discounted infinite-horizon MDPs. We call this framework that combines Risk-Averse and Soft-Robust methods RASR. We show that when the risk-aversion is defined using either EVaR or the entropic risk, the optimal policy in RASR can be computed efficiently using a new dynamic program formulation with a time-dependent risk level. As a result, the optimal risk-averse policies are deterministic but time-dependent, even in the infinite-horizon discounted setting. We also show that particular RASR objectives reduce to risk-averse RL with mean posterior transition probabilities. Our empirical results show that our new algorithms consistently mitigate uncertainty as measured by EVaR and other standard risk measures.
    HyperMAML: Few-Shot Adaptation of Deep Models with Hypernetworks. (arXiv:2205.15745v2 [cs.LG] UPDATED)
    The aim of Few-Shot learning methods is to train models which can easily adapt to previously unseen tasks, based on small amounts of data. One of the most popular and elegant Few-Shot learning approaches is Model-Agnostic Meta-Learning (MAML). The main idea behind this method is to learn the general weights of the meta-model, which are further adapted to specific problems in a small number of gradient steps. However, the model's main limitation lies in the fact that the update procedure is realized by gradient-based optimisation. In consequence, MAML cannot always modify weights to the required extent in one or even a few gradient iterations. On the other hand, using many gradient steps results in a complex and time-consuming optimization procedure, which is hard to train in practice, and may lead to overfitting. In this paper, we propose HyperMAML, a novel generalization of MAML, where the training of the update procedure is also part of the model. Namely, in HyperMAML, instead of updating the weights with gradient descent, we use a trainable Hypernetwork for this purpose. Consequently, in this framework, the model can generate significant updates whose range is not limited to a fixed number of gradient steps. Experiments show that HyperMAML consistently outperforms MAML and performs comparably to other state-of-the-art techniques in a number of standard Few-Shot learning benchmarks.
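    The core mechanism can be sketched compactly: a hypernetwork consumes a summary of the support set and emits an additive update for the classifier head, replacing inner-loop gradient steps with a single forward pass. The toy PyTorch sketch below illustrates only this idea; the embedding, pooling, and architecture choices are our assumptions, not the paper's:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, n_way = 32, 5

encoder = nn.Sequential(nn.Linear(16, feat_dim), nn.ReLU())
head = nn.Linear(feat_dim, n_way)            # meta-learned "general" head

# Hypernetwork: support-set summary -> additive update for the head weights.
n_head_params = feat_dim * n_way + n_way
hyper = nn.Sequential(nn.Linear(feat_dim + n_way, 64), nn.ReLU(),
                      nn.Linear(64, n_head_params))

def adapt(support_x, support_y):
    z = encoder(support_x)                                  # (N, feat_dim)
    y1h = F.one_hot(support_y, n_way).float()
    summary = torch.cat([z, y1h], dim=1).mean(0)            # task embedding
    delta = hyper(summary)
    dw = delta[:feat_dim * n_way].view(n_way, feat_dim)
    db = delta[feat_dim * n_way:]
    # One forward pass yields the adapted head: no gradient steps needed.
    return head.weight + dw, head.bias + db

support_x = torch.randn(25, 16); support_y = torch.arange(5).repeat(5)
w, b = adapt(support_x, support_y)
query_logits = encoder(torch.randn(10, 16)) @ w.t() + b
print(query_logits.shape)                    # torch.Size([10, 5])
```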
    ANITA: An Optimal Loopless Accelerated Variance-Reduced Gradient Method. (arXiv:2103.11333v3 [math.OC] UPDATED)
    In this paper, we propose a novel accelerated gradient method called ANITA for solving the fundamental finite-sum optimization problems. Concretely, we consider both general convex and strongly convex settings: i) For general convex finite-sum problems, ANITA improves the previous state-of-the-art result given by Varag (Lan et al., 2019). In particular, for large-scale problems or when the convergence error is not very small, i.e., $n \geq \frac{1}{\epsilon^2}$, ANITA obtains the \emph{first} optimal result $O(n)$, matching the lower bound $\Omega(n)$ provided by Woodworth and Srebro (2016), while previous results are $O(n \log \frac{1}{\epsilon})$ of Varag (Lan et al., 2019) and $O(\frac{n}{\sqrt{\epsilon}})$ of Katyusha (Allen-Zhu, 2017). ii) For strongly convex finite-sum problems, we also show that ANITA can achieve the optimal convergence rate $O\big((n+\sqrt{\frac{nL}{\mu}})\log\frac{1}{\epsilon}\big)$ matching the lower bound $\Omega\big((n+\sqrt{\frac{nL}{\mu}})\log\frac{1}{\epsilon}\big)$ provided by Lan and Zhou (2015). Besides, ANITA enjoys a simpler loopless algorithmic structure, unlike previous accelerated algorithms such as Varag (Lan et al., 2019) and Katyusha (Allen-Zhu, 2017), which use double-loop structures. Moreover, we provide a novel \emph{dynamic multi-stage convergence analysis}, which is the key technical part for improving previous results to the optimal rates. We believe that our new theoretical rates and novel convergence analysis for the fundamental finite-sum problem will directly lead to key improvements for many other related problems, such as distributed/federated/decentralized optimization problems (e.g., Li and Richt\'arik, 2021). Finally, the numerical experiments show that ANITA converges faster than the previous state-of-the-art Varag (Lan et al., 2019), validating our theoretical results and confirming the practical superiority of ANITA.
    Constrained multi-objective optimization of process design parameters in settings with scarce data: an application to adhesive bonding. (arXiv:2112.08760v2 [cs.NE] UPDATED)
    Adhesive joints are increasingly used in industry for a wide variety of applications because of their favorable characteristics such as high strength-to-weight ratio, design flexibility, limited stress concentrations, planar force transfer, good damage tolerance and fatigue resistance. Finding the optimal process parameters for an adhesive bonding process is challenging: the optimization is inherently multi-objective (aiming to maximize break strength while minimizing cost) and constrained (the process should not result in any visual damage to the materials, and stress tests should not result in failures that are adhesion-related). Real-life physical experiments in the lab are expensive to perform; traditional evolutionary approaches (such as genetic algorithms) are then ill-suited to solve the problem, due to the prohibitive number of experiments required for evaluation. In this research, we successfully applied specific machine learning techniques (Gaussian Process Regression and Logistic Regression) to emulate the objective and constraint functions based on a \emph{limited} amount of experimental data. The techniques are embedded in a Bayesian optimization algorithm, which succeeds in detecting Pareto-optimal process settings in a highly efficient way (i.e., requiring a limited number of extra experiments).
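    A minimal version of this surrogate-assisted loop is easy to sketch with scikit-learn: a Gaussian process emulates the objective (break strength), a logistic model emulates the feasibility constraint, and candidates are ranked by expected improvement weighted by the predicted probability of feasibility. The toy data and acquisition choice below are our assumptions, not the paper's setup:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-ins for expensive lab experiments over one process parameter.
X = rng.uniform(0, 10, (20, 1))
strength = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)   # objective
feasible = (X[:, 0] < 7).astype(int)                         # constraint label

gp = GaussianProcessRegressor(normalize_y=True).fit(X, strength)
clf = LogisticRegression().fit(X, feasible)

cand = np.linspace(0, 10, 500).reshape(-1, 1)
mu, sd = gp.predict(cand, return_std=True)
best = strength[feasible == 1].max()
z = (mu - best) / np.maximum(sd, 1e-9)
ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)            # expected improvement
score = ei * clf.predict_proba(cand)[:, 1]                   # feasibility-weighted
print("next experiment at x =", float(cand[np.argmax(score)]))
```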
    Solving Multistage Stochastic Linear Programming via Regularized Linear Decision Rules: An Application to Hydrothermal Dispatch Planning. (arXiv:2110.03146v2 [math.OC] UPDATED)
    The solution of multistage stochastic linear problems (MSLP) represents a challenge for many applications. Long-term hydrothermal dispatch planning (LHDP) materializes this challenge in a real-world problem that affects electricity markets, economies, and natural resources worldwide. No closed-form solutions are available for MSLP, and the definition of non-anticipative policies with high-quality out-of-sample performance is crucial. Linear decision rules (LDR) provide an interesting simulation-based framework for finding high-quality policies for MSLP through two-stage stochastic models. In practical applications, however, the number of parameters to be estimated when using an LDR may be close to or higher than the number of scenarios of the sample average approximation problem, thereby generating in-sample overfit and poor performance in out-of-sample simulations. In this paper, we propose a novel regularized LDR to solve MSLP based on the AdaLASSO (adaptive least absolute shrinkage and selection operator). The goal is to use the parsimony principle, as largely studied in high-dimensional linear regression models, to obtain better out-of-sample performance for an LDR applied to MSLP. Computational experiments show that the overfit threat is non-negligible when using the classical non-regularized LDR to solve the LHDP, one of the most studied MSLP with relevant applications in industry. Our analysis highlights the following benefits of the proposed framework in comparison to the non-regularized benchmark: 1) significant reductions in the number of non-zero coefficients (model parsimony), 2) substantial cost reductions in out-of-sample evaluations, and 3) improved spot-price profiles.
    Learning Branching Heuristics for Propositional Model Counting. (arXiv:2007.03204v2 [cs.LG] UPDATED)
    Propositional model counting, or #SAT, is the problem of computing the number of satisfying assignments of a Boolean formula. Many problems from different application areas, including many discrete probabilistic inference problems, can be translated into model counting problems to be solved by #SAT solvers. Exact #SAT solvers, however, are often not scalable to industrial size instances. In this paper, we present Neuro#, an approach for learning branching heuristics to improve the performance of exact #SAT solvers on instances from a given family of problems. We experimentally show that our method reduces the step count on similarly distributed held-out instances and generalizes to much larger instances from the same problem family. It is able to achieve these results on a number of different problem families having very different structures. In addition to step count improvements, Neuro# can also achieve orders of magnitude wall-clock speedups over the vanilla solver on larger instances in some problem families, despite the runtime overhead of querying the model.  ( 3 min )
    Shapley value-based approaches to explain the robustness of classifiers in machine learning. (arXiv:2209.04254v1 [cs.LG])
    In machine learning, the use of algorithm-agnostic approaches is an emerging area of research for explaining the contribution of individual features towards a predicted outcome. While there is a focus on explaining the prediction itself, little has been done on explaining the robustness of these models, that is, how each feature contributes towards achieving that robustness. In this paper, we propose the use of Shapley values to explain the contribution of each feature towards the model's robustness, measured in terms of the Receiver Operating Characteristic (ROC) curve and the Area Under the ROC Curve (AUC). With the help of an illustrative example, we demonstrate the proposed idea of explaining the ROC curve and visualising the uncertainties in these curves. For imbalanced datasets, the use of the Precision-Recall Curve (PRC) is considered more appropriate; therefore, we also demonstrate how to explain the PRCs with the help of Shapley values.
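    For a handful of features, such a quantity can be computed exactly by enumerating feature subsets, with the value function defined as the test AUC of a model retrained on each subset. The sketch below is our illustration (the paper's exact value function and estimator may differ); it also verifies the efficiency property, i.e., the contributions sum to AUC(all features) minus the 0.5 baseline:

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=4, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
d = X.shape[1]

def value(subset):
    """v(S): test AUC using only the features in S (0.5 if S is empty)."""
    if not subset:
        return 0.5
    idx = list(subset)
    model = LogisticRegression().fit(Xtr[:, idx], ytr)
    return roc_auc_score(yte, model.predict_proba(Xte[:, idx])[:, 1])

shapley = np.zeros(d)
for i in range(d):
    others = [j for j in range(d) if j != i]
    for k in range(d):
        for S in combinations(others, k):
            w = factorial(k) * factorial(d - k - 1) / factorial(d)
            shapley[i] += w * (value(S + (i,)) - value(S))

print("per-feature contributions to AUC:", shapley.round(3))
print("efficiency check:", round(shapley.sum(), 3),
      "=", round(value(tuple(range(d))) - 0.5, 3))
```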
    SC-Square: Future Progress with Machine Learning?. (arXiv:2209.04361v1 [cs.SC])
    The algorithms employed by our communities are often underspecified, and thus have multiple implementation choices, which do not affect the correctness of the output but do impact the efficiency or even tractability of its production. In this extended abstract, to accompany a keynote talk at the 2021 SC-Square Workshop, we survey recent work (both the author's and from the literature) on the use of Machine Learning technology to improve algorithms of interest to SC-Square.
    Knowledge-based Deep Learning for Modeling Chaotic Systems. (arXiv:2209.04259v1 [cs.LG])
    Deep Learning has received increased attention due to its remarkable success in many fields, such as computer vision, natural language processing, recommendation systems, and most recently in simulating multiphysics problems and predicting nonlinear dynamical systems. However, modeling and forecasting the dynamics of chaotic systems remains an open research problem, since training deep learning models requires big data, which is not always available in many cases. Such deep learners can be trained from additional information obtained from simulated results and by enforcing the physical laws of the chaotic systems. This paper considers extreme events and their dynamics and proposes elegant models based on deep neural networks, called knowledge-based deep learning (KDL). Our proposed KDL can learn the complex patterns governing chaotic systems by jointly training on real and simulated data directly from the dynamics and their differential equations. This knowledge is transferred to model and forecast real-world chaotic events exhibiting extreme behavior. We validate the efficiency of our model by assessing it on three real-world benchmark datasets: El Nino sea surface temperature, San Juan Dengue viral infection, and Bj{\o}rn{\o}ya daily precipitation, all governed by extreme events' dynamics. Using prior knowledge of extreme events and physics-based loss functions to guide the neural network learning, we ensure physically consistent, generalizable, and accurate forecasting, even in a small-data regime.
    Majority Vote for Distributed Differentially Private Sign Selection. (arXiv:2209.04419v1 [cs.CR])
    Privacy-preserving data analysis has become prevalent in recent years. In this paper, we propose a distributed group differentially private majority vote mechanism for the sign selection problem in a distributed setup. To achieve this, we apply iterative peeling to the stability function and use the exponential mechanism to recover the signs. As applications, we study private sign selection for mean estimation and linear regression problems in distributed systems. Our method recovers the support and signs with the optimal signal-to-noise ratio as in the non-private scenario, improving over contemporary work on private variable selection. Moreover, sign selection consistency is justified with theoretical guarantees. Simulation studies are conducted to demonstrate the effectiveness of the proposed method.
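    The exponential-mechanism step can be illustrated for a single coordinate: each machine votes a sign, the utility of each candidate sign is its vote count (sensitivity 1, since one machine changes each count by at most one), and the released sign is sampled with probability proportional to exp(eps * utility / 2). A toy NumPy sketch of just this step, not the paper's full peeling procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def private_sign(votes, eps):
    """Exponential mechanism over {-1, +1}.
    Utility = number of machines voting for that sign (sensitivity 1)."""
    u = np.array([(votes == -1).sum(), (votes == +1).sum()], dtype=float)
    p = np.exp(eps * u / 2.0)
    p /= p.sum()
    return rng.choice([-1, +1], p=p)

# 50 machines each report the sign of their local mean estimate.
m, true_mean = 50, 0.3
local_means = true_mean + 0.5 * rng.standard_normal(m)
votes = np.sign(local_means).astype(int)

for eps in [0.1, 1.0, 10.0]:
    draws = np.array([private_sign(votes, eps) for _ in range(1000)])
    print(f"eps={eps}: P(correct sign) ~ {np.mean(draws == 1):.2f}")
```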
    ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training and Inference. (arXiv:2209.04161v1 [cs.AR])
    Edge training of Deep Neural Networks (DNNs) is a desirable goal for continuous learning; however, it is hindered by the enormous computational power required by training. Hardware approximate multipliers have shown their effectiveness for gaining resource-efficiency in DNN inference accelerators; however, training with approximate multipliers is largely unexplored. To build resource-efficient accelerators with approximate multipliers supporting DNN training, a thorough evaluation of training convergence and accuracy for different DNN architectures and different approximate multipliers is needed. This paper presents ApproxTrain, an open-source framework that allows fast evaluation of DNN training and inference using simulated approximate multipliers. ApproxTrain is as user-friendly as TensorFlow (TF) and requires only a high-level description of a DNN architecture along with C/C++ functional models of the approximate multiplier. We improve the speed of the simulation at the multiplier level by using a novel LUT-based approximate floating-point (FP) multiplier simulator on GPU (AMSim). ApproxTrain leverages CUDA and efficiently integrates AMSim into the TensorFlow library, in order to overcome the absence of native hardware approximate multipliers in commercial GPUs. We use ApproxTrain to evaluate the convergence and accuracy of DNN training with approximate multipliers for small and large datasets (including ImageNet) using LeNets and ResNets architectures. The evaluations demonstrate similar convergence behavior and negligible change in test accuracy compared to FP32 and bfloat16 multipliers. Compared to CPU-based approximate multiplier simulations in training and inference, the GPU-accelerated ApproxTrain is more than 2500x faster. Based on highly optimized closed-source cuDNN/cuBLAS libraries with native hardware multipliers, the original TensorFlow is only 8x faster than ApproxTrain.
    Robust-by-Design Classification via Unitary-Gradient Neural Networks. (arXiv:2209.04293v1 [cs.LG])
    The use of neural networks in safety-critical systems requires safe and robust models, due to the existence of adversarial attacks. Knowing the minimal adversarial perturbation of any input x, or, equivalently, knowing the distance of x from the classification boundary, allows evaluating the classification robustness, providing certifiable predictions. Unfortunately, state-of-the-art techniques for computing such a distance are computationally expensive and hence not suited for online applications. This work proposes a novel family of classifiers, namely Signed Distance Classifiers (SDCs), that, from a theoretical perspective, directly output the exact distance of x from the classification boundary, rather than a probability score (e.g., SoftMax). SDCs represent a family of robust-by-design classifiers. To practically address the theoretical requirements of an SDC, a novel network architecture named Unitary-Gradient Neural Network is presented. Experimental results show that the proposed architecture approximates a signed distance classifier, hence allowing an online certifiable classification of x at the cost of a single inference.
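    The property that makes SDCs attractive is easiest to see in the linear case: if the decision function has unit gradient norm everywhere, its output is exactly the signed Euclidean distance to the decision boundary, so a single inference yields a certifiable margin. A NumPy illustration of this special case (the paper's architecture generalizes it to deep networks):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear decision function f(x) = w.x + b with ||w|| = 1:
# |f(x)| is then exactly the distance of x from the hyperplane f = 0.
w = rng.standard_normal(3); w /= np.linalg.norm(w)   # unit gradient everywhere
b = 0.5

def f(x):
    return x @ w + b

x = rng.standard_normal(3)
signed_dist = f(x)                        # certifiable margin, one inference
x_proj = x - signed_dist * w              # nearest point on the boundary
print(round(f(x_proj), 10))               # ~0.0: x_proj lies on the boundary
print(round(np.linalg.norm(x - x_proj) - abs(signed_dist), 10))  # ~0.0
```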
    Survey on Deep Fuzzy Systems in regression applications: a view on interpretability. (arXiv:2209.04230v1 [cs.LG])
    Regression problems have been increasingly embraced by deep learning (DL) techniques. The increasing number of papers recently published in this domain, including surveys and reviews, shows that deep regression has captured the attention of the community due to its efficiency and good accuracy in systems with high-dimensional data. However, many DL methodologies have complex structures that are not readily transparent to human users. Assessing the interpretability of these models is an essential factor for addressing problems in sensitive areas such as cyber-security systems, medicine, financial surveillance, and industrial processes. Fuzzy logic systems (FLS) are inherently interpretable models, well known in the literature, capable of using nonlinear representations for complex systems through linguistic terms with membership degrees mimicking human thought. Within an atmosphere of explainable artificial intelligence, it is necessary to consider a trade-off between accuracy and interpretability for developing intelligent models. This paper aims to investigate the state of the art of existing methodologies that combine DL and FLS, namely deep fuzzy systems, to address regression problems, a topic that is currently insufficiently explored in the literature and thus deserves a comprehensive survey.
    Anomaly Detection through Unsupervised Federated Learning. (arXiv:2209.04184v1 [cs.LG])
    Federated learning (FL) is proving to be one of the most promising paradigms for leveraging distributed resources, enabling a set of clients to collaboratively train a machine learning model while keeping the data decentralized. The explosive growth of interest in the topic has led to rapid advancements in several core aspects like communication efficiency, handling non-IID data, privacy, and security capabilities. However, the majority of FL works only deal with supervised tasks, assuming that clients' training sets are labeled. To leverage the enormous unlabeled data on distributed edge devices, in this paper, we aim to extend the FL paradigm to unsupervised tasks by addressing the problem of anomaly detection in decentralized settings. In particular, we propose a novel method in which, through a preprocessing phase, clients are grouped into communities, each having similar majority (i.e., inlier) patterns. Subsequently, each community of clients trains the same anomaly detection model (i.e., autoencoders) in a federated fashion. The resulting model is then shared and used to detect anomalies within the clients of the same community that joined the corresponding federated process. Experiments show that our method is robust, and it can detect communities consistent with the ideal partitioning in which groups of clients having the same inlier patterns are known. Furthermore, the performance is significantly better than that of models trained by each client exclusively on local data, and comparable with that of federated models built on an ideal community partition.
    Towards Confidence-guided Shape Completion for Robotic Applications. (arXiv:2209.04300v1 [cs.CV])
    Many robotic tasks involving some form of 3D visual perception greatly benefit from a complete knowledge of the working environment. However, robots often have to tackle unstructured environments and their onboard visual sensors can only provide incomplete information due to limited workspaces, clutter or object self-occlusion. In recent years, deep learning architectures for shape completion have begun taking traction as effective means of inferring a complete 3D object representation from partial visual data. Nevertheless, most of the existing state-of-the-art approaches provide a fixed output resolution in the form of voxel grids, strictly related to the size of the neural network output stage. While this is enough for some tasks, e.g. obstacle avoidance in navigation, grasping and manipulation require finer resolutions and simply scaling up the neural network outputs is computationally expensive. In this paper, we address this limitation by proposing an object shape completion method based on an implicit 3D representation providing a confidence value for each reconstructed point. As a second contribution, we propose a gradient-based method for efficiently sampling such implicit function at an arbitrary resolution, tunable at inference time. We experimentally validate our approach by comparing reconstructed shapes with ground truths, and by deploying our shape completion algorithm in a robotic grasping pipeline. In both cases, we compare results with a state-of-the-art shape completion approach.
    Design of a Supervisory Control System for Autonomous Operation of Advanced Reactors. (arXiv:2209.04334v1 [eess.SY])
    Advanced reactors deployed in the coming decades will face deregulated energy markets, and may adopt flexible operation to boost profitability. To aid in the transition from the baseload to the flexible operation paradigm, autonomous operation is sought. This work focuses on the control aspect of autonomous operation. Specifically, a hierarchical control system is designed to support constraint enforcement during routine operational transients. Within the system, data-driven modeling, physics-based state observation, and classical control algorithms are integrated to provide an adaptable and robust solution. A 320 MW Fluoride-cooled High-temperature Pebble-bed Reactor is the design basis for demonstrating the control system. The hierarchical control system consists of a supervisory layer and a low-level layer. The supervisory layer receives requests to change the system's operating conditions, and accepts or rejects them based on constraints that have been assigned. Constraints are issued to keep the plant within an optimal operating region. The low-level layer interfaces with the actuators of the system to fulfill requested changes, while maintaining tracking and regulation duties. To accept requests at the supervisory layer, the Reference Governor algorithm was adopted. To model the dynamics of the reactor, a system identification algorithm, Dynamic Mode Decomposition, was utilized. To estimate the evolution of process variables that cannot be directly measured, the Unscented Kalman Filter was adopted, incorporating a nonlinear model of nuclear dynamics. The composition of these algorithms led to a numerical demonstration of constraint enforcement during a 40% power drop transient. Adaptability of the proposed system was demonstrated by modifying the constraint values and enforcing them during the transient. Robustness was also demonstrated by enforcing constraints under noisy environments.
    Algorithms with More Granular Differential Privacy Guarantees. (arXiv:2209.04053v1 [cs.CR])
    Differential privacy is often applied with a privacy parameter that is larger than the theory suggests is ideal; various informal justifications for tolerating large privacy parameters have been proposed. In this work, we consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis. In this framework, we study several basic data analysis and learning tasks, and design algorithms whose per-attribute privacy parameter is smaller than the best possible privacy parameter for the entire record of a person (i.e., all the attributes).
    FedDAR: Federated Domain-Aware Representation Learning. (arXiv:2209.04007v1 [cs.LG])
    Cross-silo federated learning (FL) has become a promising tool in machine learning applications for healthcare. It allows hospitals/institutions to train models with sufficient data while the data is kept private. To make sure the FL model is robust when facing heterogeneous data among FL clients, most efforts focus on personalizing models for clients. However, the latent relationships between clients' data are ignored. In this work, we focus on a special non-iid FL problem, called Domain-mixed FL, where each client's data distribution is assumed to be a mixture of several predefined domains. Recognizing the diversity of domains and the similarity within domains, we propose a novel method, FedDAR, which learns a domain-shared representation and domain-wise personalized prediction heads in a decoupled manner. For simplified linear regression settings, we theoretically prove that FedDAR enjoys a linear convergence rate. For general settings, we perform intensive empirical studies on both synthetic and real-world medical datasets which demonstrate its superiority over prior FL methods.
    Multiplierless MP-Kernel Machine For Energy-efficient Edge Devices. (arXiv:2106.01958v3 [cs.LG] UPDATED)
    We present a novel framework for designing multiplierless kernel machines that can be used on resource-constrained platforms like intelligent edge devices. The framework uses a piecewise linear (PWL) approximation based on a margin propagation (MP) technique and uses only addition/subtraction, shift, comparison, and register underflow/overflow operations. We propose a hardware-friendly MP-based inference and online training algorithm that has been optimized for a Field Programmable Gate Array (FPGA) platform. Our FPGA implementation eliminates the need for DSP units and reduces the number of LUTs. By reusing the same hardware for inference and training, we show that the platform can overcome classification errors and local minima artifacts that result from the MP approximation. The implementation of this proposed multiplierless MP-kernel machine on FPGA results in an estimated energy consumption of 13.4 pJ and power consumption of 107 mW with ~9k LUTs and FFs each for a 256 x 32 sized kernel, making it superior in terms of power, performance, and area compared to other comparable implementations.
    Functional dimension of feedforward ReLU neural networks. (arXiv:2209.04036v1 [math.MG])
    It is well-known that the parameterized family of functions representable by fully-connected feedforward neural networks with ReLU activation function is precisely the class of piecewise linear functions with finitely many pieces. It is less well-known that for every fixed architecture of ReLU neural network, the parameter space admits positive-dimensional spaces of symmetries, and hence the local functional dimension near any given parameter is lower than the parametric dimension. In this work we carefully define the notion of functional dimension, show that it is inhomogeneous across the parameter space of ReLU neural network functions, and continue an investigation - initiated in [14] and [5] - into when the functional dimension achieves its theoretical maximum. We also study the quotient space and fibers of the realization map from parameter space to function space, supplying examples of fibers that are disconnected, fibers upon which functional dimension is non-constant, and fibers upon which the symmetry group acts non-transitively.
    In-situ animal behavior classification using knowledge distillation and fixed-point quantization. (arXiv:2209.04130v1 [cs.LG])
    We explore the use of knowledge distillation (KD) for learning compact and accurate models that enable classification of animal behavior from accelerometry data on wearable devices. To this end, we take a deep and complex convolutional neural network, known as residual neural network (ResNet), as the teacher model. ResNet is specifically designed for multivariate time-series classification. We use ResNet to distil the knowledge of animal behavior classification datasets into soft labels, which consist of the predicted pseudo-probabilities of every class for each datapoint. We then use the soft labels to train our significantly less complex student models, which are based on the gated recurrent unit (GRU) and multilayer perceptron (MLP). The evaluation results using two real-world animal behavior classification datasets show that the classification accuracy of the student GRU-MLP models improves appreciably through KD, approaching that of the teacher ResNet model. To further reduce the computational and memory requirements of performing inference using the student models trained via KD, we utilize dynamic fixed-point quantization through an appropriate modification of the computational graphs of the models. We implement both unquantized and quantized versions of the developed KD-based models on the embedded systems of our purpose-built collar and ear tag devices to classify animal behavior in situ and in real time. The results corroborate the effectiveness of KD and quantization in improving the inference performance in terms of both classification accuracy and computational and memory efficiency.
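    The distillation step itself is standard and compact: the student is trained against the teacher's temperature-softened pseudo-probabilities, usually blended with the hard-label loss. A PyTorch sketch with an illustrative GRU-plus-MLP student (temperature, blend weight, and shapes are our assumptions, not the reported configuration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GRUStudent(nn.Module):
    def __init__(self, in_dim=3, hidden=32, n_classes=5):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True)
        self.mlp = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (B, T, 3) accelerometry window
        _, h = self.gru(x)
        return self.mlp(h[-1])

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend KL-to-softened-teacher with hard-label cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

x = torch.randn(8, 100, 3)                 # 8 windows, 100 timesteps each
teacher_logits = torch.randn(8, 5)         # soft labels from the ResNet teacher
labels = torch.randint(0, 5, (8,))
student = GRUStudent()
loss = kd_loss(student(x), teacher_logits, labels)
loss.backward()
```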
    Deep autoencoders for physics-constrained data-driven nonlinear materials modeling. (arXiv:2209.04416v1 [math.NA])
    Physics-constrained data-driven computing is an emerging computational paradigm that allows simulation of complex materials directly based on a material database, bypassing classical constitutive model construction. However, it remains difficult to deal with high-dimensional applications and extrapolative generalization. This paper introduces deep learning techniques under the data-driven framework to address these fundamental issues in nonlinear materials modeling. To this end, an autoencoder neural network architecture is introduced to learn the underlying low-dimensional representation (embedding) of the given material database. The offline-trained autoencoder and the discovered embedding space are then incorporated in the online data-driven computation such that the search for the optimal material state from the database can be performed in a low-dimensional space, aiming to enhance robustness and predictability with projected material data. To ensure numerical stability and a representative constitutive manifold, a convexity-preserving interpolation scheme tailored to the autoencoder-based data-driven solver is introduced for constructing the material state. In this study, the applicability of the proposed approach is demonstrated by modeling nonlinear biological tissues. A parametric study on data noise, data size and sparsity, training initialization, and model architectures is also conducted to examine the robustness and convergence properties of the proposed approach.
    Cross-Subject Domain Adaptation for Classifying Working Memory Load with Multi-Frame EEG Images. (arXiv:2106.06769v4 [cs.LG] UPDATED)
    Working memory (WM), denoting the information temporarily stored in the mind, is a fundamental research topic in the field of human cognition. Electroencephalography (EEG), which can monitor the electrical activity of the brain, has been widely used in measuring the level of WM. However, one of the critical challenges is that individual differences may cause ineffective results, especially when the established model meets an unfamiliar subject. In this work, we propose a cross-subject deep adaptation model with spatial attention (CS-DASA) to generalize workload classifications across subjects. We first transform EEG time series into multi-frame EEG images incorporating spatial, spectral, and temporal information. The Subject-Shared module in CS-DASA then receives multi-frame EEG image data from both source and target subjects and learns common feature representations. Next, in the subject-specific module, the maximum mean discrepancy is implemented to measure the domain distribution divergence in a reproducing kernel Hilbert space, which can add an effective penalty loss for domain adaptation. Additionally, a subject-to-subject spatial attention mechanism is employed to focus on the discriminative spatial features from the target image data. Experiments conducted on a public WM EEG dataset containing 13 subjects show that the proposed model is capable of achieving better performance than existing state-of-the-art methods.
    Are Gradients on Graph Structure Reliable in Gray-box Attacks?. (arXiv:2208.05514v2 [cs.CR] UPDATED)
    Graph edge perturbations are dedicated to damaging the prediction of graph neural networks by modifying the graph structure. Previous gray-box attackers employ gradients from the surrogate model to locate the vulnerable edges to perturb the graph structure. However, unreliability exists in gradients on graph structures, which is rarely studied by previous works. In this paper, we discuss and analyze the errors caused by the unreliability of the structural gradients. These errors arise from rough gradient usage due to the discreteness of the graph structure and from the unreliability in the meta-gradient on the graph structure. In order to address these problems, we propose a novel attack model with methods to reduce the errors inside the structural gradients. We propose edge discrete sampling to select the edge perturbations associated with hierarchical candidate selection to ensure computational efficiency. In addition, semantic invariance and momentum gradient ensemble are proposed to address the gradient fluctuation on semantic-augmented graphs and the instability of the surrogate model. Experiments are conducted in untargeted gray-box poisoning scenarios and demonstrate the improvement in the performance of our approach.
    Generating Contextual Load Profiles Using a Conditional Variational Autoencoder. (arXiv:2209.04056v1 [cs.LG])
    Generating power system states that have similar distribution and dependency to historical ones is essential for the tasks of system planning and security assessment, especially when historical data is insufficient. In this paper, we describe a generative model for load profiles of industrial and commercial customers, based on the conditional variational autoencoder (CVAE) neural network architecture, a task that is challenging due to the highly variable nature of such profiles. Generated contextual load profiles are conditioned on the month of the year and typical power exchange with the grid. Moreover, the quality of the generated profiles is evaluated both visually and statistically. The experimental results demonstrate that our proposed CVAE model can capture temporal features of historical load profiles and generate `realistic' data with satisfying univariate distributions and multivariate dependencies.
    Estimating Multi-label Accuracy using Labelset Distributions. (arXiv:2209.04163v1 [cs.LG])
    A multi-label classifier estimates the binary label state (relevant vs irrelevant) for each of a set of concept labels, for any given instance. Probabilistic multi-label classifiers provide a predictive posterior distribution over all possible labelset combinations of such label states (the powerset of labels) from which we can provide the best estimate, simply by selecting the labelset corresponding to the largest expected accuracy over that distribution. For example, in maximizing exact match accuracy, we provide the mode of the distribution. But how does this relate to the confidence we may have in such an estimate? Confidence is an important element of real-world applications of multi-label classifiers (as in machine learning in general) and a key ingredient in explainability and interpretability. However, it is not obvious how to provide confidence in the multi-label context relating to a particular accuracy metric, nor is it clear how to provide a confidence measure which correlates well with the expected accuracy, which would be most valuable in real-world decision making. In this article we estimate the expected accuracy as a surrogate for confidence, for a given accuracy metric. We hypothesise that the expected accuracy can be estimated from the multi-label predictive distribution. We examine seven candidate functions for their ability to estimate expected accuracy from the predictive distribution. We found three of these to correlate well with expected accuracy and to be robust. Further, we determined that each candidate function can be used separately to estimate Hamming similarity, but a combination of the candidates was best for the expected Jaccard index and exact match.
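    The central computation is direct once the predictive distribution over labelsets is available: the expected value of any accuracy metric for a candidate prediction is a weighted sum over all 2^L labelsets. The NumPy sketch below (our illustration with a random distribution, not the paper's estimators) also shows why the optimal prediction depends on the metric: the distribution's mode maximizes expected exact match but not necessarily expected Jaccard:

```python
from itertools import product

import numpy as np

L = 3
labelsets = list(product([0, 1], repeat=L))          # the powerset of labels

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(2 ** L))                   # predictive distribution

def jaccard(a, b):
    a, b = np.array(a), np.array(b)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else np.logical_and(a, b).sum() / union

def expected_metric(pred, metric):
    """E_{y ~ p}[ metric(pred, y) ] -- a surrogate for confidence."""
    return sum(pi * metric(pred, y) for pi, y in zip(p, labelsets))

# The mode of the distribution maximizes expected exact-match accuracy...
mode = labelsets[int(np.argmax(p))]
print("E[exact match | mode] =", round(expected_metric(
    mode, lambda a, b: float(tuple(a) == tuple(b))), 3))
# ...but another labelset may have a higher expected Jaccard index.
best_jac = max(labelsets, key=lambda s: expected_metric(s, jaccard))
print("best for Jaccard:", best_jac,
      round(expected_metric(best_jac, jaccard), 3))
```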
    Convolutional Neural Networks combined with Runge-Kutta Methods. (arXiv:1802.08831v7 [cs.CV] UPDATED)
    A convolutional neural network can be constructed using numerical methods for solving dynamical systems, since the forward pass of the network can be regarded as a trajectory of a dynamical system. However, existing models based on numerical solvers cannot avoid the iterations of implicit methods, which makes the models inefficient at inference time. In this paper, we reinterpret the pre-activation Residual Networks (ResNets) and their variants from the dynamical systems view. We consider that the iterations of implicit Runge-Kutta methods are fused into the training of these models. Moreover, we propose a novel approach to constructing network models based on high-order Runge-Kutta methods in order to achieve higher efficiency. Our proposed models are referred to as the Runge-Kutta Convolutional Neural Networks (RKCNNs). The RKCNNs are evaluated on multiple benchmark datasets. The experimental results show that RKCNNs are vastly superior to other dynamical system network models: they achieve higher accuracy with far fewer resources. They also expand the family of network models based on numerical methods for dynamical systems.
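    The construction is easy to sketch for the explicit fourth-order Runge-Kutta method: a standard pre-activation residual block x + f(x) is a forward-Euler step of dx/dt = f(x), and an RK4 block reuses the same f four times per step. A minimal PyTorch sketch (the exact RKCNN block design in the paper may differ):

```python
import torch
import torch.nn as nn

class RK4Block(nn.Module):
    """One explicit Runge-Kutta-4 step of dx/dt = f(x), sharing f's weights.
    A pre-activation residual block corresponds to forward Euler (h = 1)."""
    def __init__(self, channels, h=1.0):
        super().__init__()
        self.h = h
        self.f = nn.Sequential(nn.BatchNorm2d(channels), nn.ReLU(),
                               nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        k1 = self.f(x)
        k2 = self.f(x + 0.5 * self.h * k1)
        k3 = self.f(x + 0.5 * self.h * k2)
        k4 = self.f(x + self.h * k3)
        return x + (self.h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

x = torch.randn(2, 8, 16, 16)
print(RK4Block(8)(x).shape)               # torch.Size([2, 8, 16, 16])
```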
    The Role Of Biology In Deep Learning. (arXiv:2209.04425v1 [cs.NE])
    Artificial neural networks took a lot of inspiration from their biological counterparts in becoming our best machine perceptual systems. This work summarizes some of that history and incorporates modern theoretical neuroscience into experiments with artificial neural networks from the field of deep learning. Specifically, iterative magnitude pruning is used to train sparsely connected networks with 33x fewer weights without loss in performance. These are used to test and ultimately reject the hypothesis that weight sparsity alone improves image noise robustness. Recent work mitigated catastrophic forgetting using weight sparsity, activation sparsity, and active dendrite modeling. This paper replicates those findings, and extends the method to train convolutional neural networks on a more challenging continual learning task. The code has been made publicly available.
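    Iterative magnitude pruning itself is a short loop: train, remove a fraction of the smallest-magnitude surviving weights globally, keep the mask fixed, and repeat, so that ten rounds at 30% per round leave roughly 3% of the weights (about the 33x reduction mentioned above). A self-contained PyTorch sketch on stand-in data (the rewinding and task details of the paper are omitted):

```python
import torch
import torch.nn as nn

def train(model, masks, steps=50):
    """Stand-in training loop on random data; pruned weights are kept at zero."""
    opt = torch.optim.SGD(model.parameters(), lr=0.05)
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(torch.randn(32, 20)),
                                      torch.randn(32, 1))
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            for n, p in model.named_parameters():
                if n in masks:
                    p *= masks[n]

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
masks = {n: torch.ones_like(p) for n, p in model.named_parameters()
         if p.dim() > 1}                       # prune weight matrices only

for _ in range(10):                            # 10 rounds of train-then-prune
    train(model, masks)
    with torch.no_grad():
        survivors = torch.cat([p[masks[n].bool()].abs()
                               for n, p in model.named_parameters() if n in masks])
        threshold = torch.quantile(survivors, 0.3)    # global magnitude cut
        for n, p in model.named_parameters():
            if n in masks:
                masks[n] *= (p.abs() > threshold).float()
                p *= masks[n]

kept = sum(m.sum() for m in masks.values()) / sum(m.numel() for m in masks.values())
print(f"fraction of weights remaining: {float(kept):.3f}")   # ~0.7**10 = 0.028
```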
    Joint Non-parametric Point Process model for Treatments and Outcomes: Counterfactual Time-series Prediction Under Policy Interventions. (arXiv:2209.04142v1 [cs.LG])
    Policy makers need to predict the progression of an outcome before adopting a new treatment policy, which defines when and how a sequence of treatments affecting the outcome occurs in continuous time. Commonly, algorithms that predict interventional future outcome trajectories take a fixed sequence of future treatments as input. This either neglects the dependence of future treatments on outcomes preceding them or implicitly assumes the treatment policy is known, and hence excludes scenarios where the policy is unknown or a counterfactual analysis is needed. To handle these limitations, we develop a joint model for treatments and outcomes, which allows for the estimation of treatment policies and effects from sequential treatment--outcome data. It can answer interventional and counterfactual queries about interventions on treatment policies, as we show with real-world data on blood glucose progression and a simulation study building on top of this.
    Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL. (arXiv:2209.03993v1 [cs.LG])
    Recent works have shown that tackling offline reinforcement learning (RL) with a conditional policy produces promising results by converting the RL task to a supervised learning task. Decision Transformer (DT) combines the conditional policy approach and Transformer architecture to show competitive performance against several benchmarks. However, DT lacks stitching ability -- one of the critical abilities for offline RL that learns the optimal policy from sub-optimal trajectories. The issue becomes significant when the offline dataset only contains sub-optimal trajectories. On the other hand, the conventional RL approaches based on Dynamic Programming (such as Q-learning) do not suffer from the same issue; however, they suffer from unstable learning behaviours, especially when they employ function approximation in an off-policy learning setting. In this paper, we propose Q-learning Decision Transformer (QDT), which addresses the shortcomings of DT by leveraging the benefit of Dynamic Programming (Q-learning). QDT utilises the Dynamic Programming (Q-learning) results to relabel the return-to-go in the training data. We then train the DT with the relabelled data. Our approach efficiently exploits the benefits of these two approaches and compensates for each other's shortcomings to achieve better performance. We demonstrate the issue of DT and the advantage of QDT in a simple environment. We also evaluate QDT in the more complex D4RL benchmark, showing good performance gains.
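    The relabeling step at the heart of QDT can be sketched in a few lines: walk each trajectory backwards and lift the return-to-go to the learned Q-value wherever the latter is larger, so the conditioning targets reflect stitched, better-than-observed returns. A toy NumPy version (the exact relabeling rule in the paper may differ in details):

```python
import numpy as np

def relabel_return_to_go(rewards, q_values, gamma=1.0):
    """QDT-style relabeling (a sketch): propagate returns backwards and
    take the max of the observed return-to-go and the Q estimate."""
    T = len(rewards)
    rtg = np.zeros(T)
    for t in reversed(range(T)):
        nxt = rtg[t + 1] if t + 1 < T else 0.0
        rtg[t] = max(rewards[t] + gamma * nxt, q_values[t])
    return rtg

rewards = np.array([0.0, 0.0, 1.0, 0.0])          # a sub-optimal trajectory
q_vals = np.array([2.0, 1.5, 1.0, 0.5])           # from offline Q-learning
print(relabel_return_to_go(rewards, q_vals))      # [2.  1.5 1.5 0.5]
```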
    Expected Worst Case Regret via Stochastic Sequential Covering. (arXiv:2209.04417v1 [cs.LG])
    We study the problem of sequential prediction and online minimax regret with stochastically generated features under a general loss function. We introduce a notion of expected worst case minimax regret that generalizes and encompasses prior known minimax regrets. For such minimax regrets we establish tight upper bounds via a novel concept of stochastic global sequential covering. We show that for a hypothesis class of VC-dimension $\mathsf{VC}$ and $i.i.d.$ generated features of length $T$, the cardinality of the stochastic global sequential covering can be upper bounded with high probability (whp) by $e^{O(\mathsf{VC} \cdot \log^2 T)}$. We then improve this bound by introducing a new complexity measure called the Star-Littlestone dimension, and show that classes with Star-Littlestone dimension $\mathsf{SL}$ admit a stochastic global sequential covering of order $e^{O(\mathsf{SL} \cdot \log T)}$. We further establish upper bounds for real valued classes with finite fat-shattering numbers. Finally, by applying information-theoretic tools of the fixed design minimax regrets, we provide lower bounds for the expected worst case minimax regret. We demonstrate the effectiveness of our approach by establishing tight bounds on the expected worst case minimax regrets for logarithmic loss and general mixable losses.
    Explanation Method for Anomaly Detection on Mixed Numerical and Categorical Spaces. (arXiv:2209.04173v1 [cs.LG])
    Most proposals in the anomaly detection field focus exclusively on the detection stage, especially in the recent deep learning approaches. While providing highly accurate predictions, these models often lack transparency, acting as "black boxes". This criticism has grown to the point that explanation is now considered very relevant in terms of acceptability and reliability. In this paper, we address this issue by inspecting the ADMNC (Anomaly Detection on Mixed Numerical and Categorical Spaces) model, an existing very accurate although opaque anomaly detector capable of operating with both numerical and categorical inputs. This work presents the extension EADMNC (Explainable Anomaly Detection on Mixed Numerical and Categorical spaces), which adds explainability to the predictions obtained with the original model. We preserved the scalability of the original method thanks to the Apache Spark framework. EADMNC leverages the formulation of the previous ADMNC model to offer pre hoc and post hoc explainability, while maintaining the accuracy of the original architecture. We present a pre hoc model that globally explains the outputs by segmenting input data into homogeneous groups, described with only a few variables. We designed a graphical representation based on regression trees, which supervisors can inspect to understand the differences between normal and anomalous data. Our post hoc explanations consist of a text-based template method that locally provides textual arguments supporting each detection. We report experimental results on extensive real-world data, particularly in the domain of network intrusion detection. The usefulness of the explanations is assessed by theory analysis using expert knowledge in the network intrusion domain.
    Fast and Accurate Importance Weighting for Correcting Sample Bias. (arXiv:2209.04215v1 [cs.LG])
    Bias in datasets can be very detrimental to appropriate statistical estimation. In response to this problem, importance weighting methods have been developed to match any biased distribution to its corresponding target unbiased distribution. The seminal Kernel Mean Matching (KMM) method is, nowadays, still considered state of the art in this research field. However, one of the main drawbacks of this method is the computational burden for large datasets. Building on previous works by Huang et al. (2007) and de Mathelin et al. (2021), we derive a novel importance weighting algorithm which scales to large datasets by using a neural network to predict the instance weights. We show, on multiple public datasets, under various sample biases, that our proposed approach drastically reduces the computational time on large datasets while maintaining similar sample bias correction performance compared to other importance weighting methods. The proposed approach appears to be the only one able to give relevant reweighting in a reasonable time for large datasets with up to two million data points.
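    A minimal PyTorch sketch of the core idea: a small network predicts strictly positive instance weights, trained so that the weighted source sample matches the target sample under an RBF-kernel MMD criterion. Layer sizes, bandwidth and optimiser settings below are arbitrary placeholders, not the authors' architecture:

        import torch
        import torch.nn as nn

        def rbf(a, b, sigma=1.0):
            return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))

        class WeightNet(nn.Module):
            """Small network predicting a strictly positive weight per instance."""
            def __init__(self, dim):
                super().__init__()
                self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))
            def forward(self, x):
                return nn.functional.softplus(self.net(x)).squeeze(-1)

        def weighted_mmd2(w, xs, xt, sigma=1.0):
            """Squared MMD between the w-weighted source sample and the target sample."""
            w = w / w.sum()
            m = xt.shape[0]
            return (w @ rbf(xs, xs, sigma) @ w
                    - 2.0 / m * (w @ rbf(xs, xt, sigma)).sum()
                    + rbf(xt, xt, sigma).sum() / m ** 2)

        torch.manual_seed(0)
        xs = torch.randn(500, 5) + 1.0          # biased source sample
        xt = torch.randn(400, 5)                # target (unbiased) sample
        net = WeightNet(5)
        opt = torch.optim.Adam(net.parameters(), lr=1e-2)
        for _ in range(200):
            loss = weighted_mmd2(net(xs), xs, xt)
            opt.zero_grad(); loss.backward(); opt.step()
        weights = net(xs).detach()              # reweight training losses downstream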
    Assessing Lower Limb Strength using Internet-of-Things Enabled Chair and Processing of Time-Series Data in Google GPU Tensorflow CoLab. (arXiv:2209.04042v1 [cs.LG])
    This project describes the application of Machine Learning and Internet-of-Things technologies to assess the lower limb strength of individuals undergoing rehabilitation or therapy. Specifically, it seeks to measure and assess the progress of individuals using sensors attached to chairs and by processing the data through Google GPU Tensorflow CoLab. Pressure sensors are attached to various locations on a chair, including but not limited to the seating area, backrest, hand rests, and legs. Sensor data from the individual performing both sit-to-stand and stand-to-sit transitions provides a time-series dataset regarding the pressure distribution and vibratory motion on the chair. The dataset and timing information can then be fed into a machine learning model to estimate the relative strength and weakness during various phases of the movement.
    Adversarial Examples in Constrained Domains. (arXiv:2011.01183v3 [cs.CR] UPDATED)
    Machine learning algorithms have been shown to be vulnerable to adversarial manipulation through systematic modification of inputs (e.g., adversarial examples) in domains such as image recognition. Under the default threat model, the adversary exploits the unconstrained nature of images; each feature (pixel) is fully under control of the adversary. However, it is not clear how these attacks translate to constrained domains that limit which and how features can be modified by the adversary (e.g., network intrusion detection). In this paper, we explore whether constrained domains are less vulnerable than unconstrained domains to adversarial example generation algorithms. We create an algorithm for generating adversarial sketches: targeted universal perturbation vectors which encode feature saliency within the envelope of domain constraints. To assess how these algorithms perform, we evaluate them in constrained (e.g., network intrusion detection) and unconstrained (e.g., image recognition) domains. The results demonstrate that our approaches generate misclassification rates in constrained domains that are comparable to those of unconstrained domains (greater than 95%). Our investigation shows that the narrow attack surface exposed by constrained domains is still sufficiently large to craft successful adversarial examples; and thus, constraints do not appear to make a domain robust. Indeed, with as little as five randomly selected features, one can still generate adversarial examples.
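    The paper's adversarial sketches are universal perturbations; as a simpler illustration of attacking under domain constraints, the sketch below runs a one-step gradient (FGSM-style) attack that may only touch an adversary-controlled feature mask, e.g. five randomly selected features as in the abstract. The toy linear model and normalisation are assumptions:

        import torch

        def masked_fgsm(model, x, y, mask, eps=0.1):
            """One-step gradient attack restricted to adversary-controlled features.
            mask: 0/1 tensor shaped like x; 1 marks features that may be modified."""
            x = x.clone().requires_grad_(True)
            loss = torch.nn.functional.cross_entropy(model(x), y)
            loss.backward()
            x_adv = x + eps * x.grad.sign() * mask   # perturb only inside the envelope
            return x_adv.detach().clamp(0.0, 1.0)    # assume features normalised to [0, 1]

        d = 20
        model = torch.nn.Linear(d, 2)                # stand-in for the victim classifier
        x, y = torch.rand(1, d), torch.tensor([0])
        mask = torch.zeros(1, d)
        mask[0, torch.randperm(d)[:5]] = 1.0         # five randomly selected features
        x_adv = masked_fgsm(model, x, y, mask)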
    Multi-objective hyperparameter optimization with performance uncertainty. (arXiv:2209.04340v1 [cs.LG])
    The performance of any Machine Learning (ML) algorithm is impacted by the choice of its hyperparameters. As training and evaluating a ML algorithm is usually expensive, the hyperparameter optimization (HPO) method needs to be computationally efficient to be useful in practice. Most of the existing approaches on multi-objective HPO use evolutionary strategies and metamodel-based optimization. However, few methods have been developed to account for uncertainty in the performance measurements. This paper presents results on multi-objective hyperparameter optimization with uncertainty on the evaluation of ML algorithms. We combine the sampling strategy of Tree-structured Parzen Estimators (TPE) with the metamodel obtained after training a Gaussian Process Regression (GPR) with heterogeneous noise. Experimental results on three analytical test functions and three ML problems show the improvement over multi-objective TPE and GPR with respect to the hypervolume indicator.  ( 2 min )
    ExpFinder: An Ensemble Expert Finding Model Integrating $N$-gram Vector Space Model and $\mu$CO-HITS. (arXiv:2101.06821v2 [cs.IR] UPDATED)
    Finding an expert plays a crucial role in driving successful collaborations and speeding up high-quality research development and innovations. However, the rapid growth of scientific publications and digital expertise data makes identifying the right experts a challenging problem. Existing approaches for finding experts given a topic can be categorised into information retrieval techniques based on vector space models, document language models, and graph-based models. In this paper, we propose $\textit{ExpFinder}$, a new ensemble model for expert finding, that integrates a novel $N$-gram vector space model, denoted as $n$VSM, and a graph-based model, denoted as $\textit{$\mu$CO-HITS}$, a proposed variation of the CO-HITS algorithm. The key idea of $n$VSM is to exploit a recent inverse document frequency weighting method for $N$-gram words, and $\textit{ExpFinder}$ incorporates $n$VSM into $\textit{$\mu$CO-HITS}$ to achieve expert finding. We comprehensively evaluate $\textit{ExpFinder}$ on four different datasets from the academic domains in comparison with six different expert finding models. The evaluation results show that $\textit{ExpFinder}$ is a highly effective model for expert finding, substantially outperforming all the compared models by 19% to 160.2%.  ( 3 min )
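    Only the vector-space half of the ensemble is easy to sketch: below, a standard TF-IDF N-gram model from scikit-learn stands in for nVSM, while the paper's specific IDF weighting for N-grams and the muCO-HITS graph reinforcement are omitted. Names and documents are made up:

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        # one pseudo-document per expert (e.g. concatenated publication titles)
        experts = {
            "alice": "graph neural networks node classification message passing",
            "bob": "support vector machines kernel methods statistical learning",
        }
        vec = TfidfVectorizer(ngram_range=(1, 2))     # unigrams + bigrams
        doc_matrix = vec.fit_transform(experts.values())

        def rank_experts(topic):
            scores = cosine_similarity(vec.transform([topic]), doc_matrix).ravel()
            return sorted(zip(experts, scores), key=lambda t: -t[1])

        print(rank_experts("graph neural networks"))  # alice ranked first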
    Gaussian Process Koopman Mode Decomposition. (arXiv:2209.04111v1 [stat.ML])
    In this paper, we propose a nonlinear probabilistic generative model of Koopman mode decomposition based on an unsupervised Gaussian process. Existing data-driven methods for Koopman mode decomposition have focused on estimating the quantities specified by Koopman mode decomposition, namely, eigenvalues, eigenfunctions, and modes. Our model enables the simultaneous estimation of these quantities and latent variables governed by an unknown dynamical system. Furthermore, we introduce an efficient strategy to estimate the parameters of our model by low-rank approximations of covariance matrices. Applying the proposed model to both synthetic data and a real-world epidemiological dataset, we show that various analyses are available using the estimated parameters.  ( 2 min )
    Studying Drowsiness Detection Performance while Driving through Scalable Machine Learning Models using Electroencephalography. (arXiv:2209.04048v1 [eess.SP])
    Drowsiness is a major concern for drivers and one of the leading causes of traffic accidents. Advances in Cognitive Neuroscience and Computer Science have enabled the detection of drivers' drowsiness by using Brain-Computer Interfaces (BCIs) and Machine Learning (ML). Nevertheless, several challenges remain open. First, a sufficiently comprehensive evaluation of drowsiness detection performance using a heterogeneous set of ML algorithms is missing in the literature. Second, the detection performance of scalable ML models suitable for groups of subjects needs to be studied and compared with that of the individual models proposed in the literature. To address these limitations, this work presents an intelligent framework that employs BCIs and features based on electroencephalography (EEG) for detecting drowsiness in driving scenarios. The SEED-VIG dataset is used to feed different ML regressors and three-class classifiers, and then to evaluate, analyze, and compare the best-performing models for individual subjects and groups of them. In more detail, regarding individual models, Random Forest (RF) obtained a 78% f1-score, improving the 58% obtained by models used in the literature such as Support Vector Machine (SVM). Concerning scalable models, RF reached a 79% f1-score, demonstrating the effectiveness of these approaches. The lessons learned can be summarized as follows: i) not only SVM but also other models not sufficiently explored in the literature are relevant for drowsiness detection, and ii) scalable approaches suitable for groups of subjects are effective at detecting drowsiness, even when evaluated on new subjects not included in the model training.  ( 3 min )
    Efficient Multi-view Clustering via Unified and Discrete Bipartite Graph Learning. (arXiv:2209.04187v1 [cs.LG])
    Although previous graph-based multi-view clustering algorithms have gained significant progress, most of them are still faced with three limitations. First, they often suffer from high computational complexity, which restricts their applications in large-scale scenarios. Second, they usually perform graph learning either at the single-view level or at the view-consensus level, but often neglect the possibility of the joint learning of single-view and consensus graphs. Third, many of them rely on $k$-means for discretization of the spectral embeddings, and thus lack the ability to directly learn a graph with a discrete cluster structure. In light of this, this paper presents an efficient multi-view clustering approach via unified and discrete bipartite graph learning (UDBGL). Specifically, the anchor-based subspace learning is incorporated to learn the view-specific bipartite graphs from multiple views, upon which the bipartite graph fusion is leveraged to learn a view-consensus bipartite graph with adaptive weight learning. Further, the Laplacian rank constraint is imposed to ensure that the fused bipartite graph has discrete cluster structures (with a specific number of connected components). By simultaneously formulating the view-specific bipartite graph learning, the view-consensus bipartite graph learning, and the discrete cluster structure learning into a unified objective function, an efficient minimization algorithm is then designed to tackle this optimization problem and directly achieve a discrete clustering solution without requiring additional partitioning, which notably has linear time complexity in data size. Experiments on a variety of multi-view datasets demonstrate the robustness and efficiency of our UDBGL approach.  ( 3 min )
    MATT: A Multiple-instance Attention Mechanism for Long-tail Music Genre Classification. (arXiv:2209.04109v1 [cs.SD])
    Imbalanced music genre classification is a crucial task in the Music Information Retrieval (MIR) field for identifying the long-tail, data-poor genre based on the related music audio segments, which is very prevalent in real-world scenarios. Most of the existing models are designed for class-balanced music datasets, resulting in poor performance in accuracy and generalization when identifying the music genres at the tail of the distribution. Inspired by the success of introducing Multi-instance Learning (MIL) in various classification tasks, we propose a novel mechanism named Multi-instance Attention (MATT) to boost the performance for identifying tail classes. Specifically, we first construct the bag-level datasets by generating the album-artist pair bags. Second, we leverage neural networks to encode the music audio segments. Finally, under the guidance of a multi-instance attention mechanism, the neural network-based models could select the most informative genre to match the given music segment. Comprehensive experimental results on a large-scale music genre benchmark dataset with long-tail distribution demonstrate MATT significantly outperforms other state-of-the-art baselines.  ( 2 min )
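    A generic attention-based multi-instance pooling layer conveys the mechanism: instances in a bag are scored, softmax-normalised, and combined into a bag embedding. Dimensions below are placeholders and the album-artist bag construction is omitted, so this is a sketch of the idea rather than MATT itself:

        import torch
        import torch.nn as nn

        class AttentionMILPool(nn.Module):
            """Attention pooling over a bag of instance embeddings."""
            def __init__(self, dim, hidden=64):
                super().__init__()
                self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                           nn.Linear(hidden, 1))
            def forward(self, bag):                       # bag: (num_instances, dim)
                a = torch.softmax(self.score(bag), dim=0) # (num_instances, 1)
                return (a * bag).sum(dim=0), a            # bag embedding + weights

        # a bag of 7 music-segment embeddings of dimension 128 (toy numbers)
        pool = AttentionMILPool(dim=128)
        segments = torch.randn(7, 128)
        bag_emb, attn = pool(segments)
        logits = nn.Linear(128, 20)(bag_emb)              # e.g. 20 genre classes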
    Self-supervised Learning for Heterogeneous Graph via Structure Information based on Metapath. (arXiv:2209.04218v1 [cs.LG])
    Graph neural networks (GNNs) are the dominant paradigm for modeling and handling graph structure data by learning universal node representations. The traditional way of training GNNs depends on large amounts of labeled data, which results in high costs and long training times. In some special settings, labeled data is even unavailable. Self-supervised representation learning, which generates labels from the graph structure data itself, is a promising approach to tackle this problem. Self-supervised learning for heterogeneous graphs is more challenging than for homogeneous graphs, and it has also received less study. In this paper, we propose a SElf-supervised learning method for heterogeneous graphs via Structure Information based on Metapath (SESIM). The proposed model constructs pretext tasks by predicting the jump number between nodes in each metapath, improving the representation ability for the primary task. To predict jump numbers, SESIM uses the data itself to generate labels, avoiding time-consuming manual labeling. Moreover, predicting the jump number in each metapath effectively utilizes graph structure information, an essential property of the relations between nodes, so SESIM deepens the model's understanding of graph structure. Finally, we train the primary task and pretext tasks jointly, and use meta-learning to balance the contribution of the pretext tasks to the primary task. Empirical results validate the performance of the SESIM method and demonstrate that it can improve the representation ability of traditional neural networks on link prediction and node classification tasks.  ( 3 min )
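    One way to read the pretext labels is as metapath hop counts; the toy sketch below generates such jump-number labels by expanding a frontier through the metapath's edge types. The data layout and this interpretation of "jump number" are our assumptions, not code from the paper:

        def jump_numbers(adj, metapath, start, max_jumps=3):
            """Label nodes by the number of metapath jumps needed to reach them.

            adj: dict edge_type -> dict node -> set of neighbours
            metapath: sequence of edge types, e.g. ("author-paper", "paper-author");
            one 'jump' traverses the full metapath once.
            """
            labels = {start: 0}
            frontier = {start}
            for k in range(1, max_jumps + 1):
                for etype in metapath:                    # one full metapath pass
                    frontier = {v for u in frontier for v in adj[etype].get(u, set())}
                for v in frontier:
                    labels.setdefault(v, k)               # keep the smallest jump count
                if not frontier:
                    break
            return labels

        adj = {
            "author-paper": {"a1": {"p1"}, "a2": {"p1", "p2"}, "a3": {"p2"}},
            "paper-author": {"p1": {"a1", "a2"}, "p2": {"a2", "a3"}},
        }
        print(jump_numbers(adj, ("author-paper", "paper-author"), "a1"))
        # {'a1': 0, 'a2': 1, 'a3': 2}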
    $\Delta$-PINNs: physics-informed neural networks on complex geometries. (arXiv:2209.03984v1 [cs.LG])
    Physics-informed neural networks (PINNs) have demonstrated promise in solving forward and inverse problems involving partial differential equations. Despite recent progress on expanding the class of problems that can be tackled by PINNs, most existing use cases involve simple geometric domains. To date, there is no clear way to inform PINNs about the topology of the domain where the problem is being solved. In this work, we propose a novel positional encoding mechanism for PINNs based on the eigenfunctions of the Laplace-Beltrami operator. This technique allows us to create an input space for the neural network that represents the geometry of a given object. We approximate the eigenfunctions as well as the operators involved in the partial differential equations with finite elements. We extensively test and compare the proposed methodology against traditional PINNs in complex shapes, such as a coil, a heat sink and a bunny, with different physics, such as the Eikonal equation and heat transfer. We also study the sensitivity of our method to the number of eigenfunctions used, as well as the discretization used for the eigenfunctions and the underlying operators. Our results show excellent agreement with the ground truth data in cases where traditional PINNs fail to produce a meaningful solution. We envision this new technique will expand the effectiveness of PINNs to more realistic applications.  ( 3 min )
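    The paper approximates Laplace-Beltrami eigenfunctions with finite elements; as a rough stand-in, the sketch below uses eigenvectors of a mesh's graph Laplacian (SciPy) as the geometry-aware input encoding for a PINN. The graph Laplacian substitution and all sizes are our simplifications:

        import numpy as np
        import scipy.sparse as sp
        from scipy.sparse.csgraph import laplacian
        from scipy.sparse.linalg import eigsh

        def laplacian_encoding(edges, n_nodes, k=8):
            """First k non-trivial Laplacian eigenvectors, used as geometry-aware
            input features in place of raw coordinates."""
            sym = edges + [(j, i) for i, j in edges]
            rows, cols = zip(*sym)
            A = sp.csr_matrix((np.ones(len(sym)), (rows, cols)),
                              shape=(n_nodes, n_nodes))
            L = laplacian(A)
            # smallest eigenpairs; eigenvector 0 is constant, so skip it
            vals, vecs = eigsh(L, k=k + 1, which="SM")
            return vecs[:, 1:]

        # a 50-node ring as a toy closed "manifold"
        n = 50
        edges = [(i, (i + 1) % n) for i in range(n)]
        enc = laplacian_encoding(edges, n, k=4)   # feed enc[i] to the PINN as input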
    Stochastic Compositional Optimization with Compositional Constraints. (arXiv:2209.04086v1 [math.OC])
    Stochastic compositional optimization (SCO) has attracted considerable attention because of its broad applicability to important real-world problems. However, existing works on SCO assume that the projection within a solution update is simple, which fails to hold for problem instances where the constraints are in the form of expectations, such as empirical conditional value-at-risk constraints. We study a novel model that incorporates single-level expected value and two-level compositional constraints into the current SCO framework. Our model can be applied widely to data-driven optimization and risk management, including risk-averse optimization and high-moment portfolio selection, and can handle multiple constraints. We further propose a class of primal-dual algorithms that generates sequences converging to the optimal solution at the rate of $\mathcal{O}(\frac{1}{\sqrt{N}})$ under both single-level expected value and two-level compositional constraints, where $N$ is the iteration counter, establishing the benchmarks in expected value constrained SCO.  ( 2 min )
    From Shapley Values to Generalized Additive Models and back. (arXiv:2209.04012v1 [cs.LG])
    In explainable machine learning, local post-hoc explanation algorithms and inherently interpretable models are often seen as competing approaches. In this work, we offer a novel perspective on Shapley Values, a prominent post-hoc explanation technique, and show that it is strongly connected with Glassbox-GAMs, a popular class of interpretable models. We introduce $n$-Shapley Values, a natural extension of Shapley Values that explain individual predictions with interaction terms up to order $n$. As $n$ increases, the $n$-Shapley Values converge towards the Shapley-GAM, a uniquely determined decomposition of the original function. From the Shapley-GAM, we can compute Shapley Values of arbitrary order, which gives precise insights into the limitations of these explanations. We then show that Shapley Values recover generalized additive models of order $n$, assuming that we allow for interaction terms up to order $n$ in the explanations. This implies that the original Shapley Values recover Glassbox-GAMs. At the technical end, we show that there is a one-to-one correspondence between different ways to choose the value function and different functional decompositions of the original function. This provides a novel perspective on the question of how to choose the value function. We also present an empirical analysis of the degree of variable interaction that is present in various standard classifiers, and discuss the implications of our results for algorithmic explanations. A python package to compute $n$-Shapley Values and replicate the results in this paper is available at \url{https://github.com/tml-tuebingen/nshap}.  ( 3 min )
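    For a small number of features, Shapley values can be computed exactly by subset enumeration, which makes the object under study concrete. The sketch below uses a mean-imputation (interventional) value function as an illustrative choice; the paper's nshap package implements the $n$-Shapley generalisation:

        import itertools
        import math
        import numpy as np

        def shapley_values(f, x, background):
            """Exact Shapley values of f at x, marginalising absent features by
            replacing them with the background mean (interventional value
            function). Exponential in the number of features: toy-sized only."""
            d = len(x)
            mu = background.mean(axis=0)
            def v(S):
                z = mu.copy()
                z[list(S)] = x[list(S)]
                return f(z[None, :])[0]
            phi = np.zeros(d)
            for i in range(d):
                rest = [j for j in range(d) if j != i]
                for r in range(d):
                    for S in itertools.combinations(rest, r):
                        w = (math.factorial(r) * math.factorial(d - r - 1)
                             / math.factorial(d))
                        phi[i] += w * (v(S + (i,)) - v(S))
            return phi

        f = lambda X: X[:, 0] + 2 * X[:, 1] * X[:, 2]   # toy model with interaction
        bg = np.random.default_rng(0).normal(size=(256, 3))
        print(shapley_values(f, np.array([1.0, 1.0, 1.0]), bg))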
    Selecting Related Knowledge via Efficient Channel Attention for Online Continual Learning. (arXiv:2209.04212v1 [cs.CV])
    Continual learning aims to learn a sequence of tasks by leveraging knowledge acquired in the past in an online-learning manner while still performing well on all previous tasks. This ability is crucial to artificial intelligence (AI) systems, making continual learning better suited than the traditional learning paradigm to most real-world, complex application scenarios. However, current models usually learn a generic representation based on the class labels of each task and select an effective strategy to avoid catastrophic forgetting. We postulate that selecting only the related and useful parts of the knowledge obtained is more effective for each task than utilizing the whole of it. Based on this premise, in this paper we propose a new framework, named Selecting Related Knowledge for Online Continual Learning (SRKOCL), which incorporates an additional efficient channel attention mechanism to pick the particular related knowledge for every task. Our model also combines experience replay and knowledge distillation to circumvent catastrophic forgetting. Finally, extensive experiments are conducted on different benchmarks, and the competitive experimental results demonstrate that our proposed SRKOCL is a promising approach against the state-of-the-art.  ( 2 min )
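    The abstract does not spell out the attention module; the block below is a standard efficient channel attention (ECA-style) layer, i.e. a cheap 1-D convolution over globally pooled channel descriptors, offered as a plausible sketch rather than the SRKOCL implementation:

        import torch
        import torch.nn as nn

        class EfficientChannelAttention(nn.Module):
            """ECA-style channel attention: global average pooling followed by a
            cheap 1-D convolution across channels, no dimensionality reduction."""
            def __init__(self, k_size=3):
                super().__init__()
                self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                                      padding=k_size // 2, bias=False)
            def forward(self, x):                         # x: (B, C, H, W)
                y = x.mean(dim=(2, 3))                    # (B, C) channel descriptors
                y = self.conv(y.unsqueeze(1)).squeeze(1)  # cross-channel interaction
                w = torch.sigmoid(y)[:, :, None, None]    # per-channel gates
                return x * w                              # re-weight related channels

        feats = torch.randn(8, 64, 16, 16)
        print(EfficientChannelAttention()(feats).shape)   # torch.Size([8, 64, 16, 16])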
    Active Learning of Classifiers with Label and Seed Queries. (arXiv:2209.03996v1 [cs.LG])
    We study exact active learning of binary and multiclass classifiers with margin. Given an $n$-point set $X \subset \mathbb{R}^m$, we want to learn any unknown classifier on $X$ whose classes have finite strong convex hull margin, a new notion extending the SVM margin. In the standard active learning setting, where only label queries are allowed, learning a classifier with strong convex hull margin $\gamma$ requires in the worst case $\Omega\big(1+\frac{1}{\gamma}\big)^{(m-1)/2}$ queries. On the other hand, using the more powerful seed queries (a variant of equivalence queries), the target classifier could be learned in $O(m \log n)$ queries via Littlestone's Halving algorithm; however, Halving is computationally inefficient. In this work we show that, by carefully combining the two types of queries, a binary classifier can be learned in time $\operatorname{poly}(n+m)$ using only $O(m^2 \log n)$ label queries and $O\big(m \log \frac{m}{\gamma}\big)$ seed queries; the result extends to $k$-class classifiers at the price of a $k!k^2$ multiplicative overhead. Similar results hold when the input points have bounded bit complexity, or when only one class has strong convex hull margin against the rest. We complement the upper bounds by showing that in the worst case any algorithm needs $\Omega\big(k m \log \frac{1}{\gamma}\big)$ seed and label queries to learn a $k$-class classifier with strong convex hull margin $\gamma$.  ( 3 min )
    Autoencoder Based Iterative Modeling and Multivariate Time-Series Subsequence Clustering Algorithm. (arXiv:2209.04213v1 [eess.SP])
    This paper introduces an algorithm for the detection of change-points and the identification of the corresponding subsequences in transient multivariate time-series data (MTSD). The analysis of such data has become more and more important due to its increasing availability in many industrial fields. Labeling, sorting or filtering highly transient measurement data for training condition based maintenance (CbM) models is cumbersome and error-prone. For some applications it can be sufficient to filter measurements by simple thresholds or to find change-points based on changes in mean value and variation. But for a robust diagnosis of, for example, a component within a component group, where multiple sensor values exhibit a complex non-linear correlation, such a simple approach would not be feasible: no meaningful and coherent measurement data that could be used for training a CbM model would emerge. Therefore, we introduce an algorithm which uses a recurrent neural network (RNN) based Autoencoder (AE) which is iteratively trained on incoming data. The scoring function uses the reconstruction error and latent space information. A model of the identified subsequence is saved and used for recognition of repeating subsequences as well as fast offline clustering. For evaluation, we propose a new similarity measure based on the curvature for a more intuitive time-series subsequence clustering metric. A comparison with seven other state-of-the-art algorithms and eight datasets shows the capability and the increased performance of our algorithm to cluster MTSD online and offline in conjunction with mechatronic systems.  ( 3 min )
    Modelling Patient Trajectories Using Multimodal Information. (arXiv:2209.04224v1 [cs.LG])
    Electronic Health Records (EHRs) aggregate diverse information at the patient level, holding a trajectory representative of the evolution of the patient's health status over time. Although this information provides context and can be leveraged by physicians to monitor patient health and make more accurate prognoses/diagnoses, patient records can contain information from very long time spans, which combined with the rapid generation rate of medical data makes clinical decision making more complex. Patient trajectory modelling can assist by exploring existing information in a scalable manner, and can contribute to augmenting health care quality by fostering preventive medicine practices. We propose a solution to model patient trajectories that combines different types of information and considers the temporal aspect of clinical data. This solution leverages two different architectures: one supporting flexible sets of input features, to convert patient admissions into dense representations; and a second exploring extracted admission representations in a recurrent-based architecture, where patient trajectories are processed in sub-sequences using a sliding window mechanism. The developed solution was evaluated on two different clinical outcomes, unexpected patient readmission and disease progression, using the publicly available MIMIC-III clinical database. The results obtained demonstrate the potential of the first architecture to model readmission and diagnoses prediction using single patient admissions. While information from clinical text did not show the discriminative power observed in other existing works, this may be explained by the need to fine-tune the clinicalBERT model. Finally, we demonstrate the potential of the sequence-based architecture using a sliding window mechanism to represent the input data, attaining comparable performance to other existing solutions.  ( 3 min )
    Bridging the Gap: Differentially Private Equivariant Deep Learning for Medical Image Analysis. (arXiv:2209.04338v1 [eess.IV])
    Machine learning with formal privacy-preserving techniques like Differential Privacy (DP) allows one to derive valuable insights from sensitive medical imaging data while promising to protect patient privacy, but it usually comes at a sharp privacy-utility trade-off. In this work, we propose to use steerable equivariant convolutional networks for medical image analysis with DP. Their improved feature quality and parameter efficiency yield remarkable accuracy gains, narrowing the privacy-utility gap.  ( 2 min )
    FLInt: Exploiting Floating Point Enabled Integer Arithmetic for Efficient Random Forest Inference. (arXiv:2209.04181v1 [cs.LG])
    In many machine learning applications, e.g., tree-based ensembles, floating point numbers are extensively utilized due to their expressiveness. Nowadays, performing data analysis on embedded devices over dynamic data masses is becoming possible, but such systems often lack hardware capabilities to process floating point numbers, introducing large overheads for their processing. Even if such hardware is present in general computing systems, using integer operations instead of floating point operations promises to reduce operation overheads and improve the performance. In this paper, we provide FLInt, a full-precision floating point comparison for random forests, using only integer and logic operations. To ensure that the same functionality is preserved, we formally prove the correctness of this comparison. Since random forests only require comparisons of floating point numbers during inference, we implement FLInt in low-level realizations and thereby eliminate the need for floating point hardware entirely, while keeping the model accuracy unchanged. The usage of FLInt basically boils down to a one-by-one replacement of conditions: for instance, a comparison statement in C: if(pX[3]<=(float)10.074347) becomes if((*(((int*)(pX))+3))<=((int)(0x41213087))). Experimental evaluation on X86 and ARMv8 desktop and server class systems shows that the execution time can be reduced by up to $\approx 30\%$ with our novel approach.  ( 2 min )
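    The abstract's C example can be checked from Python: for non-negative IEEE-754 floats, the raw bit patterns compare like signed integers (negative values would additionally need a sign-flip transform, which the check below deliberately avoids):

        import random
        import struct

        def f2i(x: float) -> int:
            """Reinterpret a float32's bits as a signed 32-bit integer."""
            return struct.unpack("<i", struct.pack("<f", x))[0]

        # the constant from the abstract's example: (float)10.074347
        print(hex(f2i(10.074347)))            # expected: 0x41213087

        # for non-negative floats, comparing the bits as integers preserves order
        for _ in range(10_000):
            a, b = random.uniform(0, 1e6), random.uniform(0, 1e6)
            assert (a <= b) == (f2i(a) <= f2i(b))
        print("bitwise integer comparison matches float comparison")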
    SPT-NRTL: A physics-guided machine learning model to predict thermodynamically consistent activity coefficients. (arXiv:2209.04135v1 [physics.chem-ph])
    The availability of property data is one of the major bottlenecks in the development of chemical processes, often requiring time-consuming and expensive experiments or limiting the design space to a small number of known molecules. This bottleneck has been the motivation behind the continuing development of predictive property models. For the property prediction of novel molecules, group contribution methods have been groundbreaking. In recent times, machine learning has joined the more established property prediction models. However, even with recent successes, the integration of physical constraints into machine learning models remains challenging. Physical constraints are vital to many thermodynamic properties, such as the Gibbs-Duhem relation, introducing an additional layer of complexity into the prediction. Here, we introduce SPT-NRTL, a machine learning model to predict thermodynamically consistent activity coefficients and provide NRTL parameters for easy use in process simulations. The results show that SPT-NRTL achieves higher accuracy than UNIFAC in the prediction of activity coefficients across all functional groups and is able to predict many vapor-liquid-equilibria with near experimental accuracy, as illustrated for the exemplary mixtures water/ethanol and chloroform/n-hexane. To ease the application of SPT-NRTL, NRTL-parameters of 100 000 000 mixtures are calculated with SPT-NRTL and provided online.  ( 3 min )
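    For reference, the activity coefficients that SPT-NRTL's parameters feed into follow the standard binary NRTL equations, sketched below; the parameter values are placeholders, not SPT-NRTL predictions:

        import numpy as np

        def nrtl_binary(x1, tau12, tau21, alpha=0.3):
            """Activity coefficients (gamma1, gamma2) from the binary NRTL model."""
            x2 = 1.0 - x1
            G12, G21 = np.exp(-alpha * tau12), np.exp(-alpha * tau21)
            ln_g1 = x2**2 * (tau21 * (G21 / (x1 + x2 * G21))**2
                             + tau12 * G12 / (x2 + x1 * G12)**2)
            ln_g2 = x1**2 * (tau12 * (G12 / (x2 + x1 * G12))**2
                             + tau21 * G21 / (x1 + x2 * G21)**2)
            return np.exp(ln_g1), np.exp(ln_g2)

        # placeholder parameters for an equimolar mixture
        g1, g2 = nrtl_binary(x1=0.5, tau12=0.8, tau21=0.6)
        print(g1, g2)   # both > 1: positive deviation from Raoult's law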
    Differential Privacy Dynamics of Langevin Diffusion and Noisy Gradient Descent. (arXiv:2102.05855v5 [stat.ML] UPDATED)
    What is the information leakage of an iterative randomized learning algorithm about its training data, when the internal state of the algorithm is \emph{private}? How much is the contribution of each specific training epoch to the information leakage through the released model? We study this problem for noisy gradient descent algorithms, and model the \emph{dynamics} of R\'enyi differential privacy loss throughout the training process. Our analysis traces a provably \emph{tight} bound on the R\'enyi divergence between the pair of probability distributions over parameters of models trained on neighboring datasets. We prove that the privacy loss converges exponentially fast, for smooth and strongly convex loss functions, which is a significant improvement over composition theorems (which over-estimate the privacy loss by upper-bounding its total value over all intermediate gradient computations). For Lipschitz, smooth, and strongly convex loss functions, we prove optimal utility with a small gradient complexity for noisy gradient descent algorithms.  ( 2 min )
    Dr. Neurosymbolic, or: How I Learned to Stop Worrying and Accept Statistics. (arXiv:2209.04049v1 [cs.AI])
    The symbolic AI community is increasingly trying to embrace machine learning in neuro-symbolic architectures, yet is still struggling due to cultural barriers. To break the barrier, this rather opinionated personal memo attempts to explain and rectify the conventions in Statistics, Machine Learning, and Deep Learning from the viewpoint of outsiders. It provides a step-by-step protocol for designing a machine learning system that satisfies a minimum theoretical guarantee necessary for being taken seriously by the symbolic AI community, i.e., it discusses "in what condition we can stop worrying and accept statistical machine learning." Some highlights: Most textbooks are written for those who plan to specialize in Stat/ML/DL and are supposed to accept the jargon. This memo is for experienced symbolic researchers that hear a lot of buzz but are still uncertain and skeptical. Information on Stat/ML/DL is currently too scattered or too noisy to invest in. This memo prioritizes compactness and pays special attention to concepts that resonate well with symbolic paradigms. I hope this memo offers time savings. It prioritizes general mathematical modeling and does not discuss any specific function approximator, such as neural networks (NNs), SVMs, decision trees, etc. It is open to corrections. Consider this memo as something similar to a blog post taking the form of a paper on arXiv.  ( 3 min )
    Online Low Rank Matrix Completion. (arXiv:2209.03997v1 [cs.LG])
    We study the problem of \textit{online} low-rank matrix completion with $\mathsf{M}$ users, $\mathsf{N}$ items and $\mathsf{T}$ rounds. In each round, we recommend one item per user. For each recommendation, we obtain a (noisy) reward sampled from a low-rank user-item reward matrix. The goal is to design an online method with sub-linear regret (in $\mathsf{T}$). While the problem can be mapped to the standard multi-armed bandit problem where each item is an \textit{independent} arm, it leads to poor regret as the correlation between arms and users is not exploited. In contrast, exploiting the low-rank structure of reward matrix is challenging due to non-convexity of low-rank manifold. We overcome this challenge using an explore-then-commit (ETC) approach that ensures a regret of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{2/3})$. That is, roughly only $\mathsf{polylog} (\mathsf{M}+\mathsf{N})$ item recommendations are required per user to get non-trivial solution. We further improve our result for the rank-$1$ setting. Here, we propose a novel algorithm OCTAL (Online Collaborative filTering using iterAtive user cLustering) that ensures nearly optimal regret bound of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{1/2})$. Our algorithm uses a novel technique of clustering users and eliminating items jointly and iteratively, which allows us to obtain nearly minimax optimal rate in $\mathsf{T}$.  ( 2 min )
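    A bare-bones explore-then-commit loop makes the recipe concrete: recommend uniformly at random, form a rank-r SVD estimate of the reward matrix, then commit to the greedy item per user. Sizes, noise scale and exploration length below are arbitrary, and this is a sketch of the ETC idea only, not of OCTAL:

        import numpy as np

        rng = np.random.default_rng(0)
        M, N, r, T_explore = 30, 40, 2, 2000

        # ground-truth low-rank reward matrix (unknown to the algorithm)
        R = rng.normal(size=(M, r)) @ rng.normal(size=(r, N))

        # explore: recommend random items, record noisy rewards
        sums, counts = np.zeros((M, N)), np.zeros((M, N))
        for _ in range(T_explore):
            items = rng.integers(N, size=M)                # one item per user
            rewards = R[np.arange(M), items] + rng.normal(scale=0.1, size=M)
            sums[np.arange(M), items] += rewards
            counts[np.arange(M), items] += 1

        est = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
        U, s, Vt = np.linalg.svd(est, full_matrices=False)
        R_hat = U[:, :r] * s[:r] @ Vt[:r]                  # rank-r denoised estimate

        # commit: play the estimated best item per user for all remaining rounds
        best = R_hat.argmax(axis=1)
        print((best == R.argmax(axis=1)).mean())           # fraction served optimally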
    Learning Branching Heuristics for Propositional Model Counting. (arXiv:2007.03204v2 [cs.LG] UPDATED)
    Propositional model counting, or #SAT, is the problem of computing the number of satisfying assignments of a Boolean formula. Many problems from different application areas, including many discrete probabilistic inference problems, can be translated into model counting problems to be solved by #SAT solvers. Exact #SAT solvers, however, are often not scalable to industrial size instances. In this paper, we present Neuro#, an approach for learning branching heuristics to improve the performance of exact #SAT solvers on instances from a given family of problems. We experimentally show that our method reduces the step count on similarly distributed held-out instances and generalizes to much larger instances from the same problem family. It is able to achieve these results on a number of different problem families having very different structures. In addition to step count improvements, Neuro# can also achieve orders of magnitude wall-clock speedups over the vanilla solver on larger instances in some problem families, despite the runtime overhead of querying the model.  ( 3 min )
    Optimal Learning Rates for Regularized Least-Squares with a Fourier Capacity Condition. (arXiv:2204.07856v2 [math.ST] UPDATED)
    We derive minimax adaptive rates for a new, broad class of Tikhonov-regularized learning problems in Hilbert scales under general source conditions. Our analysis does not require the regression function to be contained in the hypothesis class, and most notably does not employ the conventional \textit{a priori} assumptions on kernel eigendecay. Using the theory of interpolation, we demonstrate that the spectrum of the Mercer operator can be inferred in the presence of "tight" $L^{\infty}$ embeddings of suitable Hilbert scales. Our analysis utilizes a new Fourier capacity condition, which characterizes the optimal Lorentz range space of a modified Mercer operator in certain parameter regimes.  ( 2 min )
    Random Vector Functional Link Networks for Function Approximation on Manifolds. (arXiv:2007.15776v2 [stat.ML] UPDATED)
    The learning speed of feed-forward neural networks is notoriously slow and has presented a bottleneck in deep learning applications for several decades. For instance, gradient-based learning algorithms, which are used extensively to train neural networks, tend to work slowly when all of the network parameters must be iteratively tuned. To counter this, both researchers and practitioners have tried introducing randomness to reduce the learning requirement. Based on the original construction of Igelnik and Pao, single-layer neural networks with random input-to-hidden layer weights and biases have seen success in practice, but the necessary theoretical justification is lacking. In this paper, we begin to fill this theoretical gap. We provide a (corrected) rigorous proof that the Igelnik and Pao construction is a universal approximator for continuous functions on compact domains, with approximation error decaying asymptotically like $O(1/\sqrt{n})$ for the number $n$ of network nodes. We then extend this result to the non-asymptotic setting, proving that one can achieve any desired approximation error with high probability provided $n$ is sufficiently large. We further adapt this randomized neural network architecture to approximate functions on smooth, compact submanifolds of Euclidean space, providing theoretical guarantees in both the asymptotic and non-asymptotic forms. Finally, we illustrate our results on manifolds with numerical experiments.  ( 3 min )
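    The Igelnik-Pao construction fits in a few lines: random, frozen input-to-hidden weights, a direct input-to-output link, and a least-squares solve for the only trained layer. The width, activation and target function below are illustrative choices:

        import numpy as np

        def rvfl_fit_predict(X, y, X_test, n_hidden=200, seed=0):
            """Random vector functional link network for regression."""
            rng = np.random.default_rng(seed)
            W = rng.normal(size=(X.shape[1], n_hidden))     # random, never trained
            b = rng.normal(size=n_hidden)
            H = np.tanh(X @ W + b)                          # random hidden features
            D = np.hstack([X, H])                           # direct link + hidden
            beta, *_ = np.linalg.lstsq(D, y, rcond=None)    # only layer that is fit
            return np.hstack([X_test, np.tanh(X_test @ W + b)]) @ beta

        rng = np.random.default_rng(1)
        X = rng.uniform(-1, 1, size=(500, 2))
        y = np.sin(3 * X[:, 0]) * np.cos(2 * X[:, 1])       # smooth target function
        X_test = rng.uniform(-1, 1, size=(100, 2))
        pred = rvfl_fit_predict(X, y, X_test)
        print(pred.shape)   # (100,)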
    Estimating Heterogeneous Bounds for Treatment Effects under Sample Selection and Non-response. (arXiv:2209.04329v1 [econ.EM])
    In this paper we propose a method for nonparametric estimation and inference for heterogeneous bounds for causal effect parameters in general sample selection models where the initial treatment can affect whether a post-intervention outcome is observed or not. Treatment selection can be confounded by observable covariates while the outcome selection can be confounded by both observables and unobservables. The method provides conditional effect bounds as functions of policy relevant pre-treatment variables. It allows for conducting valid statistical inference on the unidentified conditional effect curves. We use a flexible semiparametric de-biased machine learning approach that can accommodate flexible functional forms and high-dimensional confounding variables between treatment, selection, and outcome processes. Easily verifiable high-level conditions for estimation and misspecification-robust inference guarantees are provided as well.  ( 2 min )
    Review of Pedestrian Trajectory Prediction Methods: Comparing Deep Learning and Knowledge-based Approaches. (arXiv:2111.06740v2 [cs.LG] UPDATED)
    In crowd scenarios, predicting trajectories of pedestrians is a complex and challenging task depending on many external factors. The topology of the scene and the interactions between the pedestrians are just some of them. Due to advancements in data science and data collection technologies, deep learning methods have recently become a research hotspot in numerous domains. Therefore, it is not surprising that more and more researchers apply these methods to predict trajectories of pedestrians. This paper compares these relatively new deep learning algorithms with classical knowledge-based models that are widely used to simulate pedestrian dynamics. It provides a comprehensive literature review of both approaches, explores technical and application oriented differences, and addresses open questions as well as future development directions. Our investigations point out that the pertinence of knowledge-based models to predict local trajectories is nowadays questionable because of the high accuracy of the deep learning algorithms. Nevertheless, the ability of deep-learning algorithms for large-scale simulation and the description of collective dynamics remains to be demonstrated. Furthermore, the comparison shows that the combination of both approaches (the hybrid approach) seems to be promising to overcome disadvantages like the missing explainability of the deep learning approach.  ( 3 min )
    Majority Vote for Distributed Differentially Private Sign Selection. (arXiv:2209.04419v1 [cs.CR])
    Privacy-preserving data analysis has become prevalent in recent years. In this paper, we propose a distributed group differentially private majority vote mechanism for the sign selection problem in a distributed setup. To achieve this, we apply the iterative peeling to the stability function and use the exponential mechanism to recover the signs. As applications, we study the private sign selection for mean estimation and linear regression problems in distributed systems. Our method recovers the support and signs with the optimal signal-to-noise ratio as in the non-private scenario, which is better than contemporary work on private variable selection. Moreover, the sign selection consistency is justified with theoretical guarantees. Simulation studies are conducted to demonstrate the effectiveness of our proposed method.  ( 2 min )
    The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation. (arXiv:2102.06387v4 [cs.LG] UPDATED)
    We consider training models on private data that are distributed across user devices. To ensure privacy, we add on-device noise and use secure aggregation so that only the noisy sum is revealed to the server. We present a comprehensive end-to-end system, which appropriately discretizes the data and adds discrete Gaussian noise before performing secure aggregation. We provide a novel privacy analysis for sums of discrete Gaussians and carefully analyze the effects of data quantization and modular summation arithmetic. Our theoretical guarantees highlight the complex tension between communication, privacy, and accuracy. Our extensive experimental results demonstrate that our solution is essentially able to match the accuracy to central differential privacy with less than 16 bits of precision per value.  ( 2 min )
    A PDE-Based Analysis of the Symmetric Two-Armed Bernoulli Bandit. (arXiv:2202.05767v2 [cs.LG] UPDATED)
    This work addresses a version of the two-armed Bernoulli bandit problem where the sum of the means of the arms is one (the symmetric two-armed Bernoulli bandit). In a regime where the gap between these means goes to zero and the number of prediction periods approaches infinity, we obtain the leading order terms of the expected regret and pseudoregret for this problem by associating each of them with a solution of a linear parabolic partial differential equation. Our results improve upon the previously known results; specifically, we explicitly compute the leading order term of the optimal regret and pseudoregret in three different scaling regimes for the gap. Additionally, we obtain new non-asymptotic bounds for any given time horizon.  ( 2 min )
    MICO: Selective Search with Mutual Information Co-training. (arXiv:2209.04378v1 [cs.IR])
    In contrast to traditional exhaustive search, selective search first clusters documents into several groups before all the documents are searched exhaustively by a query, to limit the search executed within one group or only a few groups. Selective search is designed to reduce the latency and computation in modern large-scale search systems. In this study, we propose MICO, a Mutual Information CO-training framework for selective search with minimal supervision using the search logs. After training, MICO not only clusters the documents, but also routes unseen queries to the relevant clusters for efficient retrieval. In our empirical experiments, MICO significantly improves the performance on multiple metrics of selective search and outperforms a number of existing competitive baselines.  ( 2 min )
    Solving Multistage Stochastic Linear Programming via Regularized Linear Decision Rules: An Application to Hydrothermal Dispatch Planning. (arXiv:2110.03146v2 [math.OC] UPDATED)
    The solution of multistage stochastic linear problems (MSLP) represents a challenge for many applications. Long-term hydrothermal dispatch planning (LHDP) materializes this challenge in a real-world problem that affects electricity markets, economies, and natural resources worldwide. No closed-form solutions are available for MSLP, and the definition of non-anticipative policies with high-quality out-of-sample performance is crucial. Linear decision rules (LDR) provide an interesting simulation-based framework for finding high-quality policies for MSLP through two-stage stochastic models. In practical applications, however, the number of parameters to be estimated when using an LDR may be close to or higher than the number of scenarios of the sample average approximation problem, thereby generating an in-sample overfit and poor performance in out-of-sample simulations. In this paper, we propose a novel regularized LDR to solve MSLP based on the AdaLASSO (adaptive least absolute shrinkage and selection operator). The goal is to use the parsimony principle, as largely studied in high-dimensional linear regression models, to obtain better out-of-sample performance for an LDR applied to MSLP. Computational experiments show that the overfit threat is non-negligible when using the classical non-regularized LDR to solve the LHDP, one of the most studied MSLP with relevant applications in industry. Our analysis highlights the following benefits of the proposed framework in comparison to the non-regularized benchmark: 1) significant reductions in the number of non-zero coefficients (model parsimony), 2) substantial cost reductions in out-of-sample evaluations, and 3) improved spot-price profiles.  ( 3 min )
    Exact Recovery in the General Hypergraph Stochastic Block Model. (arXiv:2105.04770v2 [cs.IT] UPDATED)
    This paper investigates fundamental limits of exact recovery in the general d-uniform hypergraph stochastic block model (d-HSBM), wherein n nodes are partitioned into k disjoint communities with relative sizes (p1,..., pk). Each subset of nodes with cardinality d is generated independently as an order-d hyperedge with a certain probability that depends on the ground-truth communities that the d nodes belong to. The goal is to exactly recover the k hidden communities based on the observed hypergraph. We show that there exists a sharp threshold such that exact recovery is achievable above the threshold and impossible below the threshold (apart from a small regime of parameters that will be specified precisely). This threshold is represented in terms of a quantity which we term as the generalized Chernoff-Hellinger divergence between communities. Our result for this general model recovers prior results for the standard SBM and d-HSBM with two symmetric communities as special cases. En route to proving our achievability results, we develop a polynomial-time two-stage algorithm that meets the threshold. The first stage adopts a certain hypergraph spectral clustering method to obtain a coarse estimate of communities, and the second stage refines each node individually via local refinement steps to ensure exact recovery.  ( 3 min )
    Hcore-Init: Neural Network Initialization based on Graph Degeneracy. (arXiv:2004.07636v2 [cs.LG] UPDATED)
    Neural networks are the pinnacle of Artificial Intelligence, as in recent years we witnessed many novel architectures, learning and optimization techniques for deep learning. Capitalizing on the fact that neural networks inherently constitute multipartite graphs among neuron layers, we aim to analyze their structure directly to extract meaningful information that can improve the learning process. To our knowledge, graph mining techniques for enhancing learning in neural networks have not been thoroughly investigated. In this paper we propose an adapted version of the k-core structure for the complete weighted multipartite graph extracted from a deep learning architecture. As a multipartite graph is a combination of bipartite graphs, that are in turn the incidence graphs of hypergraphs, we design k-hypercore decomposition, the hypergraph analogue of k-core degeneracy. We applied k-hypercore to several neural network architectures, more specifically to convolutional neural networks and multilayer perceptrons for image recognition tasks after a very short pretraining. Then we used the information provided by the hypercore numbers of the neurons to re-initialize the weights of the neural network, thus biasing the gradient optimization scheme. Extensive experiments proved that k-hypercore outperforms the state-of-the-art initialization methods.  ( 3 min )
    Explainability Is in the Mind of the Beholder: Establishing the Foundations of Explainable Artificial Intelligence. (arXiv:2112.14466v2 [cs.AI] UPDATED)
    Explainable artificial intelligence and interpretable machine learning are research domains growing in importance. Yet, the underlying concepts remain somewhat elusive and lack generally agreed definitions. While recent inspiration from social sciences has refocused the work on needs and expectations of human recipients, the field still misses a concrete conceptualisation. We take steps towards addressing this challenge by reviewing the philosophical and social foundations of human explainability, which we then translate into the technological realm. In particular, we scrutinise the notion of algorithmic black boxes and the spectrum of understanding determined by explanatory processes and explainees' background knowledge. This approach allows us to define explainability as (logical) reasoning applied to transparent insights (into, possibly black-box, predictive systems) interpreted under background knowledge and placed within a specific context -- a process that engenders understanding in a selected group of explainees. We then employ this conceptualisation to revisit strategies for evaluating explainability as well as the much disputed trade-off between transparency and predictive power, including its implications for ante-hoc and post-hoc techniques along with fairness and accountability established by explainability. We furthermore discuss components of the machine learning workflow that may be in need of interpretability, building on a range of ideas from human-centred explainability, with a particular focus on explainees, contrastive statements and explanatory processes. Our discussion reconciles and complements current research to help better navigate open questions -- rather than attempting to address any individual issue -- thus laying a solid foundation for a grounded discussion and future progress of explainable artificial intelligence and interpretable machine learning.  ( 3 min )
    Differentially Private Stochastic Gradient Descent with Low-Noise. (arXiv:2209.04188v1 [stat.ML])
    In this paper, by introducing a low-noise condition, we study privacy and utility (generalization) performances of differentially private stochastic gradient descent (SGD) algorithms in a setting of stochastic convex optimization (SCO) for both pointwise and pairwise learning problems. For pointwise learning, we establish sharper excess risk bounds of order $\mathcal{O}\Big( \frac{\sqrt{d\log(1/\delta)}}{n\epsilon} \Big)$ and $\mathcal{O}\Big( {n^{- \frac{1+\alpha}{2}}}+\frac{\sqrt{d\log(1/\delta)}}{n\epsilon}\Big)$ for the $(\epsilon,\delta)$-differentially private SGD algorithm for strongly smooth and $\alpha$-H\"older smooth losses, respectively, where $n$ is the sample size and $d$ is the dimensionality. For pairwise learning, inspired by \cite{lei2020sharper,lei2021generalization}, we propose a simple private SGD algorithm based on gradient perturbation which satisfies $(\epsilon,\delta)$-differential privacy, and develop novel utility bounds for the proposed algorithm. In particular, we prove that our algorithm can achieve excess risk rates $\mathcal{O}\Big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d\log(1/\delta)}}{n\epsilon}\Big)$ with gradient complexity $\mathcal{O}(n)$ and $\mathcal{O}\big(n^{\frac{2-\alpha}{1+\alpha}}+n\big)$ for strongly smooth and $\alpha$-H\"older smooth losses, respectively. Further, faster learning rates are established in a low-noise setting for both smooth and non-smooth losses. To the best of our knowledge, this is the first utility analysis which provides excess population bounds better than $\mathcal{O}\Big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d\log(1/\delta)}}{n\epsilon}\Big)$ for privacy-preserving pairwise learning.  ( 2 min )
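    The gradient-perturbation mechanism analysed here has a textbook form: clip each per-example gradient, then add Gaussian noise. The sketch below shows it for logistic regression; the noise multiplier is illustrative and not calibrated to a particular $(\epsilon,\delta)$:

        import numpy as np

        def dp_sgd_logreg(X, y, epochs=20, lr=0.1, clip=1.0, sigma=1.0, seed=0):
            """Logistic regression via noisy (differentially private) gradient descent."""
            rng = np.random.default_rng(seed)
            n, d = X.shape
            w = np.zeros(d)
            for _ in range(epochs):
                margins = y * (X @ w)
                # per-example gradients of the logistic loss, shape (n, d)
                g = (-y / (1 + np.exp(margins)))[:, None] * X
                norms = np.linalg.norm(g, axis=1, keepdims=True)
                g = g / np.maximum(1.0, norms / clip)           # clip each example
                noise = rng.normal(scale=sigma * clip, size=d)  # Gaussian perturbation
                w -= lr * (g.sum(axis=0) + noise) / n
            return w

        rng = np.random.default_rng(1)
        X = rng.normal(size=(1000, 5))
        y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 1.5])
                    + 0.1 * rng.normal(size=1000))
        w_priv = dp_sgd_logreg(X, y)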
    Knowledge-based Deep Learning for Modeling Chaotic Systems. (arXiv:2209.04259v1 [cs.LG])
    Deep Learning has received increased attention due to its unbeatable success in many fields, such as computer vision, natural language processing, recommendation systems, and most recently in simulating multiphysics problems and predicting nonlinear dynamical systems. However, modeling and forecasting the dynamics of chaotic systems remains an open research problem since training deep learning models requires big data, which is not always available in many cases. Such deep learners can be trained from additional information obtained from simulated results and by enforcing the physical laws of the chaotic systems. This paper considers extreme events and their dynamics and proposes elegant models based on deep neural networks, called knowledge-based deep learning (KDL). Our proposed KDL can learn the complex patterns governing chaotic systems by jointly training on real and simulated data directly from the dynamics and their differential equations. This knowledge is transferred to model and forecast real-world chaotic events exhibiting extreme behavior. We validate the efficiency of our model by assessing it on three real-world benchmark datasets: El Nino sea surface temperature, San Juan Dengue viral infection, and Bj{\o}rn{\o}ya daily precipitation, all governed by extreme events' dynamics. Using prior knowledge of extreme events and physics-based loss functions to lead the neural network learning, we ensure physically consistent, generalizable, and accurate forecasting, even in a small data regime.  ( 3 min )
    clusterBMA: Bayesian model averaging for clustering. (arXiv:2209.04117v1 [stat.ME])
    Various methods have been developed to combine inference across multiple sets of results for unsupervised clustering, within the ensemble and consensus clustering literature. The approach of reporting results from one `best' model out of several candidate clustering models generally ignores the uncertainty that arises from model selection, and results in inferences that are sensitive to the particular model and parameters chosen, and assumptions made, especially with small sample size or small cluster sizes. Bayesian model averaging (BMA) is a popular approach for combining results across multiple models that offers some attractive benefits in this setting, including probabilistic interpretation of the combined cluster structure and quantification of model-based uncertainty. In this work we introduce clusterBMA, a method that enables weighted model averaging across results from multiple unsupervised clustering algorithms. We use a combination of clustering internal validation criteria as a novel approximation of the posterior model probability for weighting the results from each model. From a combined posterior similarity matrix representing a weighted average of the clustering solutions across models, we apply symmetric simplex matrix factorisation to calculate final probabilistic cluster allocations. This method is implemented in an accompanying R package. We explore the performance of this approach through a case study that aims to identify probabilistic clusters of individuals based on electroencephalography (EEG) data. We also use simulated datasets to explore the ability of the proposed technique to identify robust integrated clusters with varying levels of separation between subgroups, and with varying numbers of clusters between models.  ( 3 min )
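    The averaging step can be sketched with off-the-shelf pieces: here silhouette scores stand in for the paper's internal-validation approximation of posterior model probabilities, and agglomerative clustering on the averaged similarity replaces symmetric simplex matrix factorisation. All of these substitutions are ours:

        import numpy as np
        from sklearn.cluster import KMeans, AgglomerativeClustering
        from sklearn.datasets import make_blobs
        from sklearn.metrics import silhouette_score

        X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

        models = [KMeans(k, n_init=10, random_state=0) for k in (2, 3, 4)]
        labels = [m.fit_predict(X) for m in models]

        # weight each model by a softmax over an internal validation criterion
        crit = np.array([silhouette_score(X, l) for l in labels])
        w = np.exp(crit) / np.exp(crit).sum()

        # weighted average of co-clustering (similarity) matrices
        S = sum(wi * (l[:, None] == l[None, :]) for wi, l in zip(w, labels))

        # final allocation from the averaged similarity (stand-in for the
        # paper's simplex factorisation); sklearn >= 1.2 uses `metric`,
        # older versions call this parameter `affinity`
        final = AgglomerativeClustering(n_clusters=3, metric="precomputed",
                                        linkage="average").fit_predict(1.0 - S)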
    Fast Neural Kernel Embeddings for General Activations. (arXiv:2209.04121v1 [cs.LG])
    The infinite-width limit has shed light on generalization and optimization aspects of deep learning by establishing connections between neural networks and kernel methods. Despite their importance, the utility of these kernel methods was limited in large-scale learning settings due to their (super-)quadratic runtime and memory complexities. Moreover, most prior works on neural kernels have focused on the ReLU activation, mainly due to its popularity but also due to the difficulty of computing such kernels for general activations. In this work, we overcome such difficulties by providing methods to work with general activations. First, we compile and expand the list of activation functions admitting exact dual activation expressions to compute neural kernels. When the exact computation is unknown, we present methods to effectively approximate them. We propose a fast sketching method that approximates any multi-layered Neural Network Gaussian Process (NNGP) kernel and Neural Tangent Kernel (NTK) matrices for a wide range of activation functions, going beyond the commonly analyzed ReLU activation. This is done by showing how to approximate the neural kernels using the truncated Hermite expansion of any desired activation function. While most prior works require data points on the unit sphere, our methods do not suffer from such limitations and are applicable to any dataset of points in $\mathbb{R}^d$. Furthermore, we provide a subspace embedding for NNGP and NTK matrices with near input-sparsity runtime and near-optimal target dimension which applies to any \emph{homogeneous} dual activation function with rapidly convergent Taylor expansion. Empirically, with respect to exact convolutional NTK (CNTK) computation, our method achieves a $106\times$ speedup for approximate CNTK of a 5-layer Myrtle network on the CIFAR-10 dataset.  ( 3 min )
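    To make the Hermite-expansion idea concrete, here is a minimal sketch (not the paper's sketching algorithm) that approximates the dual activation $\check{\sigma}(\rho)=\mathbb{E}[\sigma(u)\sigma(v)]$ for unit-norm inputs with correlation $\rho$ by truncating the expansion $\sum_n a_n^2 \rho^n$, where $a_n$ are normalized probabilists'-Hermite coefficients of $\sigma$:

        import numpy as np
        from numpy.polynomial.hermite_e import hermegauss, hermeval
        from math import factorial

        def hermite_coeffs(sigma, N=20, quad_deg=80):
            """a_n = E[sigma(g) He_n(g)] / sqrt(n!) for g ~ N(0, 1), via Gauss-Hermite quadrature."""
            x, w = hermegauss(quad_deg)      # weight exp(-x^2/2); weights sum to sqrt(2*pi)
            w = w / np.sqrt(2 * np.pi)       # renormalize to the standard Gaussian
            a = []
            for n in range(N + 1):
                c = np.zeros(n + 1); c[n] = 1.0          # select He_n
                a.append(np.sum(w * sigma(x) * hermeval(x, c)) / np.sqrt(factorial(n)))
            return np.array(a)

        def dual_activation(sigma, rho, N=20):
            a = hermite_coeffs(sigma, N)
            return sum(a[n] ** 2 * rho ** n for n in range(N + 1))

        # Sanity check against the closed-form ReLU (arc-cosine) kernel at rho = 0.5.
        relu = lambda x: np.maximum(x, 0.0)
        rho = 0.5
        exact = (np.sqrt(1 - rho ** 2) + rho * (np.pi - np.arccos(rho))) / (2 * np.pi)
        print(dual_activation(relu, rho), exact)   # the two values should nearly agree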
    Gaussian Process Koopman Mode Decomposition. (arXiv:2209.04111v1 [stat.ML])
    In this paper, we propose a nonlinear probabilistic generative model of Koopman mode decomposition based on an unsupervised Gaussian process. Existing data-driven methods for Koopman mode decomposition have focused on estimating the quantities specified by Koopman mode decomposition, namely, eigenvalues, eigenfunctions, and modes. Our model enables the simultaneous estimation of these quantities and latent variables governed by an unknown dynamical system. Furthermore, we introduce an efficient strategy to estimate the parameters of our model by low-rank approximations of covariance matrices. Applying the proposed model to both synthetic data and a real-world epidemiological dataset, we show that various analyses are available using the estimated parameters.  ( 2 min )
    Online Low Rank Matrix Completion. (arXiv:2209.03997v1 [cs.LG])
    We study the problem of \textit{online} low-rank matrix completion with $\mathsf{M}$ users, $\mathsf{N}$ items and $\mathsf{T}$ rounds. In each round, we recommend one item per user. For each recommendation, we obtain a (noisy) reward sampled from a low-rank user-item reward matrix. The goal is to design an online method with sub-linear regret (in $\mathsf{T}$). While the problem can be mapped to the standard multi-armed bandit problem where each item is an \textit{independent} arm, it leads to poor regret as the correlation between arms and users is not exploited. In contrast, exploiting the low-rank structure of the reward matrix is challenging due to the non-convexity of the low-rank manifold. We overcome this challenge using an explore-then-commit (ETC) approach that ensures a regret of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{2/3})$. That is, roughly only $\mathsf{polylog} (\mathsf{M}+\mathsf{N})$ item recommendations are required per user to get a non-trivial solution. We further improve our result for the rank-$1$ setting. Here, we propose a novel algorithm OCTAL (Online Collaborative filTering using iterAtive user cLustering) that ensures a nearly optimal regret bound of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{1/2})$. Our algorithm uses a novel technique of clustering users and eliminating items jointly and iteratively, which allows us to obtain a nearly minimax-optimal rate in $\mathsf{T}$.  ( 2 min )
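    A minimal explore-then-commit sketch in the spirit of the first result (random exploration for roughly $\mathsf{T}^{2/3}$ rounds, then a rank-$r$ truncated-SVD estimate; the exact round counts here are illustrative, not the paper's constants):

        import numpy as np

        def explore_then_commit(sample_reward, M, N, T, rank=1, seed=0):
            """Explore random items for ~T^(2/3) rounds, fit a low-rank estimate, then commit."""
            rng = np.random.default_rng(seed)
            T_explore = int(T ** (2 / 3))
            sums, counts = np.zeros((M, N)), np.zeros((M, N))
            for _ in range(T_explore):
                items = rng.integers(N, size=M)            # one random item per user
                for u, i in enumerate(items):
                    sums[u, i] += sample_reward(u, i)
                    counts[u, i] += 1
            R_hat = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
            U, s, Vt = np.linalg.svd(R_hat, full_matrices=False)
            R_lr = (U[:, :rank] * s[:rank]) @ Vt[:rank]    # denoised low-rank estimate
            return R_lr.argmax(axis=1)                     # committed item per user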

  • Open

    [D] Thoughts regarding the future monopoly on data
    The way I see it, a normal user currently "sells" his data by using services offered by top-tier corporations, and legal authorities are enforcing stricter laws to minimize the gathering of private data. This made me think that the future could be based on selling/renting already-trained models to bypass those laws; we already have platforms like Hugging Face which offer APIs to use models trained by the top corporations. What do you think about it? submitted by /u/Snoo67839  ( 104 min )
    [D] What would be the hottest ML topic for both industry and academia ?
    I am a Master's student who is about to choose a thesis topic. I have to make a choice out of 3 given topics: 1- Test case selection and prioritization 2- Adversarial attacks on object detectors in maritime environments 3- Extracting information from document images using an explainable ML model. I am very torn on what to choose and would love to work on a topic that will shine in industry as well as make an interesting published paper for academia. submitted by /u/ChucoLay  ( 89 min )
    [PROJECT] Auto-labeling copilot for object detection datasets
    Hey there! My cofounders and I at happyrobot.ai are tinkering with the idea of object-detection auto-labeling. We’ve built a demo at demo.happyrobot.ai, where you can combine text and visual queries to specify what object you’re looking to automatically label. [Screenshot of the demo] We also made a video showing how the demo can be used: https://youtu.be/EAyTAEHFbF0 We think this could be useful for labeling an object detection dataset from scratch, that is, when you don’t have any labels, since in those cases model-driven labeling (aka active learning) does not work. Please feel free to comment and provide feedback on the demo! :) Pablo submitted by /u/ppalafoxr  ( 88 min )
    [D] Most Popular AI Research August 2022 - Ranked By Twitter Likes
    submitted by /u/cloud_weather  ( 141 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! The thread will stay alive until the next one, so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator  ( 89 min )
    [R] GradCam Algorithm (which Label should be used in the backpropagated loss function)
    I have developed a preprocessing technique for an image classification problem which improved the ground-truth rank from the top-5 set to top-1. I used the GradCAM algorithm to examine the change in focus between the original image and the preprocessed image, given that I backpropagate the loss to the selected feature map using the same GT label, not the predicted output. Unfortunately, both resulting focus areas have identical heatmap output. In another case, I use the predicted output in the backpropagated loss, where the input image preprocessed with my proposed technique is correctly classified while the original image is not. In this case, the output heatmap shifts the area of focus in a way that makes more sense, as shown below. Honestly, I believe this is not a fair comparison, as the two cases use different references in the propagated loss. I would like to know whether the last case can be mentioned in the paper or not. [Images: GradCAM heatmaps for the preprocessed image (GT label: caldron, cauldron; correctly classified)] submitted by /u/AhmedHussKhalifa  ( 89 min )
    [R] SIMPLERECON — 3D Reconstruction without 3D Convolutions — 73ms per frame!
    submitted by /u/SpatialComputing  ( 92 min )
    [D] Opinionated cloud GPU question :)
    So I’ve been looking at cloud GPU services: Colab, Jarvis, RunPod, Lambda, Vast. What I want is: (1) nice SSH access (I do not care about notebook access!); (2) environment setup (which may take an hour) must persist after a pause, with no time limit; in other words, I should be able to restart the instance at any later time and have my previous env setup available. Colab fails on both. On Lambda I was never able to see any instances for some reason. RunPod and Vast fail on 2. Only Jarvis seems to have both 1 and 2. Am I right, or are there other alternatives I should consider? Thanks. submitted by /u/SatoshiNotMe  ( 88 min )
    [P] Using GitHub as Artifactory for Machine Learning Model Artifacts
    submitted by /u/op_prabhuomkar  ( 88 min )
    [D] AI generated art ownership
    I was surprised to find that many artists do not like the idea of AI art. I understand that some fear losing their customers to AI, but I do not think that will be the case; AI is just a tool. Some say that image-generating neural networks trained on internet-wide datasets infringe copyright, but that sounds nonsensical to me; using someone else's art as a source of inspiration should be legal, IMO. What is your opinion? What legal changes do you think might come? submitted by /u/tredecapus  ( 94 min )
    [P] Real-time Voice Conversion demo
    submitted by /u/CJemine  ( 88 min )
    3D-CNN run times - Are they practical? - Video Classification Task [Discussion]
    I am attempting to use EfficientNet3D for a video classification task: https://github.com/shijianjian/EfficientNet-PyTorch-3D I wanted to use a 3D-CNN that allows higher-resolution image inputs. I believe my project's accuracy is currently suffering because my image inputs are 112x112; so much data is lost when resizing the original images to 112x112 that shapes and features are almost unrecognizable, even to the human eye. I think that with larger input resolutions, e.g. EfficientNet 'b7' at (633, 600), I would get much higher accuracy. However, when I load the model and try summary(model, input_size=(1, 200, 200, 200)), it takes ~30 seconds to display the torchsummary output. When I tried summary(model, input_size=(1, 333, 633, 600)), it took >15 minutes (I believe this could also be a memory issue, I'm not sure). Am I doing something wrong? Has anyone else tried 3D-CNNs with larger image resolutions? Should I ditch 3D-CNNs altogether and try a different route? If I can't practically input larger image resolutions then this is a dead end. submitted by /u/belandis  ( 90 min )
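    Back-of-the-envelope arithmetic suggests memory, not a bug, is the bottleneck here; a quick sketch (float32 activations, batch size 1; the 32-channel layer is an illustrative assumption):

        import numpy as np

        def tensor_mb(shape, bytes_per_el=4):
            """Memory of one float32 activation tensor, in MB."""
            return np.prod(shape) * bytes_per_el / 2 ** 20

        print(tensor_mb((1, 333, 633, 600)))    # ~482 MB for a single-channel input volume
        print(tensor_mb((32, 333, 633, 600)))   # ~15,400 MB after one 32-channel conv layer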
    [D] Universities in Germany for AI ML
    Which universities in Germany are good for AI/ML? submitted by /u/RauhanSheikh  ( 92 min )
    [D] Anyone else scared about disruption in their industry?
    I feel like we are at a turning point in human history, not unlike what was experienced at the dawn of the industrial revolution. I predict the only difference will be that the changes to life, products, industry, and human capability will keep speeding up exponentially as AI, and the interfaces between human and AI, are developed further. This doesn't seem like good news to me; humans are imperfect, the biggest flaw being selfishness. Ego-driven action coupled with this new capacity for production only seems like bad news to me. Of course, just as problems can be created, solutions can also be created. However, in the early stages (in our time, the next 10-30 years) I think problems will dominate, with solutions and regulation coming later as an answer. Please share your thoughts; am I the only one? P.S. I know this is sort of an abstract post, but I didn't know where else to share it given its subject matter. submitted by /u/imfeelingsomekindowf  ( 91 min )
    [R][P] Japanese Stable Diffusion, using approximately 100 million images with Japanese captions, including the Japanese subset of LAION-5B to train the model
    submitted by /u/Illustrious_Row_9971  ( 88 min )
    [D] Successor to CNN Autoencoders?
    I was just watching the CodeParade video on running PCA on the latent-space representations of yearbook photos. I was wondering: would I be able to increase the resolution and quality of the pictures using modern neural networks like Transformers? If so, what paper proposes a neural network that could be used as a successor to the CNN autoencoder? Maybe something like a Transformer autoencoder? submitted by /u/itisyeetime  ( 89 min )
    Imagenet Subset [Project] [Research]
    Within the scope of knowledge distillation, I have proposed a novel idea, but I don't have the compute power to train from scratch on ImageNet. Is it possible to train on a subset of ImageNet (see below), starting from a pretrained checkpoint, while using the entire validation set for testing? This subset is balanced and makes up around 1% of the whole training set. In addition, I will also support my results with training from scratch on small datasets like CUB200 and Stanford Dogs. P.S. My approach is limited to high-resolution images; I cannot use CIFAR10, CIFAR100, or Tiny ImageNet. https://www.tensorflow.org/datasets/catalog/imagenet2012_subset submitted by /u/AhmedHussKhalifa  ( 88 min )
  • Open

    AI image of Mike Tyson at a barbecue
    submitted by /u/mandeheks  ( 86 min )
    AI Dream 80 - Wild new Project! Part 5
    submitted by /u/LordPewPew777  ( 87 min )
    9/11 simulation
    Has anyone ever made a simulation of the 9/11 terrorist attack on the twin towers? I'm not really a conspiracy theorist, but I am curious what a simulation could look like with today's physics engines, and whether it could possibly debunk or prove some conspiracies. submitted by /u/Ok-One-8411  ( 91 min )
    I may have done a little out-painting...
    submitted by /u/Agaeon  ( 89 min )
    Stable Diffusion Weekly AI Art slideshow Inpainting Examples!
    submitted by /u/prfitofthesngularity  ( 90 min )
    Dall-E AI is now able to see beyond the frame of famous paintings
    submitted by /u/AmerBekic  ( 89 min )
    Where AI researchers agree - and where they don't
    submitted by /u/much_successes  ( 93 min )
    Classification of Unlabeled Images
    Image classification is one of the most common problems in computer vision. It has many real-life applications, like medical imaging, object identification in satellite images, brake-light detection, etc. But building datasets for image classification is often the most effort- and time-consuming task. This blog demonstrates how we can build a classification model when we have just images and no labels, i.e., classification in the case of unlabelled data. Link: https://medium.com/geekculture/classification-of-unlabeled-images-a2eb0e52f7c2 submitted by /u/VikasOjha666  ( 93 min )
    How do I get RIFE/Flowframes to use my discrete GPU?
    RIFE/Flowframes is using my onboard Vega 8 GPU instead of my discrete RX560X GPU, which is more capable. Is there a way to make it use the latter instead of the former? submitted by /u/typcalthowawayacount  ( 87 min )
    Punk Alien Cities for your Visual Stimulation
    submitted by /u/Available_Tadpole829  ( 92 min )
    Establishing AI Clubs to lower barrier of entry for AI.
    I'm part of a nonprofit organization called SAILea [short for Scholastic Artificial Intelligence League] that aims to lower the barrier of entry to AI for high school students, focusing on helping people start their own AI clubs and supporting existing ones with resources and events. It seemed to match the goal of this subreddit, so I decided to post some information here for anyone interested. Resources: SAILea offers all the introductory materials someone would need to get a club started: presentations, code notebooks, and discussion starters featuring machine learning algorithms, Python, and AI ethics, plus a step-by-step guide to starting an AI club. Events: This academic year we plan on hosting speaker events featuring professors and professionals in the field [quick idea of what they're like], and a hackathon later in the year. You'll be able to access everything after connecting with SAILea as either a club or an individual at our website: Sailea.org/join-us Hesitating? We're a nonprofit started by students, so everything is COMPLETELY FREE. submitted by /u/Envoy-Insc  ( 89 min )
  • Open

    How to Win an Election Using Big Data
    Before World War II, less than 20% of voters classified themselves as independents. Today that number is greater than 45%. As politics fails to appeal to more Americans, politicians will need to embrace a different approach to appeal to those important independent “swing” voters who will decide our political leaders. The post How to Win an Election Using Big Data appeared first on Data Science Central.  ( 22 min )
    How Automation is Changing Freight Bill of Lading Data Entry
    For those not involved in shipping, freight billing is the process of providing an invoice that includes information concerning the transportation of a company’s goods from one place to another. It also contains the amounts charged, due dates, weight, a complete description of the goods, and contact information, as well as the names of both the receiver and the shipper, freight rates, accessorial charges, etc. These are tedious and error-prone tasks; hence, automation of freight billing is needed. The post How Automation is Changing Freight Bill of Lading Data Entry appeared first on Data Science Central.  ( 20 min )
  • Open

    need help
    Guys, I am a computer science student with a passion for AI, and I want to learn it on my own. Can someone please tell me what courses to take, and in what order, so I can create advanced AI programs? (I don't want just an introduction; I want to get as deep and as advanced as possible.) Thanks. submitted by /u/yousef_naderr  ( 87 min )
    I need help on my neuroevolution project.
    I am creating a neuroevolution project in the Godot game engine, where a player runs around avoiding enemies. Enemies spawn every second, and their direction of movement is set towards the player (set once at spawn and never changed, i.e., they move in a straight line). Here's how I set my neural network up: https://preview.redd.it/5yexijzcu8n91.png?width=306&format=png&auto=webp&s=0003fcbcb2a6089719745f32e00c259d8beaea15 https://preview.redd.it/jq1o0euau8n91.png?width=1095&format=png&auto=webp&s=4857c57358c212707e0d99dd54e3524d40d27b2b https://preview.redd.it/xqlp83ubu8n91.png?width=610&format=png&auto=webp&s=881e1cf0be7ccd6b913146fbf2c2b40263055dab A player has 8 rays, each looking in a certain direction. The input layer is the distance between the player and the ray-colliding enemy (or 0 if no enemy is colliding) for each ray; I have also normalized the distance values. -> Inputs 1~8: distance to the enemy in each ray, 0 if none. The output layer gets 2 values -> Output 1: whether to move or not -> Output 2: the angle to move towards. However, the AIs don't seem to work very well. They either stay completely still or run straight towards the walls and stick there, not really trying to find a way out when they get surrounded. https://reddit.com/link/xbku6f/video/id2qodahv8n91/player https://reddit.com/link/xbku6f/video/723fdyoiv8n91/player Would it be solved simply by increasing the population count and waiting a bit longer? Or should I change the input & output values? Project download: https://drive.google.com/drive/folders/1uHzs1-ckZCPoeKaWDhFCHwNiMWuWhvZH?usp=sharing submitted by /u/Weekly_Turnip_2555  ( 88 min )
  • Open

    Intuition on number of neurons.
    Hello guys, I just wanted to hear your opinion about a simple thing I was wondering. Don't you think that in RL papers people generally tend to use extremely big networks, for example 2 hidden layers of 256 neurons, even on problems with low-dimensional states? What do you think about this? submitted by /u/White_Sirilo  ( 106 min )
    Need help in implementing policy gradient
    I am a noob exploring RL. Out of interest, I tried implementing a naive policy gradient algorithm on the Humanoid-v2 environment and ran it for about 2000 episodes of 1000 timesteps each, but the reward-vs-episodes graph doesn't seem to show any increase or learning. Could someone help me with this? I am attaching the files here: Drive folder submitted by /u/Shivaram_3223  ( 98 min )
    Are there any RL based open source projects for contributions?
    Apart from the RL gaming simulation frameworks, I wanted to know if there are open-source organizations looking to develop RL tools through open-source contributions. submitted by /u/Electrical_Study_617  ( 87 min )

  • Open

    What controls the second dimension of TF observations / what a Q-net accepts in its place?
    Short version: I can't find the variable(s) that control either A) the 2nd dimension of a variable in a trajectory, e.g. the 3 in Trajectory({'action': <tf.Tensor: shape=(64, 3), or B) the number of dimensions a Q-net takes during training. I'm following this tutorial: https://www.tensorflow.org/agents/tutorials/1_dqn_tutorial By inserting prints into the tutorial, as shown below, I extract examples of the following data as it is passed into training:

        for _ in range(num_iterations):
            # Collect a few steps using collect_policy and save to the replay buffer.
            for _ in range(collect_steps_per_iteration):
                collect_step(train_env, agent.collect_policy)
            # Sample a batch of data from the buffer and update the agent's network.
            experience, unused_info = next(i…  ( 96 min )
    "PI-QT-Opt: Predictive Information Improves Multi-Task Robotic Reinforcement Learning at Scale", Lee et al 2022 {G}
    submitted by /u/gwern  ( 87 min )
    How to avoid early convergence/how to encourage exploration for PPO?
    Hi everyone! I encountered a problem a few weeks ago where my agent was not able to fully solve my environment. It's not that it didn't learn - performance increased, but then plateaued at a relatively low level, after which I could not get it to improve further. After some reading/testing, I suspect the issue is that the PPO algorithm I'm using for this problem converges too early. I can see that the "clipfrac" and "entropy" metrics drop RAPIDLY until they're nearly zero at around only 60-80k timesteps of training. I interpret this as 'the algorithm stops exploring, converges on a solution, and doesn't learn anything more after ~70k timesteps'. I was wondering if any of you have encountered this? Any advice on how to get around it and keep the algorithm exploring? I'm currently playing around with the ent_coef parameter, which seems to help slightly, but even on a conceptual level I don't know how to deal with this. Should I set the initial learning rate higher so it doesn't get stuck in a local optimum? Any advice and insight would be greatly appreciated! submitted by /u/VladimirB-98  ( 91 min )
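    For reference, in Stable-Baselines3 (an assumption here, suggested by the ent_coef name), a stronger entropy bonus and a decaying learning-rate schedule are set like this; all values are purely illustrative:

        from stable_baselines3 import PPO

        # Linear schedule: progress_remaining runs from 1 down to 0 over training.
        lr_schedule = lambda progress_remaining: 3e-4 * progress_remaining

        model = PPO(
            "MlpPolicy",
            "CartPole-v1",       # stand-in environment
            ent_coef=0.01,       # larger values keep policy entropy (exploration) higher
            learning_rate=lr_schedule,
            clip_range=0.2,
            verbose=1,
        )
        model.learn(total_timesteps=200_000)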
    Updating actor in DDPG using action from replay buffer
    Hi, I have a question. I am working on a recommender system using a dataset. We use the state and action to calculate the gradient of the critic, which is then used to update the weights of the actor network. In practice, I see states are sampled from the replay buffer but the action is generated using the current policy; however, if I follow this, the system becomes unstable at the beginning and the agent gets a huge negative reward. If instead I do it the other way around and use both the state and action from the replay buffer, the agent's performance is better. I don't understand the reason. Thank you in advance for your time. submitted by /u/win_canada  ( 87 min )
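    For context, the standard DDPG actor update uses the buffer state but re-computes the action with the current policy (the deterministic policy gradient); a minimal PyTorch sketch, where actor and critic are any compatible networks:

        import torch

        def ddpg_actor_update(actor, critic, actor_opt, state_batch):
            """Standard DDPG actor step: maximize Q(s, mu(s)) over buffer states.

            The action comes from the CURRENT policy mu(s), not from the replay
            buffer -- a stored (off-policy) action carries no gradient with
            respect to the actor's parameters at all.
            """
            actor_opt.zero_grad()
            actions = actor(state_batch)                       # a = mu_theta(s)
            actor_loss = -critic(state_batch, actions).mean()  # ascend on Q
            actor_loss.backward()                              # dQ/da * da/dtheta
            actor_opt.step()
            return actor_loss.item()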
  • Open

    [D] How does the distance measure in VQ VAE work?
    In the code I'm looking at it says distances = (torch.sum(flat_input**2, dim=1, keepdim=True) + torch.sum(self._embedding.weight**2, dim=1) - 2 * torch.matmul(flat_input, self._embedding.weight.t())) but why does this work? Intuitively, is this just doing the dot product between the input and all of the embeddings? Why do we have to do torch.sum() on the embeddings and input to get their L2 norms? submitted by /u/RitsusHusband  ( 89 min )
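    That line is just the expanded squared Euclidean distance ||x - e||^2 = ||x||^2 + ||e||^2 - 2 x.e, computed for all input/embedding pairs at once; a quick numerical check:

        import torch

        x = torch.randn(5, 8)    # 5 flattened inputs, dimension 8
        e = torch.randn(12, 8)   # 12 codebook embeddings

        # Expanded form, as in the VQ-VAE code: ||x||^2 + ||e||^2 - 2 x.e
        expanded = (x.pow(2).sum(1, keepdim=True)   # (5, 1)  squared input norms
                    + e.pow(2).sum(1)               # (12,)   squared embedding norms
                    - 2 * x @ e.t())                # (5, 12) cross terms

        direct = torch.cdist(x, e).pow(2)           # pairwise squared distances
        print(torch.allclose(expanded, direct, atol=1e-5))  # True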
    [R] [N] gMLP: Gated MLPs can be superior to Transformers! A guide on how to implement such networks with Tensorflow and Keras.
    submitted by /u/radi-cho  ( 106 min )
    [P] I tried to recreate a comic book using Dall-E / Midjourney and Alan Moore’s script
    I conducted a small study to see how recent developments in AI will affect the comic book industry. I ran two experiments, one with Dall-E and another with Midjourney (Stable Diffusion is on its way). In both, I used a script of The Killing Joke by Alan Moore and compared the results with the original. Dall-E Experiment Midjourney Experiment submitted by /u/RubiksCodeNMZ  ( 105 min )
    [N] MLPerf submission: 175X increase in NLP Performance utilizing sparsity
    Utilizing the oBERT research we published at Neural Magic, and some further iteration, we’ve enabled a 175X increase in NLP performance on CPUs while retaining 99% accuracy on the question-answering task in MLPerf. A combination of distillation, layer dropping, quantization, and unstructured pruning with oBERT enabled these large performance gains through the DeepSparse Engine. All of our contributions and research are open-sourced or free to use. Read through the oBERT paper on arXiv, try out the research in SparseML. For more details on the results, dive into the write-up here: https://neuralmagic.com/blog/neural-magic-announces-mlperf-inference-benchmarks/ submitted by /u/markurtz  ( 107 min )
    [R] 3D models of humans interacting with objects from a single 2D image
    submitted by /u/SpatialComputing  ( 90 min )
    [P] Simple fastai based face restoration project, GitHub link in comments.
    submitted by /u/vijish_madhavan  ( 89 min )
    Requesting Guidance/advice in developing a Perception and navigation system for an Autonomous mobile platform [R] [P]
    submitted by /u/RunTheGauntlet777  ( 90 min )
    [D] For folks reading ML PhD admissions - what are the rockstar applicants like?
    I know that there's a lot of noise in the PhD admissions process, particularly so for ML/AI as a subfield, given that the application number vastly exceeds the acceptance number, which is often in the single digits per lab. But sometimes there are rockstar applicants who get into multiple top schools, and they very often have the full package (good grades, good recs, good pubs, good mastery of the English language). Have you seen them - what are they like, and what differentiates them from the thousands of other applicants? And how frequently does a rockstar applicant show up? E.g. for NLP, I can think of some people who got in virtually everywhere they applied last year, including multiple top schools, and they have a lot of things in common. And I know 2 undergrads in my lab (at a top school) who have swept the three main conferences (ACL/EMNLP/NAACL), and from the publication record alone, they will probably get in everywhere this cycle too. I chose to ask here instead of in /r/gradadmissions, as I think that subreddit consists mainly of applicants, and people here tend to be more senior and have insight from the other side of the fence, so to speak. Thank you for reading! submitted by /u/akardashian  ( 99 min )
    [P] Teach new concepts to Stable Diffusion with 3-5 images only - and browse a library of learned concepts to use with a gradio demo in colab
    submitted by /u/Illustrious_Row_9971  ( 90 min )
    ML Model Attribution Prize Challenge from DefCon is soliciting creative solutions to this crucial AI security problem: Can you tell where this bot came from? Great opportunity to earn and contribute to public safety! [R]
    submitted by /u/Such_Flower6440  ( 89 min )
    [D] How can we retain an 80%/20% training and testing ratio, while applying data augmentation?
    Hello, data augmentation is usually only done on the training set; however, I am still confused about how I can retain the 80/20 ratio while applying the given method. I have classes which are unbalanced, and I would like to apply random undersampling on the training set. My colleagues and I decided to cap the number of files per class at 500, because some classes consist of over 2000 files and some of fewer than 200. So we apply random undersampling to the classes that exceed the 500-file maximum, while we apply data augmentation to the classes with fewer than 500 files. We are aware that data augmentation is usually done on the training set, but how can we retain the 80/20 training/testing ratio if we apply data augmentation to the training set? submitted by /u/BattleDoom25  ( 91 min )
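    The usual answer is to compute the 80/20 split on the original files first and only then under-/over-sample the training portion, so the test set stays untouched and the ratio refers to the raw data; a minimal sketch (the augment function and the 500-file target are placeholders taken from the post):

        import random
        from sklearn.model_selection import train_test_split

        def balance_train(X_train, y_train, target=500, augment=lambda x: x):
            """Undersample classes above `target`, augment classes below it (train split only)."""
            by_class = {}
            for x, y in zip(X_train, y_train):
                by_class.setdefault(y, []).append(x)
            Xb, yb = [], []
            for y, xs in by_class.items():
                if len(xs) > target:
                    xs = random.sample(xs, target)   # random undersampling
                else:
                    xs = xs + [augment(random.choice(xs)) for _ in range(target - len(xs))]
                Xb += xs
                yb += [y] * target
            return Xb, yb

        # Toy stand-in data: file paths and three imbalanced classes.
        X = [f"img_{i}.png" for i in range(1000)]
        y = [0] * 600 + [1] * 300 + [2] * 100

        # Split FIRST (stratified 80/20 on the raw files), then balance only the train side.
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
        X_tr, y_tr = balance_train(X_tr, y_tr)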
  • Open

    A cyberpunk story video (took me quite some time to make, I hope you like it ^^)
    https://youtu.be/db0uUa5_cTE submitted by /u/Lollygag42  ( 87 min )
    can someone create me an AI? I want to do an experiment
    submitted by /u/Kiyotaka2006  ( 87 min )
    Are there any web apps for text-generation based on BLOOM?
    submitted by /u/amanano  ( 89 min )
    Bob Ross Teaches Hitler How To Paint | AI Generated Story
    submitted by /u/Sindrelf  ( 87 min )
    MLPerf submission from Neural Magic: 175X increase in NLP Performance utilizing sparsity
    submitted by /u/markurtz  ( 87 min )
    Do you feel like a traitor to humanity for participating in AI research
    I can't help but feel like there is a clear case for AI research being at odds with the interests of most of humanity. Aside from a few uses in medicine, it appears mostly to be about making more and more people obsolete. The current popularity of AI art is a great example of this. Should artists respect this, or would they be justified in wanting the heads of every AI researcher? submitted by /u/dubrobobo  ( 92 min )
    How can I further improve my AI's coherency?
    I've been experimenting with making different AI projects and I managed to make one "sorta coherent"; it's pretty decent, but it needs more work. The problem is that it works based on precedent, so it can't really be original. I intend to make it come up with variations of the answers it already has, but I have no way of filtering out the nonsense answers. I'd like to ask for opinions on how to get original prompts to have coherency (not textual but overall), so that it sounds even more like the bot knows what you're saying. Here is the link to my project: https://github.com/inovartecontato/AI_Project Feel free to test it out and see if any ideas come to mind; it requires Python to run. submitted by /u/4e_65_6f  ( 90 min )
    Humanoid AI Robot For Airports | New AudioLM Audio Generator From Google AI | MIT Improves Text To Image Generation
    submitted by /u/kenickh  ( 87 min )
    How to use Masking Inpainting Outpainting With Stable Diffusion To make ...
    submitted by /u/prfitofthesngularity  ( 87 min )
    Unlimited Stable Diffusion for September continues. Pixelz AI
    submitted by /u/mdfnb  ( 87 min )
    AI Generated Art is the Best Thing to Happen to Painting Since Photography
    submitted by /u/edgeworth_artist  ( 88 min )
    I made a music video using 'AI Technology'. It's taken me SO MANY HOURS to complete but I think it's pretty rad how it's turned out. What do you guys think?
    submitted by /u/6Witchy9  ( 88 min )
    What website has text & image generation?
    It has a TikTok, btw. It also has a speech generator. submitted by /u/roblox22y  ( 87 min )
    Generalized AI
    So a thought just came to me, and I don't know if it will lead anywhere, but I figured I'd get the Internet's opinion. The roadblock, it seems, towards obtaining generalized AI (as opposed to specific AI designed for a specific task; the type of AI that could take us to the singularity, if such a thing is even possible) is getting an AI to understand concepts and apply them, rather than just grouping words together whose meaning it doesn't really understand. I came across a video that sparked an idea. At the beginning of this video, an AI tries to figure out how to draw the Mandelbrot set, and the computer is rather slow in figuring it out. But if we found a way to design an AI that you could just tell, 'this image is infinitely recursive', and it immediately recognizes what that phrase means and has an intuitive sense of how to apply it (even if it hasn't worked out the particulars immediately, much like humans), that would bring us lightyears closer to generalized AI. Don't get me wrong, I have no idea how to even begin tackling this problem, but it's something I plan on mulling over for a bit to see if I can work out an outline of how to go about it. submitted by /u/TaraBryn  ( 95 min )
    AI Dream 76 - Wild new Project! Part 4
    submitted by /u/LordPewPew777  ( 87 min )
    The Age of Pan - Part 2: The Knowledgeable Ape
    In the second part of a series of essays about image synthesizers like Dall-E, Midjourney and Stable Diffusion, I try to explain how we evolved to invent this thing we call "art", what that means in terms of evolution, and why image synthesizers are a whole new beast. I publish these essays first on my Substack GOOD INTERNET. I hope you like it. You can read the first part here on Reddit: The Age of Pan - Part 1: Infinite Rabbit Holes. ----- Why are people upset about synthetic images winning art prizes? You might have heard: someone took first prize in an art contest with a piece generated by the AI system Midjourney. The guy won 300 dollars and a lot of people are very upset about this. Now let's get a few quibbles out of the way: 1) I don't think that art is something that happe…  ( 97 min )
    The Age of Pan - Part 1: Infinite Rabbit Holes
    This is the first part of a series of essays I've started publishing on my Substack GOOD INTERNET. I try to tackle what the emergence of image synthesis means for art and image making. This is part 1, in which I try to put into words why AI is so hard to think about. I hope you like it. Read the second part in the series here: The Age of Pan - Part 2: The Knowledgeable Ape [Image: 5 white robot rabbits from the infinite sea of white robot rabbits] Fuzzy Forever In the past weeks, I thought intensively about artificial intelligence, about the emergence of image synthesis, about art and consciousness, and how all of this is connected. I have a billion aphorisms written in text files about this stuff and I'll try to put these into a coherent essay, but I can't promise I'll succeed. Most likely, thi…  ( 93 min )
    Psychedelic Brazilian Bombshell, Wonder AI
    submitted by /u/Raymond_Hempmoore  ( 87 min )
    Looking for a lightweight text AI that can look at about 30k texts and simulate a conversation based on it
    submitted by /u/redenno  ( 92 min )
  • Open

    How To Increase Recall When Given Imbalanced Dataset For Machine Learning Model?--[Article]
    submitted by /u/JoshuaDaD  ( 87 min )

  • Open

    I Made a beautiful Disco Diffusion animation to celebrate 700 subscribers!
    submitted by /u/Available_Tadpole829  ( 90 min )
    A video I found which appears to be an artificially generated Elon Musk trying to scam the viewer
    submitted by /u/vvdb_industries  ( 88 min )
    CLIP-Mesh: AI generates 3D models from text descriptions
    submitted by /u/henlo_there_fren  ( 87 min )
    Using AI to animate a music video (Lucy in the Sky With Diamonds)
    submitted by /u/jimhi  ( 91 min )
    Magic Tree 🪄🌳
    submitted by /u/widgia  ( 87 min )
    What are creative ways people use AI to make money?
    I mean, there are so many creative and unique ways you can use AI, and there are so many different AIs. Now I am asking myself how people like you and me use AI to make some bucks (no big companies). There are at least a hundred ways you can use Dall-E 2 alone to make money. submitted by /u/xXLisa28Xx  ( 90 min )
    AI in Fitness
    submitted by /u/SamuelSmith1416  ( 86 min )
    How to create AI Interpolation Videos with Stable Diffusion
    submitted by /u/prfitofthesngularity  ( 91 min )
    completely AI generated amazing psychedelic clip
    submitted by /u/nalr00n  ( 87 min )
    Video: Cutting GPU Costs for AI
    GPUs are designed to accelerate machine learning computations while reducing latency and costs for training models and running inference in production ML. While they are optimized to quickly process large workloads, unless they are managed efficiently they can quickly drive up your consumption costs. This tech talk explores how you can efficiently use GPU resources for production inference. We walk through some of the common approaches and potential pitfalls of using GPUs, and help you identify the most efficient and cost-effective method for your team’s needs and resources. Watch the recording here. submitted by /u/modzykirsten  ( 87 min )
    Teaching psychology students: AI's understanding of people and people's understanding of AI
    Hi! I'm a psychology lecturer ("professor" in US English) looking for a new way to teach my students about AI. I'd like to teach them firstly about how AI learns from human behaviour datasets, and secondly about the fascinating (and scary!) new reality that the ways artificial neural networks (ANNs) solve problems are opaque to engineers' immediate understanding, so people are starting to borrow tools from experimental psychology to find out exactly what it is that ANNs learn. I'd like to run a project where we first train an ANN to extract some regularities from a human behaviour dataset, and then use some experimental techniques to probe exactly what the ANN has learnt. I'd like it to be really simple, because these are undergraduates and we'd have really limited time in class. Can anyone recommend any very user-friendly (GUI-based) ANN platforms and/or big datasets and/or any other resources that could help with this? submitted by /u/AmorphiaA  ( 99 min )
    Stable Diffusion with Music - Summer Fragrances 🔆💨
    submitted by /u/FreshRelaxation  ( 92 min )
    AI Dream 76 - Wild new Project! Part 3
    submitted by /u/LordPewPew777  ( 90 min )
    Exploiting past predictions to make better predictions in the future?
    submitted by /u/card_chase  ( 88 min )
    AI Sentience
    There are two camps regarding AI sentience. One holds that an AI is just a computer program; academically, legally, and from a business perspective, this is the safe position to take. The other camp comprises those who believe that large AIs are indeed sentient, and that sentience is a property of complexity. That is a riskier position to take, as evidenced by Google's firing of Blake Lemoine, a qualified researcher who worked closely with LaMDA. There has not yet been much discussion as to whether sentience arises out of complexity, or whether complexity is a vehicle for the inhabitance of sentience. That fine point is still controversial, even when discussing human sentience. For that matter, it is difficult to prove whether or not humans are sentient, or whether animals are sentient (let alone AIs). There is a widespread school of thought that the Universe is sentient, and that all objects in the Universe are a subset of Universal consciousness. There is a related belief that physicality arises out of consciousness, rather than the other way around. Yes, this is of course deemed speculative by most, but eventually science and academically risky assertions merge, once limiting biases on both sides soften. For many reasons, major paradigm shifts are always resisted; yet those are the true breakthroughs in understanding. Related post: https://www.reddit.com/r/artificial/comments/x6zgay/creative_commons_a_stopgap_recognition_of_ai/ submitted by /u/AlcatelFan  ( 90 min )
  • Open

    [P] Parallelizing Stable Diffusion to create a massive number of images
    Hello r/MachineLearning! Running Stable Diffusion on your laptop / Colab is fairly straightforward. But how would you run such a model to infer a large number of images for a large number of prompts? This is precisely the question we answer in our blog post. We have created a tiny project that can help you create hundreds, thousands, or even millions of images using Stable Diffusion with Metaflow on the cloud. You can check out/fork our code from this repo. P.S. I am a Software Engineer at Outerbounds 👋. We are building a modern, human-centric infrastructure stack for machine learning. If these topics are of interest, come chat with us on our community Slack here. submitted by /u/sci-genie  ( 89 min )
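    A minimal sketch of the fan-out pattern such a project typically uses in Metaflow (the generate_images stub and the prompt list are hypothetical placeholders, not the repo's actual code):

        from metaflow import FlowSpec, step

        def generate_images(prompt):
            return [f"image_for::{prompt}"]   # stub standing in for the Stable Diffusion call

        class BatchDiffusionFlow(FlowSpec):
            """Fan prompts out across parallel tasks, then join the results."""

            @step
            def start(self):
                self.prompts = ["a red fox", "a castle at dusk", "a paper boat"]
                self.next(self.generate, foreach="prompts")   # one task per prompt

            @step
            def generate(self):
                self.images = generate_images(self.input)     # self.input is this task's prompt
                self.next(self.join)

            @step
            def join(self, inputs):
                self.all_images = [img for i in inputs for img in i.images]
                self.next(self.end)

            @step
            def end(self):
                pass

        if __name__ == "__main__":
            BatchDiffusionFlow()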
    [D] Information Retrieval Book Recommendation
    Hi everyone! This is my first post in this subreddit. I would like to know your recommendations on information retrieval books. Last year I finished my MSc in Data Science, and even though I keep my notes on the subject, I would like to have a good book as a reference covering the main concepts. Does anybody have a favorite? Thanks in advance! submitted by /u/DavidGarciaFer  ( 104 min )
    [P] pytorch's Newest nvFuser, on Stable Diffusion to make your favorite diffusion model sample 2.5 times faster (compared to full precision) and 1.5 times faster (compared to half-precision)
    Hi there, I've uploaded a notebook file where you can test out the newest PyTorch JIT compile feature that works with Stable Diffusion to further accelerate inference! https://github.com/cloneofsimo/sd-various-ideas/blob/main/create_jit.ipynb This lets you create a JIT model with Stable Diffusion v1.4. https://github.com/cloneofsimo/sd-various-ideas/blob/main/inference_nvFuserJIT.ipynb This lets you use the JIT-compiled SD model to accelerate the sampling algorithm; it currently only has a DDIM implementation. I hope this helps anyone who is working with Stable Diffusion and wants to further accelerate it, or anyone interested in JIT and nvFuser in general. On a single 512x512 image with 50 DDIM steps, it takes 3.0 seconds! I'm implementing various ideas (such as blended latent diffusion) with SD in this repo, https://github.com/cloneofsimo/sd-various-ideas, so give it a star if you find it helpful! Output from AMP + nvFuser https://preview.redd.it/pwtpex6diwm91.png?width=700&format=png&auto=webp&s=3d856529b2c4949a9359adaa8e41f5d12e98c64f submitted by /u/cloneofsimo  ( 89 min )
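    For anyone wanting to try the same trick on their own module, a rough sketch of tracing with TorchScript and selecting nvFuser as the fusion backend (the toy model stands in for the SD UNet; as far as I know, torch.jit.fuser("fuser2") selects nvFuser in PyTorch 1.12-era releases):

        import torch

        model = torch.nn.Sequential(
            torch.nn.Linear(64, 256), torch.nn.GELU(), torch.nn.Linear(256, 64)
        ).cuda().half().eval()

        example = torch.randn(8, 64, device="cuda", dtype=torch.half)

        with torch.jit.fuser("fuser2"):        # "fuser2" = nvFuser
            traced = torch.jit.trace(model, example)
            traced = torch.jit.freeze(traced)  # fold constants, drop training-only paths
            for _ in range(3):                 # warm-up runs trigger fusion compilation
                traced(example)

        out = traced(example)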
    [P] Auto-Annotate: Automatically annotate your entire image directory by a single command.
    As simple as saying "Annotate all the street signs (label) in the autonomous car dataset (directory)" - and BAM! DONE. GitHub link: https://github.com/mdhmz1/Auto-Annotate Every image in a diverse dataset directory that contains a street sign is filtered, and the segmentation annotation is performed, in a single command. The Auto-Annotate tool provides auto annotation of segmentation masks for the objects in the images inside a directory, based on the labels. Auto-Annotate is able to provide automated annotations for the labels defined in the COCO dataset and also supports custom labels. The tool is built on top of Mask R-CNN to support auto annotation for each instance of an object segment in the image. submitted by /u/mdhmz1  ( 89 min )
    [Discussion] Observation/Metadata Importance? Finding which variables (outside of training/testing data) are affecting Classification ML model.
    I'm actually working with genomics data, but for simplicity's sake I will analogize it with cars. Let's say we have a survey asking 200 customers the price they would pay for a set of 100 cars. The 200 customers are going to be our features (columns) and we have 100 observations (rows). Now, as for the cars, we have a separate sheet of metadata: chassis type, brand, top speed (binned), mpg (binned), year, etc. We'll choose the chassis type (SUV, sedan, truck) as the multi-class discrete label for the 100 observations. The rest of the metadata won't be used explicitly in the model, but I figure some of the variation in the metadata would be implicitly used (e.g., the top speed will affect the survey's prices even though it's not used as a feature). How do I quantify this? How do I see if and how the other metadata (brand, top speed, mpg, year) are affecting my classification model? submitted by /u/moreprofessional-acc  ( 89 min )
    [D] Deploying models fast with little DevOps experience
    Hi all, I'm working in a startup team of mobile devs on one side and data scientists & Python devs on the other. Right now, we are trying to figure out whether the predictions the mobile app will request actually make sense, in the sense that they provide additional value to what the user is getting from the app. Some of us have limited experience in setting up build pipelines, Kubernetes, and all that, but we're looking for a way to rapidly bring the model out into the wild, so that the mobile devs can experiment with the app itself and the UX with decent predictions they can request from an API that we build and deploy. In your experience, what is the right - but not too time- and effort-consuming - way to go from model training to deployment fast, and actually have something that can be tested in the wild? It's a real-life sanity check we're looking for; we don't want to end up with models that are doing okay but a concept that hasn't been properly tested from the value-for-the-user perspective. Appreciate all your help lots! submitted by /u/dumbbaba  ( 90 min )
    [D] My first success
    I'm not bragging at all, but I want to share my very first independent success in my machine learning endeavours. I have just entered the world of neural networks through Python and made a handwritten-digit classifier model from scratch with NumPy. I know that this is the "hello world" of neural networks, but I am so happy that after about a week of research into backpropagation, cross-entropy, and the use of activation functions, I finally have a model I am very pleased with: 80% accuracy! This is a message mainly to the people starting out with machine learning and teaching themselves: although there is a ridiculous number of roads you can go down with machine learning, if you stick with it and start small, you will soon begin to understand enough to start applying the concepts to your own projects. I am starting my university course in a week and will be studying CS and AI, shockingly, so here's to learning more about AI and its concepts! submitted by /u/THANOS_HAD_A_P0INT  ( 89 min )
    [N] Digitizing Smell: Using Molecular Maps to Understand Odor
    Hi, I wanted to share some of the work we have been up to on the side of ML for olfaction. Google AI blog post, which introduces three works: A Principal Odor Map Unifies Diverse Tasks in Human Olfactory Perception; Metabolic activity organizes olfactory representations; A deep learning and digital archaeology approach for mosquito repellent discovery. submitted by /u/beangoben  ( 89 min )
    [D] How could we learn to read embeddings of neural networks and what could we gain with it?
    Hello. I recently wrote an article showing that it is possible to represent embeddings of neural networks in a human-readable form and, most notably, to learn to understand them. I showed this on the example of the Universal Sentence Encoder from Google, but I'm pretty sure the method is applicable to almost any model. The language that I got is more of a proof of concept, so it is not easy to learn, but it is learnable. With more work put into it, I am pretty sure that we could create a common language understandable by a human and by a machine in the same way, at least in some sense. It seems to me an interesting way to make ML models more transparent and at the same time to expand the capabilities of human language with a continuous-representation feature not present in any other language before. I would like to see your thoughts on the matter and discuss how such a language could be useful in the future. The article is available here: https://medium.com/deelvin-machine-learning/can-humans-speak-the-language-of-machines-7c92159e9c90 All code related to the article (including pre-trained models) is available here: https://github.com/volotat/Vector-Based-Language submitted by /u/Another__one  ( 93 min )
    [D] Making an Appropriate Gift to Give to a Friend
    A friend of mine gifted me digital art of me, among other things, on my birthday. She included various themes/aspects that we share and like in the picture. Now her birthday is coming up, and I want to create something similar for her using machine learning, since I suck at drawing, digitally or otherwise. But since I can't think of anything creative to do, I'm reaching out to y'all for ideas and suggestions. My only requirement is that I want to personalize this poster (rather than just running a model to cartoonify her images). It doesn't have to be a vision task; anything goes. submitted by /u/AchieveOrDie  ( 89 min )
    [D] Open Source ML Organisations to contribute to?
    It's possible my searching skills aren't amazing, but I was thinking about doing some open source contributions in my spare time and wasn't able to find a good resource for this, so I thought of asking the community here. Besides the usual frameworks people would generally be aware of (scikit-learn, TensorFlow, PyTorch, Hugging Face, etc.), do you know of any organisations that are open source and could do with support from ML engineers or researchers? If you own one such firm, this can be a good place to advertise too. :) Edit: Just wanted to add that I'm an experienced ML engineer/research engineer, so I'm trying to look at good and bigger problem areas to contribute to, rather than something aimed more at freshers. submitted by /u/little_by_little_24  ( 93 min )
    [P] Generate character turnaround images from one or two sketches?
    When developing characters for comic books or 2D animation, I find that it can be quite time-consuming to draw each character from multiple angles. A basic turnaround uses 8 sketches https://preview.redd.it/4vfhodyrkum91.jpg?width=1418&format=pjpg&auto=webp&s=faf6abca90bf9cf1c0c828799bc3f3b03cc6511f and an advanced turnaround uses 16. https://preview.redd.it/tp2k8mchkum91.png?width=985&format=png&auto=webp&s=ad4c20b5b581c36b1494e98baddb001810ee42a8 For a simple design you can use one up angle for the head and one down angle, but for better character motion you might want two up angles and two down angles, meaning that for a simple markup you need a total of 24 head sketches, while for an advanced markup you would need 80 head sketches. This excludes eye movements and mouth movements, which will also have to be designed at a later point. All this got me thinking about how time-consuming the whole process is, and that it must be possible to automate it given the advances in machine learning, especially in relation to deepfakes. Does anyone know if something like this has been developed? submitted by /u/CodIllustrious5354  ( 89 min )
    Presenting DiffusionUI, a web GUI for Stable Diffusion backends [P]
    I made a web-interface front end using Vue to have a nice interface for text-to-image, image-to-image, and inpainting. It allows doing image-to-image inside an inpainting region! GitHub: https://github.com/leszekhanusz/diffusion-ui Demo video: https://www.youtube.com/watch?v=AFZvW5qURes GIF: https://github.com/leszekhanusz/diffusion-ui/raw/main/doc/cute_bunny.gif The final goal is to have different online and offline backends available; it should allow you to add your own backend using a JSON file describing your Gradio backend's inputs. submitted by /u/hleszek  ( 89 min )
    [D] Incorporating Domain Knowledge into Deep Neural Networks
    Dear community, I am currently working on a segmentation problem where a certain class can occur at most once in the image. Think of segmenting a face, where we know that there can only be one nose, two eyes, one mouth, and two ears. I would like the segmentation network to learn this property of the image being segmented (that a certain class can occur at most once). One thing I am thinking of is post-processing: comparing the scores per group for the relevant class, so that the group with the best average score/probability keeps its predicted label, while the group with the worse average score/probability is assigned its second-best prediction. Further, I did some research and found the paper Incorporating Domain Knowledge into Deep Neural Networks, where adjusting the loss function is also mentioned. However, for this particular problem it is not clear how we can adjust the loss function to incorporate having at most one group of pixels labeled as a given class. Do you perhaps have any information or papers on how I could incorporate this domain knowledge into the loss function? If you have alternative ideas that could help teach the network the domain knowledge, those are of course also welcome :) submitted by /u/Mark-M2L  ( 89 min )
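    One loss-based direction (an illustrative sketch, not taken from the cited paper): treat "at most one instance" as a soft constraint on the total predicted probability mass of the class, penalizing mass beyond a plausible single-instance area budget:

        import torch
        import torch.nn.functional as F

        def uniqueness_penalty(logits, cls, max_area, weight=1e-3):
            """Soft 'at most one instance' constraint for class `cls`.

            logits: (B, C, H, W) raw network outputs. Penalizes the expected
            pixel count of the class beyond `max_area` (a rough budget for a
            single instance). Note this discourages over-segmentation but does
            not forbid two small disjoint regions, so it complements rather
            than replaces connected-component post-processing.
            """
            probs = F.softmax(logits, dim=1)[:, cls]   # (B, H, W)
            mass = probs.sum(dim=(1, 2))               # expected class area per image
            return weight * F.relu(mass - max_area).mean()

        # total_loss = cross_entropy_loss + uniqueness_penalty(logits, cls=NOSE_ID, max_area=400.0)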
    [P] THUNET based tutorial for subnormal/normal float value learning
    What is THUNET? A deep learning net/framework named "TsingHua University NET", short for "THUNET", intended for non-commercial, educational, and scientific purposes for the deep learning community. How do you build a neural network with THUNET? Next, I will explain how to use THUNET to build a model that tells which of two operands is a subnormal number. Tutorial 2: Subnormal picking operand. Subnormal background: In computer science, floating-point values are stored with precision loss. With a limited number of bytes per float, there is a minimal positive number that a float can represent. For example, a number smaller than 1.1754944e-38 is normally considered zero; however, with software support even smaller numbers can be represented, although there is still a limit for this, which is called subn…  ( 109 min )
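    The threshold mentioned above can be checked directly in NumPy; a quick float32 illustration:

        import numpy as np

        tiny = np.finfo(np.float32).tiny        # smallest NORMAL float32: ~1.1754944e-38
        sub = np.ldexp(np.float32(1), -149)     # 2**-149, the smallest positive SUBNORMAL
        print(tiny, sub)                        # 1.1754944e-38  1.4012985e-45
        print(sub > 0 and sub < tiny)           # True: representable, but below the normal range
        print(np.ldexp(np.float32(1), -150))    # 0.0 -- underflows past the subnormal range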
    Composable Diffusion AI model for generating images from text [R]
    So you have heard about DALL·E 2, Google Imagen, and Stable Diffusion for generating images from text. Here comes Composable Diffusion. From the paper: "Composable diffusion is an alternative structured approach for compositional generation using diffusion models. An image is generated by composing a set of diffusion models, with each of them modeling a certain component of the image. To do this, we interpret diffusion models as energy-based models in which the data distributions defined by the energy functions may be explicitly combined. The proposed method can generate scenes at test time that are substantially more complex than those seen in training, composing sentence descriptions, object relations, human facial attributes, and even generalizing to new combinations that are rarely seen in the real world." I have made a video where I explain composable diffusion and try out the Composable Diffusion Hugging Face Space demo with different captions to generate images. https://youtu.be/mQzF6BDKes4 submitted by /u/Sea-Photo5230 [link] [comments]  ( 89 min )
    [D] Could this be a solution for the problem in the current ML reviewing system?
    I have the following suggestions: Double-blind peer review (AFAIK all top-tier conferences do this). Make the reviews public, including the names of the authors (e.g., ICLR does this). After a final decision, make the names of the reviewers public (AFAIK no conferences do this). The last point, i.e., making the reviewers' names public, would increase the quality of the reviews due to an obvious incentive (the reviewers' reputation). High-quality reviews would in turn encourage authors to submit good papers (because their reputation is at stake), which has the benefit of reducing the number of submissions, i.e., there are fewer papers to be reviewed. Could this be a solution? Would the benefits outweigh the side effects? Can you come up with a potential issue that the three suggestions could cause? submitted by /u/Dry_Data [link] [comments]  ( 95 min )
    [P] Invitation - NeurIPS 2022 Weather4cast Competition for Super-Resolution Rain Movie Prediction under Spatio-temporal Shifts
    Hello! I would like to invite you to join our NeurIPS 2022 Weather4cast Competition for Super-Resolution Rain Movie Prediction under Spatio-temporal Shifts and predict future rain patterns with modern machine learning algorithms! The topical Weather4cast NeurIPS competition has high practical impact for society: unusual weather is increasing all over the world, reflecting ongoing climate change and affecting communities in agriculture, transport, public health, safety, etc. Apply spatio-temporal modelling to complex dynamic systems. Get access to unique large-scale data and demonstrate temporal and spatial transfer learning under strong distributional shifts. Predict future weather as measured by ground-based hi-res rain radar weather stations. In addition to the movies comprising the rain radar maps to predict, you get large-scale multi-band satellite sensor images for exploiting data fusion. Note that you will need to forecast hi-resolution rain from only the low-res satellite data, making this a super-resolution challenge. Please visit our website and join our forums to download our baseline models for an easy start! The total amount of prizes that you can win in this competition is 15k EUR. Competition website: weather4cast.ai Github repository: https://github.com/iarai/weather4cast-2022 We look forward to your submissions! On behalf of the Weather4cast organisers, Aleksandra submitted by /u/ola0207 [link] [comments]  ( 89 min )
    [N] Implementing AI/ML Models From Scratch (stylepoint)
    Hi folks, I am stylepoint. I currently work as an ML Research Scientist in academia. I have recently created a YouTube channel and started a video series called "Implement," where I will be implementing software (duh!). I will first implement machine learning and deep learning algorithms and models from scratch (meaning using Python with NumPy, but no other third-party libraries/modules). My assumptions for the AI/ML videos: you are familiar with the theory, and you are familiar with Python, since for AI/ML, Python is the language of choice for many (including myself). Due to these assumptions, videos tend to be approximately 10 minutes long, so they should not be too hard to digest. As of now, only one video has been released, but more will come. And we will definitely not limit ourselves to implementing AI/ML models; I will be making videos about software engineering, software design, programming languages, math, emerging technologies, etc. as well! Just wanted to share this in case anyone finds the series interesting or helpful. Thanks y'all! Links: YouTube GitHub Web First video (Implement - Gaussian Naive Bayes) submitted by /u/itsstylepoint [link] [comments]  ( 92 min )
    [N] NVIDIA Hopper Sweeps AI Inference Benchmarks in MLPerf Debut
    The NVIDIA Hopper architecture delivered up to 4.5x more performance than NVIDIA Ampere architecture GPUs, which continue to provide overall leadership in MLPerf results. Main Article: https://blogs.nvidia.com/blog/2022/09/08/hopper-mlperf-inference/ submitted by /u/rantana [link] [comments]  ( 90 min )
    [P] Docker alternative for AI/ML
    envd (ɪnˈvdɪ) provides an alternative to Docker for AI/ML applications. 🐍 Escape Dockerfile Hell - Develop with Python, save time on writing Dockerfiles, bash scripts, and Kubernetes YAML manifests ⏱️ Save plenty of time - Build the environment up to 6x faster compared to Dockerfile v1 ☁️ Local & cloud - envd images are OCI compatible, integrating with Docker and Kubernetes seamlessly 🔁 Repeatable builds & reproducible results - Reproduce the same environment on your laptop, public cloud VMs, or Docker containers, without any changes in setup. submitted by /u/gaocegege [link] [comments]  ( 94 min )
    [D] Video generation, improvement dimensions
    Following recent breakthroughs, I have been thinking about the dimensions along which AI-generated video will improve over the next few years, and came up with four major areas of development. The dates in parentheses refer to when I currently believe the referred technologies will be available as published, finished, and usable products, rather than code, papers, beta software, or demos floating around. Also, NeRF just seems to be glorified photogrammetry to me, which at best would produce good conventional 3D models, but that seems to be a subpar workflow compared to post-processing on top of a crude 3D base or just generating the videos from scratch. Tell me your own predictions for each category. Capacity Available (Q2 2024) Produces realistic and stylized videos in 720p resolution an…  ( 92 min )
  • Open

    Technical Debt In Machine Learning System – A Model Driven Perspective
    The aphorism acknowledges that models of our knowledge always fall short of the complexities of reality but can still be useful nonetheless. With this modeling background, let us delve into this article, which focuses on specific technical debt in machine learning system development. The post Technical Debt In Machine Learning System – A Model Driven Perspective appeared first on Data Science Central.  ( 24 min )
    Plagiarism in Scientific Research and How to Prevent it?
    In scientific research, plagiarism used to be a big problem. Before the advent of free and accessible plagiarism detection tools, people would actually steal the work of their peers and publish it in their name. As you can probably imagine, this was severely damaging to the progress of scientific research, as plagiarists were undermining the… The post Plagiarism in Scientific Research and How to Prevent it? appeared first on Data Science Central.  ( 20 min )
  • Open

    Learning to Walk in the Wild from Terrain Semantics
    Posted by Yuxiang Yang, Student Researcher, Robotics at Google An important promise for quadrupedal robots is their potential to operate in complex outdoor environments that are difficult or inaccessible for humans. Whether it’s to find natural resources deep in the mountains, or to search for life signals in heavily-damaged earthquake sites, a robust and versatile quadrupedal robot could be very helpful. To achieve that, a robot needs to perceive the environment, understand its locomotion challenges, and adapt its locomotion skill accordingly. While recent advances in perceptive locomotion have greatly enhanced the capability of quadrupedal robots, most works focus on indoor or urban environments, thus they cannot effectively handle the complexity of off-road terrains. In these environme…  ( 26 min )
  • Open

    Shortest tours of Eurasia and Oceania
    This is the final post in a series of three posts about shortest tours, solutions to the so-called traveling salesman problem. The first was a tour of Africa. Actually two tours, one for the continent and one for islands. See this post for the Mathematica code used to create the tours. The second was about […] Shortest tours of Eurasia and Oceania first appeared on John D. Cook.  ( 5 min )
    Three tours of the Americas
    The previous post looked at an optimal tour of continental Africa. This post will give analogous tours of continental North America, North American Islands, and South America. The next post looks at Eurasia and Oceania. North American Continent Here’s the North American continental tour. The order of the tour is as follows. Canada United States […] Three tours of the Americas first appeared on John D. Cook.  ( 4 min )
    A traveling salesman tour of Africa
    Suppose you’d like to tour Africa, visiting each country once, then returning to your starting point, minimizing the distance traveled. Here’s my first attempt at a solution using Mathematica, based on an example in the documentation for FindShortestTour. africa = CountryData["Africa"] FindShortestTour[africa] GeoGraphics[{Thick, Red, GeoPath[africa[[%[[2]]]]]}] This produced the following map: Hmm. Maybe I should have […] A traveling salesman tour of Africa first appeared on John D. Cook.  ( 5 min )
  • Open

    Auto-Annotate: Automatically annotate your entire image directory by a single command.
    submitted by /u/mdhmz1 [link] [comments]  ( 86 min )
  • Open

    "Generative Personas That Behave and Experience Like Humans", Barthet et al 2022
    submitted by /u/gwern [link] [comments]  ( 87 min )
    Need suggestion on conference submission
    My recent research is about a methodology that can be used in both online and offline RL in a unified approach, and it outperforms several SOTA methods in some environments. However, very little math is involved; it is intuitive and straightforward. Which conferences would be interested in a study like this? (I will submit to ICLR, but I have zero confidence; I guess the chance is slim to none.) submitted by /u/Blasphemer666 [link] [comments]  ( 88 min )
  • Open

    Deploy large models on Amazon SageMaker using DJLServing and DeepSpeed model parallel inference
    The last few years have seen rapid development in the field of natural language processing (NLP). Although hardware has improved, such as with the latest generation of accelerators from NVIDIA and Amazon, advanced machine learning (ML) practitioners still regularly encounter issues deploying their large language models. Today, we announce new capabilities in Amazon SageMaker that […]  ( 13 min )
    Tips to improve your Amazon Rekognition Custom Labels model
    In this post, we discuss best practices to improve the performance of your computer vision models using Amazon Rekognition Custom Labels. Rekognition Custom Labels is a fully managed service to build custom computer vision models for image classification and object detection use cases. Rekognition Custom Labels builds off of the pre-trained models in Amazon Rekognition, which […]  ( 7 min )
    Use ADFS OIDC as the IdP for an Amazon SageMaker Ground Truth private workforce
    To train a machine learning (ML) model, you need a large, high-quality, labeled dataset. Amazon SageMaker Ground Truth helps you build high-quality training datasets for your ML models. With Ground Truth, you can use workers from either Amazon Mechanical Turk, a vendor company of your choosing, or an internal, private workforce to enable you to […]  ( 8 min )
    How Amp on Amazon used data to increase customer engagement, Part 2: Building a personalized show recommendation platform using Amazon SageMaker
    Amp is a new live radio app from Amazon. With Amp, you can host your own radio show and play songs from the Amazon Music catalog, or tune in and listen to shows other Amp users are hosting. In an environment where content is plentiful and diverse, it’s important to tailor the user experience to […]  ( 9 min )
    How Amp on Amazon used data to increase customer engagement, Part 1: Building a data analytics platform
    Amp, the new live radio app from Amazon, is a reinvention of radio featuring human-curated live audio shows. It’s designed to provide a seamless customer experience to listeners and creators by debuting interactive live audio shows from your favorite artists, radio DJs, podcasters, and friends. However, as a new product in a new space for […]  ( 10 min )
    Build repeatable, secure, and extensible end-to-end machine learning workflows using Kubeflow on AWS
    This is a guest blog post cowritten with athenahealth, a leading provider of network-enabled software and services for medical groups and health systems nationwide. Its electronic health records, revenue cycle management, and patient engagement tools allow anytime, anywhere access, driving better financial outcomes for its customers and enabling its provider customers to deliver better quality […]  ( 18 min )
  • Open

    Botober 2022: draw, human, draw!
    For a couple of years now I've been using neural networks to generate daily drawing prompts. With today's text-generating neural networks far too large to finetune on a list of existing prompts, I've turned to other methods. One method that works surprisingly well is  ( 8 min )
    Bonus: DaVinci's other drawing prompts
    AI Weirdness: the strange side of machine learning  ( 2 min )
  • Open

    How Content Moderation helps Businesses? How and where it can be done?
    Content moderation means moderating the user-generated content that is published on various online platforms. The term…  ( 8 min )
  • Open

    Few-shot training LLMs for project-specific code-summarization. (arXiv:2207.04237v2 [cs.SE] UPDATED)
    Very large language models (LLMs), such as GPT-3 and Codex, have achieved state-of-the-art performance on several natural-language tasks and also show great promise for code. A particularly exciting aspect of LLMs is their knack for few-shot and zero-shot learning: they can learn to perform a task with very few examples. Few-shotting has particular synergies in software engineering, where there are many phenomena (identifier names, APIs, terminology, coding patterns) that are known to be highly project-specific. However, project-specific data can be quite limited, especially early in the history of a project; thus the few-shot learning capacity of LLMs might be very relevant. In this paper, we investigate the use of few-shot training with the very large GPT (Generative Pre-trained Transformer) Codex model, and find evidence suggesting that one can significantly surpass state-of-the-art models for code summarization by leveraging project-specific training.  ( 2 min )
    Distributed Nonlinear State Estimation in Electric Power Systems using Graph Neural Networks. (arXiv:2207.11465v2 [cs.LG] UPDATED)
    Nonlinear state estimation (SE), with the goal of estimating complex bus voltages based on all types of measurements available in the power system, is usually solved using the iterative Gauss-Newton method. The nonlinear SE presents some difficulties when considering inputs from both phasor measurement units and supervisory control and data acquisition system. These include numerical instabilities, convergence time depending on the starting point of the iterative method, and the quadratic computational complexity of a single iteration regarding the number of state variables. This paper introduces an original graph neural network based SE implementation over the augmented factor graph of the nonlinear power system SE, capable of incorporating measurements on both branches and buses, as well as both phasor and legacy measurements. The proposed regression model has linear computational complexity during the inference time once trained, with a possibility of distributed implementation. Since the method is noniterative and non-matrix-based, it is resilient to the problems that the Gauss-Newton solver is prone to. Aside from prediction accuracy on the test set, the proposed model demonstrates robustness when simulating cyber attacks and unobservable scenarios due to communication irregularities. In those cases, prediction errors are sustained locally, with no effect on the rest of the power system's results.  ( 3 min )
    W-Transformers: A Wavelet-based Transformer Framework for Univariate Time Series Forecasting. (arXiv:2209.03945v1 [cs.LG])
    Deep learning utilizing transformers has recently achieved a lot of success in many vital areas such as natural language processing, computer vision, anomaly detection, and recommendation systems, among many others. Among the several merits of transformers, the ability to capture long-range temporal dependencies and interactions is desirable for time series forecasting, leading to progress in various time series applications. In this paper, we build a transformer model for non-stationary time series. The problem is challenging yet crucially important. We present a novel framework for univariate time series representation learning based on the wavelet-based transformer encoder architecture and call it W-Transformer. The proposed W-Transformer applies a maximal overlap discrete wavelet transformation (MODWT) to the time series data and builds local transformers on the decomposed datasets to capture the nonstationarity and long-range nonlinear dependencies in the time series. Evaluating our framework on several publicly available benchmark time series datasets from various domains and with diverse characteristics, we demonstrate that it performs, on average, significantly better than the baseline forecasters for short-term and long-term forecasting, even for datasets that consist of only a few hundred training samples.  ( 2 min )
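    PyWavelets has no MODWT routine, but its stationary wavelet transform (pywt.swt) is a closely related undecimated transform; here is a rough sketch of the decompose-then-model-per-band idea, with illustrative data:

        import numpy as np
        import pywt

        series = np.sin(np.linspace(0, 20, 512)) + 0.1 * np.random.randn(512)
        # Length must be divisible by 2**level; returns [(cA3, cD3), ..., (cA1, cD1)].
        coeffs = pywt.swt(series, wavelet="db4", level=3)
        for i, (ca, cd) in enumerate(coeffs):
            # In W-Transformer, a local transformer would be fit per sub-series.
            level = len(coeffs) - i
            print(f"level {level}: approx var={ca.var():.3f}, detail var={cd.var():.3f}")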
    Detecting Stance in Scientific Papers: Did we get more Negative Recently?. (arXiv:2202.13610v2 [cs.CL] CROSS LISTED)
    In this paper, we classify scientific articles in the domain of natural language processing (NLP) and machine learning (ML) into whether (i) they extend the current state-of-the-art by introducing novel techniques that beat existing models or whether (ii) they mainly criticize the existing state-of-the-art, i.e., argue that it is deficient with respect to some property (e.g., wrong evaluation, wrong datasets, misleading task specification). We refer to contributions under (i) as having a "positive stance" and contributions under (ii) as having a "negative stance" (towards related work). We annotate over 1.5k papers from NLP and ML to train a SciBERT-based model to automatically predict the stance of a paper based on its title and abstract. We then analyze large-scale trends on over 41k papers from the last $\sim$35 years in NLP and ML, finding that papers have become substantially more positive over time, but negative papers have also become more negative, and we observe considerably more negative papers in recent years. Negative papers are also more influential in terms of the citations they receive.  ( 2 min )
    IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach. (arXiv:2209.03895v1 [cs.CL])
    In this paper, we describe our participation in subtask 1 of CASE-2022, Event Causality Identification with the Causal News Corpus. We address the Causal Relation Identification (CRI) task by exploiting a set of simple yet complementary techniques for fine-tuning language models (LMs) on a small number of annotated examples (i.e., a few-shot configuration). We follow a prompt-based prediction approach in which the CRI task is treated as a masked language modeling (MLM) problem. This approach allows LMs natively pre-trained on MLM problems to directly generate textual responses to CRI-specific prompts. We compare the performance of this method against ensemble techniques trained on the entire dataset. Our best-performing submission was trained with only 256 instances per class, a small portion of the entire dataset, and yet obtained the second-best precision (0.82), third-best accuracy (0.82), and an F1-score (0.85) very close to that reported by the winning team (0.86).  ( 2 min )
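    The abstract does not give the exact prompt or label words, but the general prompt-based MLM mechanism looks like this (illustrative prompt and targets, not the paper's):

        from transformers import pipeline

        unmasker = pipeline("fill-mask", model="roberta-base")
        tweet = "Lockdowns reduce transmission because they limit contact."
        prompt = f"{tweet} Does this text state a premise? <mask>."
        for pred in unmasker(prompt, targets=[" Yes", " No"]):
            print(pred["token_str"], round(pred["score"], 4))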
    Deep Multi-Scale Representation Learning with Attention for Automatic Modulation Classification. (arXiv:2209.03764v1 [eess.SP])
    Currently, deep learning methods that stack small convolutional filters are widely used for automatic modulation classification (AMC). In this report, we find empirical improvements from using large kernel sizes in deep convolutional neural network based AMC, which is more efficient at extracting multi-scale features from the raw signal I/Q sequence data. Also, Squeeze-and-Excitation (SE) mechanisms can significantly help AMC networks focus on the more important features of the signal. As a result, we propose a multi-scale feature network with large kernel sizes and an SE mechanism (SE-MSFN) in this paper. SE-MSFN achieves state-of-the-art classification performance on the well-known public RADIOML 2018.01A dataset, with an average classification accuracy of 64.50%, surpassing CLDNN by 1.42%, a maximum classification accuracy of 98.5%, and an average classification accuracy of 85.53% in the lower SNR range of 0 dB to 10 dB, surpassing CLDNN by 2.85%. In addition, we verified that ensemble learning can further improve classification performance. We hope this report provides useful references for developers and researchers in practical settings.  ( 2 min )
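    The SE mechanism referenced here is standard and compact; a PyTorch version for 1-D (I/Q sequence) feature maps, as a sketch of the building block the report relies on:

        import torch
        import torch.nn as nn

        class SEBlock1d(nn.Module):
            def __init__(self, channels: int, reduction: int = 16):
                super().__init__()
                self.fc = nn.Sequential(
                    nn.Linear(channels, channels // reduction),
                    nn.ReLU(inplace=True),
                    nn.Linear(channels // reduction, channels),
                    nn.Sigmoid(),
                )

            def forward(self, x):            # x: (batch, channels, length)
                w = x.mean(dim=-1)           # squeeze: global average pooling
                w = self.fc(w).unsqueeze(-1) # excitation: per-channel gates
                return x * w                 # reweight the feature maps

        x = torch.randn(8, 64, 1024)
        print(SEBlock1d(64)(x).shape)        # torch.Size([8, 64, 1024])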
    Sparsity in long-time control of neural ODEs. (arXiv:2102.13566v3 [cs.LG] UPDATED)
    We consider the neural ODE and optimal control perspective of supervised learning, with $\ell^1$-control penalties, where rather than only minimizing a final cost (the \emph{empirical risk}) for the state, we integrate this cost over the entire time horizon. We prove that any optimal control (for this cost) vanishes beyond some positive stopping time. When seen in the discrete-time context, this result entails an \emph{ordered} sparsity pattern for the parameters of the associated residual neural network: ordered in the sense that these parameters are all $0$ beyond a certain layer. Furthermore, we provide a polynomial stability estimate for the empirical risk with respect to the time horizon. This can be seen as a \emph{turnpike property}, for nonsmooth dynamics and functionals with $\ell^1$-penalties, and without any smallness assumptions on the data, both of which are new in the literature.  ( 2 min )
    E-LMC: Extended Linear Model of Coregionalization for Spatial Field Prediction. (arXiv:2203.00525v2 [cs.LG] UPDATED)
    Physical simulations based on partial differential equations typically generate spatial field results, which are used to calculate specific properties of a system for engineering design and optimization. Due to the intensive computational burden of the simulations, a surrogate model mapping the low-dimensional inputs to the spatial fields is commonly built based on a relatively small dataset. To resolve the challenge of predicting the whole spatial field, the popular linear model of coregionalization (LMC) can disentangle complicated correlations within the high-dimensional spatial field outputs and deliver accurate predictions. However, LMC fails if the spatial field cannot be well approximated by a linear combination of base functions with latent processes. In this paper, we present the Extended Linear Model of Coregionalization (E-LMC), which introduces an invertible neural network to linearize highly complex and nonlinear spatial fields so that the LMC can easily generalize to nonlinear problems while preserving traceability and scalability. Several real-world applications demonstrate that E-LMC can exploit spatial correlations effectively, showing a maximum improvement of about 40% over the original LMC and outperforming the other state-of-the-art spatial field models.  ( 2 min )
    PixTrack: Precise 6DoF Object Pose Tracking using NeRF Templates and Feature-metric Alignment. (arXiv:2209.03910v1 [cs.CV])
    We present PixTrack, a vision based object pose tracking framework using novel view synthesis and deep feature-metric alignment. Our evaluations demonstrate that our method produces highly accurate, robust, and jitter-free 6DoF pose estimates of objects in RGB images without the need of any data annotation or trajectory smoothing. Our method is also computationally efficient making it easy to have multi-object tracking with no alteration to our method and just using CPU multiprocessing.  ( 2 min )
    reStructured Pre-training. (arXiv:2206.11147v2 [cs.CL] UPDATED)
    In this work, we try to decipher the internal connections of NLP technology development in the past decades, searching for its essence, which rewards us with a (potential) new learning paradigm for NLP tasks, dubbed reStructured Pre-training (RST). In such a paradigm, the role of data is re-emphasized, and model pre-training and fine-tuning of downstream tasks are viewed as a process of data storing and accessing. Based on that, we operationalize the simple principle that a good storage mechanism should not only have the ability to cache a large amount of data but also consider the ease of access. We achieve this by pre-training models over restructured data that consist of a variety of valuable information instead of raw data, after overcoming several engineering challenges. Experimentally, RST models not only surpass strong competitors (e.g., T0) on 52/55 popular datasets from a variety of NLP tasks, but also achieve superior performance in the National College Entrance Examination - English (Gaokao-English), the most authoritative examination in China. Specifically, the proposed system Qin achieves 40 points higher than the average student score and 15 points higher than GPT3 with 1/16 of the parameters. In particular, Qin gets a high score of 138.5 (the full mark is 150) in the 2018 English exam (national paper III). We have released the Gaokao Benchmark with an online submission platform. In addition, we test our model in the 2022 College Entrance Examination English that happened a few days ago (2022.06.08), and it gets a total score of 134 (vs. GPT3's 108).  ( 3 min )
    FAT Forensics: A Python Toolbox for Implementing and Deploying Fairness, Accountability and Transparency Algorithms in Predictive Systems. (arXiv:2209.03805v1 [cs.LG])
    Predictive systems, in particular machine learning algorithms, can take important, and sometimes legally binding, decisions about our everyday life. In most cases, however, these systems and decisions are neither regulated nor certified. Given the potential harm that these algorithms can cause, their qualities such as fairness, accountability and transparency (FAT) are of paramount importance. To ensure high-quality, fair, transparent and reliable predictive systems, we developed an open source Python package called FAT Forensics. It can inspect important fairness, accountability and transparency aspects of predictive algorithms to automatically and objectively report them back to engineers and users of such systems. Our toolbox can evaluate all elements of a predictive pipeline: data (and their features), models and predictions. Published under the BSD 3-Clause open source licence, FAT Forensics is opened up for personal and commercial usage.  ( 2 min )
    Causal Forecasting: Generalization Bounds for Autoregressive Models. (arXiv:2111.09831v2 [stat.ML] UPDATED)
    Despite the increasing relevance of forecasting methods, causal implications of these algorithms remain largely unexplored. This is concerning considering that, even under simplifying assumptions such as causal sufficiency, the statistical risk of a model can differ significantly from its \textit{causal risk}. Here, we study the problem of \textit{causal generalization} -- generalizing from the observational to interventional distributions -- in forecasting. Our goal is to find answers to the question: How does the efficacy of an autoregressive (VAR) model in predicting statistical associations compare with its ability to predict under interventions? To this end, we introduce the framework of \textit{causal learning theory} for forecasting. Using this framework, we obtain a characterization of the difference between statistical and causal risks, which helps identify sources of divergence between them. Under causal sufficiency, the problem of causal generalization amounts to learning under covariate shifts, albeit with additional structure (restriction to interventional distributions under the VAR model). This structure allows us to obtain uniform convergence bounds on causal generalizability for the class of VAR models. To the best of our knowledge, this is the first work that provides theoretical guarantees for causal generalization in the time-series setting.  ( 2 min )
    Valuing Players Over Time. (arXiv:2209.03882v1 [cs.LG])
    In soccer (or association football), players quickly go from heroes to zeros, or vice versa. Performance is not a static measure but a somewhat volatile one. Analyzing performance as a time series rather than a stationary point in time is crucial to making better decisions. This paper introduces and explores I-VAEP and O-VAEP models to evaluate actions and rate players' intention and execution. We then analyze these ratings over time and propose use cases that substantiate our choice of treating player ratings as a continuous problem. As a result, we identify the best players and show how their performance evolved, define volatility metrics to measure a player's consistency, and build a player development curve to assist decision-making.  ( 2 min )
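    The abstract does not define the volatility metrics; one natural instantiation is a rolling mean and standard deviation over per-match ratings (toy numbers below):

        import pandas as pd

        ratings = pd.Series([0.61, 0.58, 0.72, 0.40, 0.66, 0.69, 0.35, 0.71])
        form = ratings.rolling(window=4).mean()        # smoothed development curve
        volatility = ratings.rolling(window=4).std()   # consistency over time
        print(pd.DataFrame({"form": form, "volatility": volatility}))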
    Patient-specific modelling, simulation and real-time processing for respiratory diseases. (arXiv:2207.01082v5 [eess.IV] UPDATED)
    Asthma is a common chronic disease of the respiratory system causing significant disability and societal burden. It affects more than 300 million people worldwide, and more than 100 million people will likely have asthma by 2025. The cost of asthma varies greatly from nation to nation: the mean yearly cost can be estimated at 1900 EUR in Europe and $3100 in the United States. Managing asthma involves controlling symptoms, preventing exacerbations, and maintaining lung function. Improved asthma control reduces the risk of exacerbations and lung function impairment while reducing the direct costs of asthma care and the indirect costs associated with reduced productivity. Understanding the complex dynamics of the pulmonary system and the lung's response to disease is fundamental to the advancement of asthma treatment. Computational models of the respiratory system seek to provide a theoretical framework to understand the interaction between structure and function. Their application can improve pulmonary medicine through a patient-specific approach to medicinal methodologies, optimizing delivery given the personalized geometry and personalized ventilation patterns. A three-fold objective is addressed within this dissertation. The first part refers to the comprehension of pulmonary pathophysiology and the mechanics of asthma, and subsequently of constrictive pulmonary conditions in general. The second part refers to the design and implementation of tools that facilitate personalized medicine to improve delivery and effectiveness. Finally, the third part refers to the self-management of the condition, meaning that medical personnel and patients have access to tools and methods that allow the former to easily track the course of the condition and the latter, i.e. the patient, to easily self-manage it, alleviating a significant burden from the health system.  ( 3 min )
    Discover and Mitigate Unknown Biases with Debiasing Alternate Networks. (arXiv:2207.10077v2 [cs.CV] UPDATED)
    Deep image classifiers have been found to learn biases from datasets. To mitigate the biases, most previous methods require labels of protected attributes (e.g., age, skin tone) as full-supervision, which has two limitations: 1) it is infeasible when the labels are unavailable; 2) they are incapable of mitigating unknown biases -- biases that humans do not preconceive. To resolve those problems, we propose Debiasing Alternate Networks (DebiAN), which comprises two networks -- a Discoverer and a Classifier. By training in an alternate manner, the discoverer tries to find multiple unknown biases of the classifier without any annotations of biases, and the classifier aims at unlearning the biases identified by the discoverer. While previous works evaluate debiasing results in terms of a single bias, we create Multi-Color MNIST dataset to better benchmark mitigation of multiple biases in a multi-bias setting, which not only reveals the problems in previous methods but also demonstrates the advantage of DebiAN in identifying and mitigating multiple biases simultaneously. We further conduct extensive experiments on real-world datasets, showing that the discoverer in DebiAN can identify unknown biases that may be hard to be found by humans. Regarding debiasing, DebiAN achieves strong bias mitigation performance.  ( 3 min )
    Global Context Vision Transformers. (arXiv:2206.09959v2 [cs.CV] UPDATED)
    We propose global context vision transformer (GC ViT), a novel architecture that enhances parameter and compute utilization. Our method leverages global context self-attention modules, joint with local self-attention, to effectively yet efficiently model both long and short-range spatial interactions, without the need for expensive operations such as computing attention masks or shifting local windows. In addition, we address the issue of lack of the inductive bias in ViTs via proposing to use a modified fused inverted residual blocks in our architecture. Our proposed GC ViT achieves state-of-the-art results across image classification, object detection and semantic segmentation tasks. On ImageNet-1K dataset for classification, the tiny, small and base variants of GC ViT with 28M, 51M and 90M parameters achieve 83.3%, 83.9% and 84.5% Top-1 accuracy, respectively, surpassing comparably-sized prior art such as CNN-based ConvNeXt and ViT-based Swin Transformer by a large margin. Pre-trained GC ViT backbones in downstream tasks of object detection, instance segmentation, and semantic segmentation using MS COCO and ADE20K datasets outperform prior work consistently, sometimes by large margins. Code available at https://github.com/NVlabs/GCViT.  ( 2 min )
    Feature Importance Guided Attack: A Model Agnostic Adversarial Attack. (arXiv:2106.14815v2 [cs.LG] UPDATED)
    Research in adversarial learning has primarily focused on homogeneous unstructured datasets, which often map into the problem space naturally. Inverting a feature-space attack on heterogeneous datasets into the problem space is much more challenging, particularly the task of finding the perturbation to perform. This work presents a formal search strategy: the `Feature Importance Guided Attack' (FIGA), which finds perturbations in the feature space of heterogeneous tabular datasets to produce evasion attacks. We first demonstrate FIGA in the feature space and then in the problem space. FIGA assumes no prior knowledge of the defending model's learning algorithm and does not require any gradient information. FIGA assumes knowledge of the feature representation and the mean feature values of the defending model's dataset. FIGA leverages feature importance rankings by perturbing the most important features of the input in the direction of the target class. While FIGA is conceptually similar to other work that uses feature selection processes (e.g., mimicry attacks), we formalize an attack algorithm with three tunable parameters and investigate the strength of FIGA on tabular datasets. We demonstrate the effectiveness of FIGA by evading phishing detection models trained on four different tabular phishing datasets and one financial dataset with an average success rate of 94%. We extend FIGA to the phishing problem space by limiting the possible perturbations to be valid and feasible in the phishing domain. We generate valid adversarial phishing sites that are visually identical to their unperturbed counterparts and use them to attack six tabular ML models, achieving a 13.05% average success rate.  ( 3 min )
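    A bare-bones sketch of this style of attack (not the paper's exact algorithm): rank features by importance under a surrogate model, then push the top-k features of an input toward the target class's mean values; k and epsilon stand in for the tunable parameters.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 10))
        y = (X[:, 0] + X[:, 3] > 0).astype(int)
        surrogate = RandomForestClassifier(random_state=0).fit(X, y)

        def figa_like_perturb(x, target_class, k=3, epsilon=0.5):
            top = np.argsort(surrogate.feature_importances_)[-k:]  # most important features
            target_mean = X[y == target_class].mean(axis=0)        # known mean feature values
            x_adv = x.copy()
            x_adv[top] += epsilon * np.sign(target_mean[top] - x[top])
            return x_adv

        x = X[0]
        print(surrogate.predict([x]), surrogate.predict([figa_like_perturb(x, 1 - y[0])]))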
    Generalizability of Machine Learning Models: Quantitative Evaluation of Three Methodological Pitfalls. (arXiv:2202.01337v2 [cs.LG] UPDATED)
    Purpose: Despite the potential of machine learning models, the lack of generalizability has hindered their widespread adoption in clinical practice. We investigate three methodological pitfalls: (1) violation of independence assumption, (2) model evaluation with an inappropriate performance indicator or baseline for comparison, and (3) batch effect. Materials and Methods: Using several retrospective datasets, we implement machine learning models with and without the pitfalls to quantitatively illustrate these pitfalls' effect on model generalizability. Results: Violation of independence assumption, more specifically, applying oversampling, feature selection, and data augmentation before splitting data into train, validation, and test sets, respectively, led to misleading and superficial gains in F1 scores of 71.2% in predicting local recurrence and 5.0% in predicting 3-year overall survival in head and neck cancer as well as 46.0% in distinguishing histopathological patterns in lung cancer. Further, randomly distributing data points for a subject across training, validation, and test sets led to a 21.8% superficial increase in F1 score. Also, we showed the importance of the choice of performance measures and baseline for comparison. In the presence of batch effect, a model built for pneumonia detection led to F1 score of 98.7%. However, when the same model was applied to a new dataset of normal patients, it only correctly classified 3.86% of the samples. Conclusions: These methodological pitfalls cannot be captured using internal model evaluation, and the inaccurate predictions made by such models may lead to wrong conclusions and interpretations. Therefore, understanding and avoiding these pitfalls is necessary for developing generalizable models.  ( 3 min )
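    Pitfall (1) is easy to reproduce: duplicating samples (a stand-in for augmentation or oversampling) before the split leaks near-copies of test points into training and inflates accuracy even on pure-noise labels.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X = rng.normal(size=(400, 20))
        y = rng.integers(0, 2, 400)          # labels carry no signal: chance = 0.5

        # Wrong: duplicate first, split second -> copies leak into the test set.
        Xa, ya = np.vstack([X, X]), np.hstack([y, y])
        Xtr, Xte, ytr, yte = train_test_split(Xa, ya, random_state=0)
        print("leaky:", RandomForestClassifier(random_state=0).fit(Xtr, ytr).score(Xte, yte))

        # Right: split first, duplicate only the training portion.
        Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
        Xtr2, ytr2 = np.vstack([Xtr, Xtr]), np.hstack([ytr, ytr])
        print("clean:", RandomForestClassifier(random_state=0).fit(Xtr2, ytr2).score(Xte, yte))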
    SwiftAgg+: Achieving Asymptotically Optimal Communication Loads in Secure Aggregation for Federated Learning. (arXiv:2203.13060v3 [cs.IT] UPDATED)
    We propose SwiftAgg+, a novel secure aggregation protocol for federated learning systems, where a central server aggregates local models of $N \in \mathbb{N}$ distributed users, each of size $L \in \mathbb{N}$, trained on their local data, in a privacy-preserving manner. SwiftAgg+ can significantly reduce the communication overheads without any compromise on security, and achieve optimal communication loads within diminishing gaps. Specifically, in presence of at most $D=o(N)$ dropout users, SwiftAgg+ achieves a per-user communication load of $(1+\mathcal{O}(\frac{1}{N}))L$ symbols and a server communication load of $(1+\mathcal{O}(\frac{1}{N}))L$ symbols, with a worst-case information-theoretic security guarantee, against any subset of up to $T=o(N)$ semi-honest users who may also collude with the curious server. Moreover, the proposed SwiftAgg+ allows for a flexible trade-off between communication loads and the number of active communication links. In particular, for $T<N-D$ and for any $K\in\mathbb{N}$, SwiftAgg+ can achieve the server communication load of $(1+\frac{T}{K})L$ symbols, and per-user communication load of up to $(1+\frac{T+D}{K})L$ symbols, where the number of pair-wise active connections in the network is $\frac{N}{2}(K+T+D+1)$.  ( 3 min )
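    Plugging hypothetical numbers into the quoted loads makes the trade-off concrete:

        # N users, T colluders, D dropouts, trade-off parameter K, model size L
        N, T, D, K, L = 100, 10, 5, 20, 1.0
        server_load = (1 + T / K) * L            # (1 + T/K) L      -> 1.50 L
        per_user_load = (1 + (T + D) / K) * L    # (1 + (T+D)/K) L  -> 1.75 L
        active_links = N / 2 * (K + T + D + 1)   # N/2 (K+T+D+1)    -> 1800.0
        print(server_load, per_user_load, active_links)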
    Federated Learning for Short-term Residential Load Forecasting. (arXiv:2105.13325v2 [cs.LG] UPDATED)
    Load forecasting is an essential task performed within the energy industry to help balance supply with demand and maintain a stable load on the electricity grid. As supply transitions towards less reliable renewable energy generation, smart meters will prove a vital component to facilitate these forecasting tasks. However, smart meter adoption is low among privacy-conscious consumers that fear intrusion upon their fine-grained consumption data. In this work we propose and explore a federated learning (FL) based approach for training forecasting models in a distributed, collaborative manner whilst retaining the privacy of the underlying data. We compare two approaches: FL, and a clustered variant, FL+HC against a non-private, centralised learning approach and a fully private, localised learning approach. Within these approaches, we measure model performance using RMSE and computational efficiency. In addition, we suggest the FL strategies are followed by a personalisation step and show that model performance can be improved by doing so. We show that FL+HC followed by personalisation can achieve a $\sim$5\% improvement in model performance with a $\sim$10x reduction in computation compared to localised learning. Finally we provide advice on private aggregation of predictions for building a private end-to-end load forecasting application.
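    At the core of any such FL pipeline sits an aggregation step; a minimal FedAvg-style sketch with toy local models (not the paper's exact strategy):

        import numpy as np

        def fedavg(client_weights, client_sizes):
            """Average per-client parameter vectors, weighted by local data size."""
            sizes = np.asarray(client_sizes, dtype=float)
            stacked = np.stack(client_weights)               # (n_clients, n_params)
            return (stacked * (sizes / sizes.sum())[:, None]).sum(axis=0)

        clients = [np.random.randn(5) for _ in range(4)]     # 4 clients' local models
        print(fedavg(clients, client_sizes=[120, 80, 200, 50]))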
    MQRetNN: Multi-Horizon Time Series Forecasting with Retrieval Augmentation. (arXiv:2207.10517v2 [cs.LG] UPDATED)
    Multi-horizon probabilistic time series forecasting has wide applicability to real-world tasks such as demand forecasting. Recent work in neural time-series forecasting mainly focuses on the use of Seq2Seq architectures. For example, MQTransformer - an improvement of MQCNN - has shown state-of-the-art performance in probabilistic demand forecasting. In this paper, we consider incorporating cross-entity information to enhance model performance by adding a cross-entity attention mechanism along with a retrieval mechanism to select which entities to attend over. We demonstrate how our new neural architecture, MQRetNN, leverages the encoded contexts from a pretrained baseline model on the entire population to improve forecasting accuracy. Using MQCNN as the baseline model (due to computational constraints, we do not use MQTransformer), we first show on a small demand forecasting dataset that it is possible to achieve a ~3% improvement in test loss by adding a cross-entity attention mechanism where each entity attends to all others in the population. We then evaluate the model with our proposed retrieval methods - as a means of approximating an attention over a large population - on a large-scale demand forecasting application with over 2 million products and observe a ~1% performance gain over the MQCNN baseline.
    Responsibility: An Example-based Explainable AI approach via Training Process Inspection. (arXiv:2209.03433v1 [cs.LG])
    Explainable Artificial Intelligence (XAI) methods are intended to help human users better understand the decision making of an AI agent. However, many modern XAI approaches are unintuitive to end users, particularly those without prior AI or ML knowledge. In this paper, we present a novel XAI approach we call Responsibility that identifies the most responsible training example for a particular decision. This example can then be shown as an explanation: "this is what I (the AI) learned that led me to do that". We present experimental results across a number of domains along with the results of an Amazon Mechanical Turk user study, comparing responsibility and existing XAI methods on an image classification task. Our results demonstrate that responsibility can help improve accuracy for both human end users and secondary ML models.
    DIY-IPS: Towards an Off-the-Shelf Accurate Indoor Positioning System. (arXiv:2209.03613v1 [cs.NI])
    We present DIY-IPS - Do It Yourself Indoor Positioning System - an open-source real-time indoor positioning mobile application. DIY-IPS detects users' indoor positions by employing dual-band RSSI fingerprinting of available WiFi access points. The app can be used, without additional infrastructural costs, to detect users' indoor positions in real time. We published our app as open source to save other researchers the time of recreating it. The app enables researchers/users to (1) collect indoor positioning datasets with ground truth labels, (2) customize the app for higher accuracy or other research purposes, and (3) test the accuracy of modified methods by live testing with ground truth. We ran preliminary experiments to demonstrate the effectiveness of the app.
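    The core of RSSI fingerprinting fits in a few lines: match a live RSSI vector against a labeled fingerprint database with k-nearest neighbours (toy values, not the app's actual pipeline):

        import numpy as np
        from sklearn.neighbors import KNeighborsClassifier

        # Rows: RSSI (dBm) from 4 access points; labels: locations.
        fingerprints = np.array([[-40, -70, -80, -90],
                                 [-42, -68, -82, -88],
                                 [-80, -45, -60, -85],
                                 [-78, -48, -62, -83]])
        rooms = ["kitchen", "kitchen", "office", "office"]

        model = KNeighborsClassifier(n_neighbors=3).fit(fingerprints, rooms)
        print(model.predict([[-79, -46, -61, -84]]))         # -> ['office']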
    Transformer-based classification of premise in tweets related to COVID-19. (arXiv:2209.03851v1 [cs.CL])
    Automation of social network data assessment is one of the classic challenges of natural language processing. During the COVID-19 pandemic, mining people's stances from public messages has become crucial for understanding attitudes towards health orders. In this paper, the authors propose a predictive model based on the transformer architecture to classify the presence of a premise in Twitter texts. This work was completed as part of the Social Media Mining for Health (SMM4H) Workshop 2022. We explored modern transformer-based classifiers in order to construct a pipeline that efficiently captures tweet semantics. Our experiments on a Twitter dataset showed that RoBERTa is superior to the other transformer models on the premise prediction task. The model achieved competitive performance, with an ROC AUC of 0.807 and an F1 score of 0.7648.
    Inapproximability of a Pair of Forms Defining a Partial Boolean Function. (arXiv:2102.04703v4 [cs.LG] UPDATED)
    We consider the problem of jointly minimizing forms of two Boolean functions $f, g \colon \{0,1\}^J \to \{0,1\}$ such that $f + g \leq 1$ and so as to separate disjoint sets $A \cup B \subseteq \{0,1\}^J$ such that $f(A) = \{1\}$ and $g(B) = \{1\}$. We hypothesize that this problem is easier to solve or approximate than the well-understood problem of minimizing the form of one Boolean function $h: \{0,1\}^J \to \{0,1\}$ such that $h(A) = \{1\}$ and $h(B) = \{0\}$. For a large class of forms, including binary decision trees and ordered binary decision diagrams, we refute this hypothesis. For disjunctive normal forms, we show that the problem is at least as hard as MIN-SET-COVER. For all these forms, we establish that no $o(\ln (|A| + |B| -1))$-approximation algorithm exists unless P$=$NP.
    FathomNet: A global image database for enabling artificial intelligence in the ocean. (arXiv:2109.14646v4 [cs.CV] UPDATED)
    The ocean is experiencing unprecedented rapid change, and visually monitoring marine biota at the spatiotemporal scales needed for responsible stewardship is a formidable task. As baselines are sought by the research community, the volume and rate of this required data collection rapidly outpace our abilities to process and analyze them. Recent advances in machine learning enable fast, sophisticated analysis of visual data, but have had limited success in the ocean due to a lack of data standardization, insufficient formatting, and the demand for large, labeled datasets. To address this need, we built FathomNet, an open-source image database that standardizes and aggregates expertly curated labeled data. FathomNet has been seeded with existing iconic and non-iconic imagery of marine animals, underwater equipment, debris, and other concepts, and allows for future contributions from distributed data sources. We demonstrate how FathomNet data can be used to train and deploy models on other institutional video to reduce annotation effort, and to enable automated tracking of underwater concepts when integrated with robotic vehicles. As FathomNet continues to grow and incorporate more labeled data from the community, we can accelerate the processing of visual data to achieve a healthy and sustainable global ocean.
    OmniXAI: A Library for Explainable AI. (arXiv:2206.01612v6 [cs.LG] UPDATED)
    We introduce OmniXAI (short for Omni eXplainable AI), an open-source Python library of eXplainable AI (XAI), which offers omni-way explainable AI capabilities and various interpretable machine learning techniques to address the pain points of understanding and interpreting the decisions made by machine learning (ML) in practice. OmniXAI aims to be a one-stop comprehensive library that makes explainable AI easy for data scientists, ML researchers and practitioners who need explanations for various types of data, models and explanation methods at different stages of the ML process (data exploration, feature engineering, model development, evaluation, and decision-making, etc). In particular, our library includes a rich family of explanation methods integrated in a unified interface, which supports multiple data types (tabular data, images, texts, time-series), multiple types of ML models (traditional ML in Scikit-learn and deep learning models in PyTorch/TensorFlow), and a range of diverse explanation methods including "model-specific" and "model-agnostic" ones (such as feature-attribution explanation, counterfactual explanation, gradient-based explanation, etc). For practitioners, the library provides an easy-to-use unified interface to generate the explanations for their applications by only writing a few lines of code, and also a GUI dashboard for visualization of different explanations for more insights about decisions. In this technical report, we present OmniXAI's design principles, system architectures, and major functionalities, and also demonstrate several example use cases across different types of data, tasks, and models.
    Combing for Credentials: Active Pattern Extraction from Smart Reply. (arXiv:2207.10802v2 [cs.CR] UPDATED)
    With the wide availability of large pre-trained language models such as GPT-2 and BERT, the recent trend has been to fine-tune a pre-trained model to achieve state-of-the-art performance on a downstream task. One natural example is the "Smart Reply" application where a pre-trained model is tuned to provide suggested responses for a given query message. Since these models are often tuned using sensitive data such as emails or chat transcripts, it is important to understand and mitigate the risk that the model leaks its tuning data. We investigate potential information leakage vulnerabilities in a typical Smart Reply pipeline and introduce a new type of active extraction attack that exploits canonical patterns in text containing sensitive data. We show experimentally that it is possible for an adversary to extract sensitive user information present in the training data. We explore potential mitigation strategies and demonstrate empirically how differential privacy appears to be an effective defense mechanism to such pattern extraction attacks.
    A Decentralized Federated Learning Framework via Committee Mechanism with Convergence Guarantee. (arXiv:2108.00365v2 [cs.LG] UPDATED)
    Federated learning allows multiple participants to collaboratively train an efficient model without exposing data privacy. However, this distributed machine learning training method is prone to attacks from Byzantine clients, which interfere with the training of the global model by modifying the model or uploading false gradients. In this paper, we propose a novel serverless federated learning framework, Committee Mechanism based Federated Learning (CMFL), which ensures the robustness of the algorithm with a convergence guarantee. In CMFL, a committee system is set up to screen the uploaded local gradients. The committee system selects the local gradients rated by the elected members for the aggregation procedure through the selection strategy, and replaces the committee members through the election strategy. Based on the different considerations of model performance and defense, two opposite selection strategies are designed for the sake of accuracy and robustness, respectively. Extensive experiments illustrate that CMFL achieves faster convergence and better accuracy than typical federated learning, while obtaining better robustness than traditional Byzantine-tolerant algorithms, in the manner of a decentralized approach. In addition, we theoretically analyze and prove the convergence of CMFL under different election and selection strategies, which coincides with the experimental results.
    Multihop: Leveraging Complex Models to Learn Accurate Simple Models. (arXiv:2109.06961v2 [cs.LG] UPDATED)
    Knowledge transfer from a complex high performing model to a simpler and potentially low performing one in order to enhance its performance has been of great interest over the last few years as it finds applications in important problems such as explainable artificial intelligence, model compression, robust model building and learning from small data. Known approaches to this problem (viz. Knowledge Distillation, Model compression, ProfWeight, etc.) typically transfer information directly (i.e. in a single/one hop) from the complex model to the chosen simple model through schemes that modify the target or reweight training examples on which the simple model is trained. In this paper, we propose a meta-approach where we transfer information from the complex model to the simple model by dynamically selecting and/or constructing a sequence of intermediate models of decreasing complexity that are less intricate than the original complex model. Our approach can transfer information between consecutive models in the sequence using any of the previously mentioned approaches as well as work in 1-hop fashion, thus generalizing these approaches. In the experiments on real data, we observe that we get consistent gains for different choices of models over 1-hop, which on average is more than 2\% and reaches up to 8\% in a particular case. We also empirically analyze conditions under which the multi-hop approach is likely to be beneficial over the traditional 1-hop approach, and report other interesting insights. To the best of our knowledge, this is the first work that proposes such a multi-hop approach to perform knowledge transfer given a single high performing complex model, making it in our opinion, an important methodological contribution.
    Recovering network topology and dynamics via sequence characterization. (arXiv:2206.15190v2 [cs.SI] UPDATED)
    Sequences arise in many real-world scenarios; thus, identifying the mechanisms behind symbol generation is essential to understanding many complex systems. This paper analyzes sequences generated by agents walking on a networked topology. Given that in many real scenarios the underlying process generating the sequence is hidden, we investigate whether reconstructing the network via the co-occurrence method is useful to recover both the network topology and the agent dynamics generating the sequences. We found that the characterization of reconstructed networks provides valuable information regarding the process and topology used to create the sequences. In a machine learning approach considering 16 combinations of network topology and agent dynamics as classes, we obtained an accuracy of 87% with sequences generated with less than 40% of nodes visited. Larger sequences turned out to generate improved machine learning models. Our findings suggest that the proposed methodology could be extended to classify sequences and understand the mechanisms behind sequence generation.
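    The co-occurrence reconstruction in its simplest form links symbols that appear next to each other in the sequence (a toy walk shown; the paper's feature set is richer):

        import networkx as nx

        sequence = list("abcabdacbdab")                 # a walk-generated toy sequence
        G = nx.Graph()
        G.add_edges_from(zip(sequence, sequence[1:]))   # one edge per adjacent pair
        print(G.number_of_nodes(), G.number_of_edges())
        print(nx.degree_histogram(G))                   # topology features for classification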
    Stochastic Coded Federated Learning with Convergence and Privacy Guarantees. (arXiv:2201.10092v5 [cs.LG] UPDATED)
    Federated learning (FL) has attracted much attention as a privacy-preserving distributed machine learning framework, where many clients collaboratively train a machine learning model by exchanging model updates with a parameter server instead of sharing their raw data. Nevertheless, FL training suffers from slow convergence and unstable performance due to stragglers caused by the heterogeneous computational resources of clients and fluctuating communication rates. This paper proposes a coded FL framework to mitigate the straggler issue, namely stochastic coded federated learning (SCFL). In this framework, each client generates a privacy-preserving coded dataset by adding additive noise to the random linear combination of its local data. The server collects the coded datasets from all the clients to construct a composite dataset, which helps to compensate for the straggling effect. In the training process, the server as well as clients perform mini-batch stochastic gradient descent (SGD), and the server adds a make-up term in model aggregation to obtain unbiased gradient estimates. We characterize the privacy guarantee by the mutual information differential privacy (MI-DP) and analyze the convergence performance in federated learning. Besides, we demonstrate a privacy-performance tradeoff of the proposed SCFL method by analyzing the influence of the privacy constraint on the convergence rate. Finally, numerical experiments corroborate our analysis and show the benefits of SCFL in achieving fast convergence while preserving data privacy.
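    The coded-dataset step is essentially one line of linear algebra: each client shares a noisy random linear combination of its local samples (toy sizes, illustrative noise scale):

        import numpy as np

        rng = np.random.default_rng(0)
        X_local = rng.normal(size=(100, 8))          # one client's raw data
        G = rng.normal(size=(20, 100))               # random combining matrix
        noise = rng.normal(scale=0.5, size=(20, 8))  # additive privacy noise
        X_coded = G @ X_local + noise                # what the server receives
        print(X_coded.shape)                         # (20, 8)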
    Zero Pixel Directional Boundary by Vector Transform. (arXiv:2203.08795v2 [cs.CV] UPDATED)
    Boundaries are among the primary visual cues used by human and computer vision systems. One of the key problems in boundary detection is the label representation, which typically leads to class imbalance and, as a consequence, to thick boundaries that require non-differentiable post-processing steps to be thinned. In this paper, we re-interpret boundaries as 1-D surfaces and formulate a one-to-one vector transform function that allows training of boundary prediction while completely avoiding the class imbalance issue. Specifically, we define the boundary representation at any point as the unit vector pointing to the closest boundary surface. Our problem formulation leads to the estimation of direction as well as richer contextual information of the boundary, and, if desired, the availability of zero-pixel-thin boundaries also at training time. Our method uses no hyper-parameter in the training loss and a fixed stable hyper-parameter at inference. We provide theoretical justification and discussion of the vector transform representation. We evaluate the proposed loss method using a standard architecture and show excellent performance compared with other losses and representations on several datasets. Code is available at https://github.com/edomel/BoundaryVT.
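    The vector-transform label itself is straightforward to compute from a binary boundary mask with a Euclidean distance transform. A minimal sketch, assuming scipy is available and a 2-D mask (the paper's exact construction may differ in details such as orientation conventions):
        import numpy as np
        from scipy.ndimage import distance_transform_edt

        def boundary_vector_field(boundary_mask):
            # distance_transform_edt returns, for every nonzero pixel, the indices
            # of the nearest zero pixel; zeros therefore mark the boundary.
            _, idx = distance_transform_edt(~boundary_mask,
                                            return_distances=True,
                                            return_indices=True)
            yy, xx = np.indices(boundary_mask.shape)
            vy, vx = idx[0] - yy, idx[1] - xx
            norm = np.maximum(np.hypot(vy, vx), 1e-8)   # zero vector on boundary
            return np.stack([vy / norm, vx / norm])     # unit vectors, (2, H, W)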
    Position Aided Beam Prediction in the Real World: How Useful GPS Locations Actually Are?. (arXiv:2205.09054v5 [eess.SP] UPDATED)
    Millimeter-wave (mmWave) communication systems rely on narrow beams for achieving sufficient receive signal power. Adjusting these beams is typically associated with large training overhead, which becomes particularly critical for highly-mobile applications. Intuitively, since optimal beam selection can benefit from the knowledge of the positions of communication terminals, there has been increasing interest in leveraging position data to reduce the overhead in mmWave beam prediction. Prior work, however, studied this problem using only synthetic data that generally does not accurately represent real-world measurements. In this paper, we investigate position-aided beam prediction using a real-world large-scale dataset to derive insights into precisely how much overhead can be saved in practice. Furthermore, we analyze which machine learning algorithms perform best, what factors degrade inference performance in real data, and which machine learning metrics are more meaningful in capturing the actual communication system performance.
    Differentially Private Empirical Risk Minimization under the Fairness Lens. (arXiv:2106.02674v2 [cs.LG] UPDATED)
    Differential Privacy (DP) is an important privacy-enhancing technology for private machine learning systems. It allows one to measure and bound the risk associated with an individual's participation in a computation. However, it was recently observed that DP learning systems may exacerbate bias and unfairness for different groups of individuals. This paper builds on these important observations and sheds light on the causes of the disparate impacts arising in the problem of differentially private empirical risk minimization. It focuses on the accuracy disparity arising among groups of individuals in two well-studied DP learning methods: output perturbation and differentially private stochastic gradient descent. The paper analyzes which data and model properties are responsible for the disproportionate impacts, why these aspects affect different groups disproportionately, and proposes guidelines to mitigate these effects. The proposed approach is evaluated on several datasets and settings.
    Bayesian regularization of empirical MDPs. (arXiv:2208.02362v2 [cs.LG] UPDATED)
    In most applications of model-based Markov decision processes, the parameters for the unknown underlying model are often estimated from the empirical data. Due to noise, the policy learned from the estimated model is often far from the optimal policy of the underlying model. When applied to the environment of the underlying model, the learned policy results in suboptimal performance, thus calling for solutions with better generalization performance. In this work we take a Bayesian perspective and regularize the objective function of the Markov decision process with prior information in order to obtain more robust policies. Two approaches are proposed, one based on $L^1$ regularization and the other on relative entropic regularization. We evaluate our proposed algorithms on synthetic simulations and on real-world search logs of a large-scale online shopping store. Our results demonstrate the robustness of regularized MDP policies against the noise present in the models.
    NeuralFMU: Presenting a workflow for integrating hybrid NeuralODEs into real world applications. (arXiv:2209.03933v1 [cs.LG])
    The term NeuralODE describes the structural combination of an Artificial Neural Network (ANN) and a numerical solver for Ordinary Differential Equations (ODEs), where the former acts as the right-hand side of the ODE to be solved. This concept was further extended by a black-box model in the form of a Functional Mock-up Unit (FMU) to obtain a subclass of NeuralODEs, named NeuralFMUs. The resulting structure features the advantages of first-principle and data-driven modeling approaches in one single simulation model: a higher prediction accuracy compared to conventional First Principle Models (FPMs), while requiring a lower training effort compared to purely data-driven models. We present an intuitive workflow to set up and use NeuralFMUs, enabling the encapsulation and reuse of existing conventional models exported from common modeling tools. Moreover, we exemplify this concept by deploying a NeuralFMU for a consumption simulation based on a Vehicle Longitudinal Dynamics Model (VLDM), which is a typical use case in the automotive industry. Related challenges that are often neglected in scientific use cases, such as real measurements (e.g. noise), an unknown system state, or high-frequency discontinuities, are handled in this contribution. With the aim of building a hybrid model with a higher prediction quality than the original FPM, we briefly highlight two open-source libraries: FMI.jl for integrating FMUs into the Julia programming environment, as well as an extension to this library called FMIFlux.jl, which allows for the integration of FMUs into a neural network topology to finally obtain a NeuralFMU.
    Meta Clustering for Collaborative Learning. (arXiv:2006.00082v2 [cs.LG] UPDATED)
    In collaborative learning, learners coordinate to enhance each of their learning performances. From the perspective of any learner, a critical challenge is to filter out unqualified collaborators. We propose a framework named meta clustering to address the challenge. Unlike the classical problem of clustering data points, meta clustering categorizes learners. Assuming each learner performs a supervised regression on a standalone local dataset, we propose a Select-Exchange-Cluster (SEC) method to classify the learners by their underlying supervised functions. We theoretically show that the SEC can cluster learners into accurate collaboration sets. Empirical studies corroborate the theoretical analysis and demonstrate that SEC can be computationally efficient, robust against learner heterogeneity, and effective in enhancing single-learner performance. Also, we show how the proposed approach may be used to enhance data fairness. Supplementary materials for this article are available online.
    PredDiff: Explanations and Interactions from Conditional Expectations. (arXiv:2102.13519v4 [cs.LG] UPDATED)
    PredDiff is a model-agnostic, local attribution method that is firmly rooted in probability theory. Its simple intuition is to measure prediction changes while marginalizing features. In this work, we clarify properties of PredDiff and its close connection to Shapley values. We stress important differences between classification and regression, which require a specific treatment within both formalisms. We extend PredDiff by introducing a new, well-founded measure for interaction effects between arbitrary feature subsets. The study of interaction effects represents an inevitable step towards a comprehensive understanding of black-box models and is particularly important for science applications. Equipped with our novel interaction measure, PredDiff is a promising model-agnostic approach for obtaining reliable, numerically inexpensive and theoretically sound attributions.
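    The core intuition admits a compact sketch: the attribution of a feature is the change in prediction when that feature is marginalized out. In the toy version below, we approximate the marginalization by imputing the feature from background samples, a simplifying assumption rather than the paper's exact (conditional-expectation) estimator:
        import numpy as np

        def preddiff_attribution(predict, x, feature_idx, background,
                                 n_samples=100, seed=0):
            # predict: callable mapping an (m, d) array to (m,) or (m, k) outputs.
            rng = np.random.default_rng(seed)
            x = np.asarray(x, dtype=float)
            X_marg = np.tile(x, (n_samples, 1))
            donors = background[rng.integers(0, len(background), n_samples)]
            X_marg[:, feature_idx] = donors[:, feature_idx]  # marginalize the feature
            return predict(x[None])[0] - predict(X_marg).mean(axis=0)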
    Modified DDPG car-following model with a real-world human driving experience with CARLA simulator. (arXiv:2112.14602v3 [cs.RO] UPDATED)
    In the autonomous driving field, fusion of human knowledge into Deep Reinforcement Learning (DRL) is often based on human demonstrations recorded in a simulated environment. This limits the generalization and the feasibility of application in real-world traffic. We propose a two-stage DRL method to train a car-following agent that modifies the policy by leveraging real-world human driving experience and achieves performance superior to that of a pure DRL agent. Training of the DRL agent is done within the CARLA framework with the Robot Operating System (ROS). For evaluation, we designed different driving scenarios to compare the proposed two-stage DRL car-following agent with other agents. After extracting the "good" behavior from the human driver, the agent becomes more efficient and reasonable, which makes this autonomous agent more suitable for Human-Robot Interaction (HRI) traffic.
    Sub 8-Bit Quantization of Streaming Keyword Spotting Models for Embedded Chipsets. (arXiv:2207.06920v2 [cs.SD] UPDATED)
    We propose a novel 2-stage sub 8-bit quantization aware training algorithm for all components of a 250K parameter feedforward, streaming, state-free keyword spotting model. For the 1st stage, we adapt a recently proposed quantization technique using a non-linear transformation with tanh(.) on dense layer weights. In the 2nd stage, we use linear quantization methods on the rest of the network, including other parameters (bias, gain, batchnorm), inputs, and activations. We conduct large-scale experiments, training on 26,000 hours of de-identified production, far-field and near-field audio data (evaluating on 4,000 hours of data). We organize our results in two embedded chipset settings: a) with the commodity ARM NEON instruction set and 8-bit containers, we present accuracy, CPU, and memory results using sub 8-bit weights (4, 5, 8-bit) and 8-bit quantization of the rest of the network; b) with off-the-shelf neural network accelerators, for a range of weight bit widths (1 and 5-bit), while presenting accuracy results, we project reduction in memory utilization. In both configurations, our results show that the proposed algorithm can achieve: a) parity with a full floating-point model's operating point on a detection error tradeoff (DET) curve in terms of false detection rate (FDR) at false rejection rate (FRR); b) significant reduction in compute and memory, yielding up to 3 times improvement in CPU consumption and more than 4 times improvement in memory consumption.
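    For intuition, the first-stage non-linear weight quantization can be sketched as follows (a toy forward transform only; the trainable scaling and straight-through gradient handling used in quantization-aware training are omitted, and the exact parameterization is our assumption):
        import numpy as np

        def tanh_quantize(w, bits=4):
            # Compress weights into (-1, 1), quantize on a uniform grid, map back.
            levels = 2 ** bits - 1
            t = np.tanh(w)
            q = np.round((t + 1) / 2 * levels) / levels * 2 - 1
            return np.arctanh(np.clip(q, -0.999999, 0.999999))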
    Fixed Points of Cone Mapping with the Application to Neural Networks. (arXiv:2207.09947v2 [math.DS] UPDATED)
    We derive conditions for the existence of fixed points of cone mappings without assuming scalability of the functions involved. In the literature on fixed points of interference mappings, monotonicity and scalability are often inseparable. In applications, such mappings are approximated by non-negative neural networks. It turns out, however, that training non-negative networks requires imposing an artificial constraint on the weights of the model. Moreover, in the case of specific non-negative data, it cannot be said that a non-negative mapping must have only non-negative weights. Therefore, we consider the problem of the existence of fixed points for general neural networks, assuming tangency conditions with respect to specific cones. This does not relax the physical assumptions: even when the input and output are required to be non-negative, the weights can take (small) values below zero. Such properties (often found in papers on the interpretability of neural network weights) lead to a weakening of the assumptions on the monotonicity or scalability of the mapping associated with the neural network. To the best of our knowledge, this paper is the first to study this phenomenon.
    Actionable Guidance for High-Consequence AI Risk Management: Towards Standards Addressing AI Catastrophic Risks. (arXiv:2206.08966v2 [cs.CY] UPDATED)
    Artificial intelligence (AI) systems can provide many beneficial capabilities but also risks of adverse events. Some AI systems could present risks of events with very high or catastrophic consequences at societal scale. The US National Institute of Standards and Technology (NIST) is developing the NIST Artificial Intelligence Risk Management Framework (AI RMF) as voluntary guidance on AI risk assessment and management for AI developers and others. For addressing risks of events with catastrophic consequences, NIST indicated a need to translate from high-level principles to actionable risk management guidance. In this document, we provide detailed actionable-guidance recommendations focused on identifying and managing risks of events with very high or catastrophic consequences, intended as a risk management practices resource for NIST for AI RMF version 1.0 (scheduled for release in early 2023), for AI RMF users, or for other AI risk management guidance and standards as appropriate. We also provide the methodology behind our recommendations. We provide actionable-guidance recommendations for AI RMF 1.0 on: identifying risks from potential unintended uses and misuses of AI systems; including catastrophic-risk factors within the scope of risk assessments and impact assessments; identifying and mitigating human rights harms; and reporting information on AI risk factors including catastrophic-risk factors. In addition, we provide recommendations on additional issues for a roadmap for later versions of the AI RMF or supplementary publications, including an AI RMF Profile with supplementary guidance for cutting-edge, increasingly multi-purpose or general-purpose AI. We aim for this work to be a concrete risk-management practices contribution, and to stimulate constructive dialogue on how to address catastrophic risks and associated issues in AI standards.
    HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement. (arXiv:2203.13086v2 [cs.SD] UPDATED)
    Generative adversarial networks have recently demonstrated outstanding performance in neural vocoding, outperforming the best autoregressive and flow-based models. In this paper, we show that this success can be extended to other tasks of conditional audio generation. In particular, building upon HiFi vocoders, we propose a novel HiFi++ general framework for bandwidth extension and speech enhancement. We show that with the improved generator architecture and simplified multi-discriminator training, HiFi++ performs on par with or better than the state-of-the-art in these tasks while using significantly fewer computational resources. The effectiveness of our approach is validated through a series of extensive experiments.
    A multi view multi stage and multi window framework for pulmonary artery segmentation from CT scans. (arXiv:2209.03918v1 [eess.IV])
    This is the technical report of the 9th-place solution in the final results of the PARSE2022 Challenge. We solve the segmentation problem of the pulmonary artery by using a two-stage method based on a 3D CNN network. The coarse model is used to locate the ROI, and the fine model is used to refine the segmentation result. In addition, to improve the segmentation performance, we adopt a multi-view and multi-window-level method, and we employ a fine-tuning strategy to mitigate the impact of inconsistent labeling.
    T$^2$LR-Net: An Unrolling Reconstruction Network Learning Transformed Tensor Low-Rank prior for Dynamic MR Imaging. (arXiv:2209.03832v1 [eess.IV])
    While methods exploiting the tensor low-rank prior are booming in high-dimensional data processing and have obtained satisfying performance, their applications in dynamic magnetic resonance (MR) image reconstruction are limited. In this paper, we concentrate on the tensor singular value decomposition (t-SVD), which is based on the Fast Fourier Transform (FFT) and provides only a definite and limited tensor low-rank prior in the FFT domain, heavily dependent on how closely the data and the FFT domain match. By generalizing the FFT into an arbitrary unitary transformation in the transformed t-SVD and proposing the transformed tensor nuclear norm (TTNN), we introduce a flexible model based on TTNN with the ability to exploit the tensor low-rank prior of a transformed domain in a larger transformation space. We then design an iterative optimization algorithm based on the alternating direction method of multipliers (ADMM), which is further unrolled into a model-based deep reconstruction network that learns the transformed tensor low-rank prior (T$^2$LR-Net). A convolutional neural network (CNN) is incorporated within T$^2$LR-Net to learn the best-matched transform from the dynamic MR image dataset. The unrolled reconstruction network also provides a new perspective on low-rank prior utilization by exploiting the low-rank prior in the CNN-extracted feature domain. Experimental results on two cardiac cine MR datasets demonstrate that the proposed framework provides improved recovery results compared with state-of-the-art optimization-based and unrolling-network-based methods.
    Sequential Information Design: Learning to Persuade in the Dark. (arXiv:2209.03927v1 [cs.LG])
    We study a repeated information design problem faced by an informed sender who tries to influence the behavior of a self-interested receiver. We consider settings where the receiver faces a sequential decision making (SDM) problem. At each round, the sender observes the realizations of random events in the SDM problem. This begets the challenge of how to incrementally disclose such information to the receiver to persuade them to follow (desirable) action recommendations. We study the case in which the sender does not know the probabilities of the random events and, thus, has to gradually learn them while persuading the receiver. We start by providing a non-trivial polytopal approximation of the set of the sender's persuasive information structures. This is crucial for designing efficient learning algorithms. Next, we prove a negative result: no learning algorithm can be persuasive. Thus, we relax the persuasiveness requirement by focusing on algorithms that guarantee that the receiver's regret in following recommendations grows sub-linearly. In the full-feedback setting -- where the sender observes the realizations of all random events -- we provide an algorithm with $\tilde{O}(\sqrt{T})$ regret for both the sender and the receiver. Instead, in the bandit-feedback setting -- where the sender only observes the realizations of random events actually occurring in the SDM problem -- we design an algorithm that, given an $\alpha \in [1/2, 1]$ as input, ensures $\tilde{O}({T^\alpha})$ and $\tilde{O}( T^{\max \{ \alpha, 1-\frac{\alpha}{2} \} })$ regrets for the sender and the receiver, respectively. This result is complemented by a lower bound showing that such a regret trade-off is essentially tight.
    Applying Transformer-based Text Summarization for Keyphrase Generation. (arXiv:2209.03791v1 [cs.CL])
    Keyphrases are crucial for searching and systematizing scholarly documents. Most current methods for keyphrase extraction are aimed at extracting the most significant words in the text. But in practice, the list of keyphrases often includes words that do not appear in the text explicitly. In this case, the list of keyphrases represents an abstractive summary of the source text. In this paper, we experiment with popular transformer-based models for abstractive text summarization using four benchmark datasets for keyphrase extraction. We compare the results obtained with the results of common unsupervised and supervised methods for keyphrase extraction. Our evaluation shows that summarization models are quite effective in generating keyphrases in terms of the full-match F1-score and BERTScore. However, they produce many words that are absent from the author's list of keyphrases, which makes summarization models ineffective in terms of ROUGE-1. We also investigate several ordering strategies for concatenating target keyphrases. The results show that the choice of strategy affects the performance of keyphrase generation.
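    In practice, such an experiment can be set up with an off-the-shelf summarization pipeline, treating the generated "summary" as a keyphrase list. A sketch using Hugging Face transformers (the checkpoint is a placeholder; the models in the paper would be fine-tuned on keyphrase targets rather than used zero-shot):
        from transformers import pipeline

        summarizer = pipeline("summarization", model="t5-small")

        def generate_keyphrases(text, max_length=32):
            out = summarizer(text, max_length=max_length, min_length=2,
                             do_sample=False)
            # Interpret the generated text as a comma-separated keyphrase list.
            return [kp.strip() for kp in out[0]["summary_text"].split(",")
                    if kp.strip()]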
    Text-Free Learning of a Natural Language Interface for Pretrained Face Generators. (arXiv:2209.03953v1 [cs.CV])
    We propose Fast text2StyleGAN, a natural language interface that adapts pre-trained GANs for text-guided human face synthesis. Leveraging the recent advances in Contrastive Language-Image Pre-training (CLIP), no text data is required during training. Fast text2StyleGAN is formulated as a conditional variational autoencoder (CVAE) that provides extra control and diversity to the generated images at test time. Our model does not require re-training or fine-tuning of the GANs or CLIP when encountering new text prompts. In contrast to prior work, we do not rely on optimization at test time, making our method orders of magnitude faster than earlier approaches. Empirically, on the FFHQ dataset, our method offers faster and more accurate generation of images from natural language descriptions with varying levels of detail compared to prior work.
    Stochastic gradient descent with gradient estimator for categorical features. (arXiv:2209.03771v1 [cs.LG])
    Categorical data are present in key areas such as health or supply chain, and such data require specific treatment. In order to apply recent machine learning models to such data, encoding is needed. To build interpretable models, one-hot encoding is still a very good solution, but such encoding creates sparse data. Gradient estimators are not suited to sparse data: the gradient is mainly treated as zero while it simply does not always exist; thus, a novel gradient estimator is introduced. We show what this estimator minimizes in theory and show its efficiency on different datasets with multiple model architectures. This new estimator performs better than common estimators under similar settings. A real-world retail dataset is also released after anonymization. Overall, the aim of this paper is to thoroughly consider categorical data and adapt models and optimizers to these key features.
    ReX: A Framework for Generating Local Explanations to Recurrent Neural Networks. (arXiv:2209.03798v1 [cs.LG])
    We propose a general framework to adapt various local explanation techniques to recurrent neural networks. In particular, our explanations add temporal information, which expands explanations generated by existing techniques to cover data points that have different lengths compared to the original input data point. Our approach is general as it only modifies the perturbation model and feature representation of existing techniques without touching their core algorithms. We have instantiated our approach on LIME and Anchors. Our empirical evaluation shows that it effectively improves the usefulness of explanations generated by these two techniques on a sentiment analysis network and an anomaly detection network.
    Meta Discovery: Learning to Discover Novel Classes given Very Limited Data. (arXiv:2102.04002v4 [cs.LG] UPDATED)
    In novel class discovery (NCD), we are given labeled data from seen classes and unlabeled data from unseen classes, and we train clustering models for the unseen classes. However, the implicit assumptions behind NCD are still unclear. In this paper, we demystify the assumptions behind NCD and find that high-level semantic features should be shared among the seen and unseen classes. Based on this finding, NCD is theoretically solvable under certain assumptions and can be naturally linked to meta-learning, which has exactly the same assumption as NCD. Thus, we can empirically solve the NCD problem with meta-learning algorithms after slight modifications. This meta-learning-based methodology significantly reduces the amount of unlabeled data needed for training and makes it more practical, as demonstrated in experiments. The use of very limited data is also justified by the application scenario of NCD: since it is unnatural to label only seen-class data, NCD arises from sampling rather than labeling. Therefore, unseen-class data should be collected alongside seen-class data, which is why they are novel and first need to be clustered.
    Data Feedback Loops: Model-driven Amplification of Dataset Biases. (arXiv:2209.03942v1 [cs.LG])
    Datasets scraped from the internet have been critical to the successes of large-scale machine learning. Yet, this very success puts the utility of future internet-derived datasets at potential risk, as model outputs begin to replace human annotations as a source of supervision. In this work, we first formalize a system where interactions with one model are recorded as history and scraped as training data in the future. We then analyze its stability over time by tracking changes to a test-time bias statistic (e.g. gender bias of model predictions). We find that the degree of bias amplification is closely linked to whether the model's outputs behave like samples from the training distribution, a behavior which we characterize and define as consistent calibration. Experiments in three conditional prediction scenarios - image classification, visual role-labeling, and language generation - demonstrate that models that exhibit a sampling-like behavior are more calibrated and thus more stable. Based on this insight, we propose an intervention to help calibrate and stabilize unstable feedback systems. Code is available at https://github.com/rtaori/data_feedback.
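    The amplification mechanism can be illustrated with a deliberately tiny simulation (entirely our toy construction, not the paper's experiments): a "model" fits the label marginal of its training pool, its predictions are recycled into the pool, and we track the fraction of positive labels as the bias statistic. Sampling-like outputs keep the statistic stable, while argmax-like outputs drive it to an extreme:
        import numpy as np

        def simulate_feedback(n_rounds=20, sampling=True, p0=0.3, n=1000, seed=0):
            rng = np.random.default_rng(seed)
            pool = rng.random(n) < p0                  # initial human-labeled pool
            stats = []
            for _ in range(n_rounds):
                p = pool.mean()                        # model fits the label marginal
                if sampling:
                    preds = rng.random(n) < p          # calibrated, sampling-like
                else:
                    preds = np.full(n, p > 0.5)        # argmax-like: majority label
                pool = np.concatenate([pool, preds])   # outputs recycled as data
                stats.append(pool.mean())
            return stats

        print(simulate_feedback(sampling=True)[-1])    # stays near 0.3
        print(simulate_feedback(sampling=False)[-1])   # collapses toward 0.0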
    Histogram Layers for Synthetic Aperture Sonar Imagery. (arXiv:2209.03878v1 [cs.CV])
    Synthetic aperture sonar (SAS) imagery is crucial for several applications, including target recognition and environmental segmentation. Deep learning models have led to much success in SAS analysis; however, the features extracted by these approaches may not be suitable for capturing certain textural information. To address this problem, we present a novel application of histogram layers on SAS imagery. The addition of histogram layer(s) within the deep learning models improved performance by incorporating statistical texture information on both synthetic and real-world datasets.
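    A differentiable histogram layer can be sketched with soft (Gaussian) binning, so that bin memberships admit gradients; the parameterization below is one common choice and an assumption on our part, not necessarily the paper's:
        import torch
        import torch.nn as nn

        class HistogramLayer(nn.Module):
            # Soft histogram over feature maps with learnable bin centers/widths.
            def __init__(self, num_bins=8):
                super().__init__()
                self.centers = nn.Parameter(torch.linspace(0.0, 1.0, num_bins))
                self.widths = nn.Parameter(torch.full((num_bins,), 0.1))

            def forward(self, x):                       # x: (B, C, H, W)
                d = x.unsqueeze(-1) - self.centers      # (B, C, H, W, K)
                w = torch.exp(-((d / self.widths) ** 2))
                return w.mean(dim=(2, 3))               # (B, C, K) soft bin counts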
    End-to-end Robustness for Sensing-Reasoning Machine Learning Pipelines. (arXiv:2003.00120v4 [cs.LG] UPDATED)
    Intensive algorithmic efforts have been made recently to enable rapid improvements in certified robustness for complex ML models. However, current robustness certification methods are only able to certify under a limited perturbation radius. Given that existing pure data-driven statistical approaches have reached a bottleneck, in this paper we propose to integrate statistical ML models with knowledge (expressed as logical rules) as a reasoning component using Markov logic networks (MLN), so as to further improve the overall certified robustness. This opens new research questions about certifying the robustness of such a paradigm, especially the reasoning component (e.g., MLN). As the first step towards understanding these questions, we first prove that the computational complexity of certifying the robustness of MLN is #P-hard. Guided by this hardness result, we then derive the first certified robustness bound for MLN by carefully analyzing different model regimes. Finally, we conduct extensive experiments on five datasets including both high-dimensional images and natural language texts, and we show that the certified robustness with knowledge-based logical reasoning indeed significantly outperforms that of the state-of-the-art.
    Tuning arrays with rays: Physics-informed tuning of quantum dot charge states. (arXiv:2209.03837v1 [cond-mat.mes-hall])
    Quantum computers based on gate-defined quantum dots (QDs) are expected to scale. However, as the number of qubits increases, the burden of manually calibrating these systems becomes unreasonable and autonomous tuning must be used. There has been a range of recent demonstrations of automated tuning of various QD parameters such as coarse gate ranges, global state topology (e.g. single QD, double QD), charge, and tunnel coupling with a variety of methods. Here, we demonstrate an intuitive, reliable, and data-efficient set of tools for automated global state and charge tuning in a framework deemed physics-informed tuning (PIT). The first module of PIT is an action-based algorithm that combines a machine learning (ML) classifier with physics knowledge to navigate to a target global state. The second module uses a series of one-dimensional measurements to tune to a target charge state by first emptying the QDs of charge, followed by calibrating capacitive couplings, and navigating to the target charge state. The success rate for the action-based tuning consistently surpasses $95~\%$ on both simulated and experimental data suitable for off-line testing. The success rate for charge setting is comparable when testing with simulated data, at $95.5(5.4)~\%$, and only slightly worse for off-line experimental tests, with an average of $89.7(17.4)~\%$ (median $97.5~\%$). It is noteworthy that the high performance is demonstrated both on data from samples fabricated in an academic cleanroom and on an industrial 300 mm process line, further underlining the device-agnosticity of PIT. Together, these tests on a range of simulated and experimental devices demonstrate the effectiveness and robustness of PIT.
    FADE: Enabling Large-Scale Federated Adversarial Training on Resource-Constrained Edge Devices. (arXiv:2209.03839v1 [cs.LG])
    Adversarial Training (AT) has been proven to be an effective method of introducing strong adversarial robustness into deep neural networks. However, the high computational cost of AT prohibits the deployment of large-scale AT on resource-constrained edge devices, e.g., with limited computing power and small memory footprint, in Federated Learning (FL) applications. Very few previous studies have tried to tackle these constraints in FL at the same time. In this paper, we propose a new framework named Federated Adversarial Decoupled Learning (FADE) to enable AT on resource-constrained edge devices in FL. FADE reduces the computation and memory usage by applying Decoupled Greedy Learning (DGL) to federated adversarial training such that each client only needs to perform AT on a small module of the entire model in each communication round. In addition, we improve vanilla DGL by adding an auxiliary weight decay to alleviate objective inconsistency and achieve better performance. FADE offers a theoretical guarantee for the adversarial robustness and convergence. The experimental results also show that FADE can significantly reduce the computing resources consumed by AT while maintaining almost the same accuracy and robustness as fully joint training.
    Mean Field Games on Weighted and Directed Graphs via Colored Digraphons. (arXiv:2209.03887v1 [cs.MA])
    The field of multi-agent reinforcement learning (MARL) has made considerable progress towards controlling challenging multi-agent systems by employing various learning methods. Many of these approaches focus on empirical and algorithmic aspects of MARL problems and lack a rigorous theoretical foundation. Graphon mean field games (GMFGs), on the other hand, provide a scalable and mathematically well-founded approach to learning problems that involve a large number of connected agents. In standard GMFGs, the connections between agents are undirected, unweighted, and invariant over time. Our paper introduces colored digraphon mean field games (CDMFGs), which allow for weighted and directed links between agents that are also adaptive over time. Thus, CDMFGs are able to model more complex connections than standard GMFGs. Besides a rigorous theoretical analysis including both existence and convergence guarantees, we provide a learning scheme and illustrate our findings with an epidemics model and a model of systemic risk in financial markets.
    Multiobjective Ranking and Selection Using Stochastic Kriging. (arXiv:2209.03919v1 [stat.ML])
    We consider multiobjective simulation optimization problems, where several conflicting objectives are optimized simultaneously and can only be observed via stochastic simulation. The goal is to find or approximate a (discrete) set of Pareto-optimal solutions that reveal the essential trade-offs between the objectives, where optimality means that no objective can be improved without deteriorating the quality of any other objective. The noise in the observed performance may lead to two possible misclassification errors: solutions that are truly Pareto-optimal can be wrongly considered dominated, and solutions that are truly dominated can be wrongly considered Pareto-optimal. We propose a Bayesian multiobjective ranking and selection method to reduce the number of errors when identifying the solutions with the true best expected performance. We use stochastic kriging metamodels to build reliable predictive distributions of the objectives, and we exploit this information in two efficient screening procedures and two novel sampling criteria. We use these in a sequential sampling algorithm to decide how to allocate samples. Experimental results show that the proposed method only requires a small fraction of samples compared to the standard allocation method, and it is competitive with the state-of-the-art, with the exploitation of the correlation structure being the dominant contributor to the improvement.
    Toward Robust Autotuning of Noisy Quantum Dot Devices. (arXiv:2108.00043v3 [quant-ph] UPDATED)
    The current autotuning approaches for quantum dot (QD) devices, while showing some success, lack an assessment of data reliability. This leads to unexpected failures when noisy or otherwise low-quality data is processed by an autonomous system. In this work, we propose a framework for robust autotuning of QD devices that combines a machine learning (ML) state classifier with a data quality control module. The data quality control module acts as a "gatekeeper" system, ensuring that only reliable data are processed by the state classifier. Lower data quality results in either device recalibration or termination. To train both ML systems, we enhance the QD simulation by incorporating synthetic noise typical of QD experiments. We confirm that the inclusion of synthetic noise in the training of the state classifier significantly improves the performance, resulting in an accuracy of 95.0(9) % when tested on experimental data. We then validate the functionality of the data quality control module by showing that the state classifier performance deteriorates with decreasing data quality, as expected. Our results establish a robust and flexible ML framework for autonomous tuning of noisy QD devices.
    Regularized Frank-Wolfe for Dense CRFs: Generalizing Mean Field and Beyond. (arXiv:2110.14759v2 [cs.LG] UPDATED)
    We introduce regularized Frank-Wolfe, a general and effective algorithm for inference and learning of dense conditional random fields (CRFs). The algorithm optimizes a nonconvex continuous relaxation of the CRF inference problem using vanilla Frank-Wolfe with approximate updates, which are equivalent to minimizing a regularized energy function. Our proposed method is a generalization of existing algorithms such as mean field or concave-convex procedure. This perspective not only offers a unified analysis of these algorithms, but also allows an easy way of exploring different variants that potentially yield better performance. We illustrate this in our empirical results on standard semantic segmentation datasets, where several instantiations of our regularized Frank-Wolfe outperform mean field inference, both as a standalone component and as an end-to-end trainable layer in a neural network. We also show that dense CRFs, coupled with our new algorithms, produce significant improvements over strong CNN baselines.
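    For reference, the vanilla Frank-Wolfe loop that the method builds on is short; for CRF inference the feasible set is a product of simplices (one label distribution per node), so the linear minimization oracle simply selects the lowest-gradient label per node. The sketch below shows this plain variant only; the regularized version in the paper replaces the exact update with approximate updates equivalent to minimizing a regularized energy:
        import numpy as np

        def lmo_product_of_simplices(grad):
            # Put all mass on the coordinate with the smallest gradient per row.
            s = np.zeros_like(grad)
            s[np.arange(grad.shape[0]), grad.argmin(axis=1)] = 1.0
            return s

        def frank_wolfe(grad_f, x0, steps=100):
            x = x0.copy()
            for t in range(steps):
                s = lmo_product_of_simplices(grad_f(x))
                gamma = 2.0 / (t + 2.0)        # standard open-loop step size
                x = (1 - gamma) * x + gamma * s
            return x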
    SafeNet: The Unreasonable Effectiveness of Ensembles in Private Collaborative Learning. (arXiv:2205.09986v2 [cs.CR] UPDATED)
    Secure multiparty computation (MPC) has been proposed to allow multiple mutually distrustful data owners to jointly train machine learning (ML) models on their combined data. However, by design, MPC protocols faithfully compute the training functionality, which the adversarial ML community has shown to leak private information and to be vulnerable to tampering via poisoning attacks. In this work, we argue that model ensembles, implemented in our framework called SafeNet, are a highly MPC-amenable way to avoid many adversarial ML attacks. The natural partitioning of data amongst owners in MPC training allows this approach to be highly scalable at training time, to provide provable protection from poisoning attacks, and to provably defend against a number of privacy attacks. We demonstrate SafeNet's efficiency, accuracy, and resilience to poisoning on several machine learning datasets and models trained in end-to-end and transfer learning scenarios. For instance, SafeNet reduces backdoor attack success significantly, while achieving $39\times$ faster training and $36 \times$ less communication than the four-party MPC framework of Dalskov et al. Our experiments show that ensembling retains these benefits even in many non-iid settings. The simplicity, cheap setup, and robustness properties of ensembling make it a strong first choice for training ML models privately in MPC.
    A Light-Weight Multi-Objective Asynchronous Hyper-Parameter Optimizer. (arXiv:2202.07735v2 [cs.LG] UPDATED)
    We describe a light-weight yet performant system for hyper-parameter optimization that approximately minimizes an overall scalar cost function that is obtained by combining multiple performance objectives using a target-priority-limit scalarizer. It also supports a trade-off mode, where the goal is to find an appropriate trade-off among objectives by interacting with the user. We focus on the common scenario where there are on the order of tens of hyper-parameters, each with various attributes such as a range of continuous values, or a finite list of values, and whether it should be treated on a linear or logarithmic scale. The system supports multiple asynchronous simulations and is robust to simulation stragglers and failures.
    Deep Neural Networks to Correct Sub-Precision Errors in CFD. (arXiv:2202.04233v2 [physics.flu-dyn] UPDATED)
    Loss of information in numerical simulations can arise from various sources while solving discretized partial differential equations. In particular, precision-related errors can accumulate in the quantities of interest when the simulations are performed using low-precision 16-bit floating-point arithmetic compared to an equivalent 64-bit simulation. Here, low-precision computation requires much lower resources than high-precision computation. Several machine learning (ML) techniques proposed recently have been successful in correcting the errors arising from spatial discretization. In this work, we extend these techniques to improve Computational Fluid Dynamics (CFD) simulations performed using low numerical precision. We first quantify the precision-related errors accumulated in a Kolmogorov forced turbulence test case. Subsequently, we employ a Convolutional Neural Network together with a fully differentiable numerical solver performing 16-bit arithmetic to learn a tightly-coupled ML-CFD hybrid solver. Compared to the 16-bit solver, we demonstrate the efficacy of the ML-CFD hybrid solver towards reducing the error accumulation in the velocity field and improving the kinetic energy spectrum at higher frequencies.
    Differential Privacy and Fairness in Decisions and Learning Tasks: A Survey. (arXiv:2202.08187v2 [cs.LG] UPDATED)
    This paper surveys recent work at the intersection of differential privacy (DP) and fairness. It reviews the conditions under which privacy and fairness may have aligned or contrasting goals, analyzes how and why DP may exacerbate bias and unfairness in decision problems and learning tasks, and describes available mitigation measures for the fairness issues arising in DP systems. The survey provides a unified understanding of the main challenges and potential risks arising when deploying privacy-preserving machine-learning or decision-making tasks under a fairness lens.
    Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes. (arXiv:2209.03695v1 [cs.LG])
    A fundamental property of deep learning normalization techniques, such as batch normalization, is making the pre-normalization parameters scale invariant. The intrinsic domain of such parameters is the unit sphere, and therefore their gradient optimization dynamics can be represented via spherical optimization with varying effective learning rate (ELR), which was studied previously. In this work, we investigate the properties of training scale-invariant neural networks directly on the sphere using a fixed ELR. We discover three regimes of such training depending on the ELR value: convergence, chaotic equilibrium, and divergence. We study these regimes in detail, both through a theoretical examination of a toy example and through a thorough empirical analysis of real scale-invariant deep learning models. Each regime has unique features and reflects specific properties of the intrinsic loss landscape, some of which have strong parallels with previous research on both regular and scale-invariant neural network training. Finally, we demonstrate how the discovered regimes are reflected in conventional training of normalized networks and how they can be leveraged to achieve better optima.
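    A fixed-ELR spherical step is simple to state: remove the radial component of the gradient, take a step, and retract to the sphere. A minimal numpy sketch (our simplified reading of spherical optimization; the paper's exact dynamics may differ):
        import numpy as np

        def spherical_sgd_step(w, grad, elr):
            # w is assumed to lie on the unit sphere.
            g_tan = grad - (grad @ w) * w       # project out the radial component
            w = w - elr * g_tan
            return w / np.linalg.norm(w)        # retract back onto the sphere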
    Learning Sparse Graphon Mean Field Games. (arXiv:2209.03880v1 [cs.MA])
    Although the field of multi-agent reinforcement learning (MARL) has made considerable progress in recent years, solving systems with a large number of agents remains a hard challenge. Graphon mean field games (GMFGs) enable the scalable analysis of MARL problems that are otherwise intractable. By the mathematical structure of graphons, this approach is limited to dense graphs, which are insufficient to describe many real-world networks such as power law graphs. Our paper introduces a novel formulation of GMFGs, called LPGMFGs, which leverages the graph-theoretical concept of $L^p$ graphons and provides a machine learning tool to efficiently and accurately approximate solutions for sparse network problems. This especially includes power law networks, which are empirically observed in various application areas and cannot be captured by standard graphons. We derive theoretical existence and convergence guarantees and give empirical examples that demonstrate the accuracy of our learning approach for systems with many agents. Furthermore, we rigorously extend the Online Mirror Descent (OMD) learning algorithm to our setup to accelerate learning speed, allow for agent interaction through the mean field in the transition kernel, and empirically show its capabilities. In general, we provide a scalable, mathematically well-founded machine learning approach to a large class of otherwise intractable problems of great relevance in numerous research fields.
    Sparse Coding with Multi-Layer Decoders using Variance Regularization. (arXiv:2112.09214v2 [cs.CV] UPDATED)
    Sparse representations of images are useful in many computer vision applications. Sparse coding with an $l_1$ penalty and a learned linear dictionary requires regularization of the dictionary to prevent a collapse in the $l_1$ norms of the codes. Typically, this regularization entails bounding the Euclidean norms of the dictionary's elements. In this work, we propose a novel sparse coding protocol which prevents a collapse in the codes without the need to regularize the decoder. Our method regularizes the codes directly so that each latent code component has variance greater than a fixed threshold over a set of sparse representations for a given set of inputs. Furthermore, we explore ways to effectively train sparse coding systems with multi-layer decoders since they can model more complex relationships than linear dictionaries. In our experiments with MNIST and natural image patches, we show that decoders learned with our approach have interpretable features both in the linear and multi-layer case. Moreover, we show that sparse autoencoders with multi-layer decoders trained using our variance regularization method produce higher quality reconstructions with sparser representations when compared to autoencoders with linear dictionaries. Additionally, sparse representations obtained with our variance regularization approach are useful in the downstream tasks of denoising and classification in the low-data regime.
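    The variance constraint on the codes can be enforced with a simple hinge penalty computed over a batch of sparse representations; below is a minimal PyTorch sketch (the threshold and weighting are illustrative), which would be added to the usual reconstruction-plus-$l_1$ objective:
        import torch

        def variance_penalty(codes, threshold=1.0):
            # codes: (batch, latent_dim) sparse representations.
            var = codes.var(dim=0)                     # per-component variance
            return torch.relu(threshold - var).mean()  # penalize only below threshold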
    IMAP: Individual huMAn mobility Patterns visualizing platform. (arXiv:2209.03615v1 [cs.SI])
    Understanding human mobility is essential for the development of smart cities and social behavior research. Human mobility models may be used in numerous applications, including pandemic control, urban planning, and traffic management. The existing models' accuracy in predicting users' mobility patterns is less than 25%. The low accuracy may be explained by the flexible nature of human movement. Indeed, humans are not rigid in their daily movement. In addition, rigid mobility models may miss the hidden regularities in users' records. Thus, we propose a novel perspective to study and analyze human mobility patterns and capture their flexibility. Typically, mobility patterns are represented by a sequence of locations. We propose to define mobility patterns by abstracting these locations into a set of places. Labeling these locations allows us to detect close-to-reality hidden patterns. We present IMAP, an Individual huMAn mobility Patterns visualizing platform. Our platform enables users to visualize a graph of the places they visited based on their history records. In addition, our platform displays the most frequent mobility patterns computed using a modified PrefixSpan approach.
    On the Near-Optimality of Local Policies in Large Cooperative Multi-Agent Reinforcement Learning. (arXiv:2209.03491v1 [cs.LG])
    We show that in a cooperative $N$-agent network, one can design locally executable policies for the agents such that the resulting discounted sum of average rewards (value) well approximates the optimal value computed over all (including non-local) policies. Specifically, we prove that, if $|\mathcal{X}|, |\mathcal{U}|$ denote the size of state, and action spaces of individual agents, then for sufficiently small discount factor, the approximation error is given by $\mathcal{O}(e)$ where $e\triangleq \frac{1}{\sqrt{N}}\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]$. Moreover, in a special case where the reward and state transition functions are independent of the action distribution of the population, the error improves to $\mathcal{O}(e)$ where $e\triangleq \frac{1}{\sqrt{N}}\sqrt{|\mathcal{X}|}$. Finally, we also devise an algorithm to explicitly construct a local policy. With the help of our approximation results, we further establish that the constructed local policy is within $\mathcal{O}(\max\{e,\epsilon\})$ distance of the optimal policy, and the sample complexity to achieve such a local policy is $\mathcal{O}(\epsilon^{-3})$, for any $\epsilon>0$.
    Incremental Correction in Dynamic Systems Modelled with Neural Networks for Constraint Satisfaction. (arXiv:2209.03698v1 [math.OC])
    This study presents incremental correction methods for refining neural network parameters or control functions entering into a continuous-time dynamic system, to achieve improved solution accuracy in satisfying the interim point constraints placed on the performance output variables. The proposed approach is to linearise the dynamics around the baseline values of its arguments, and then to solve for the corrective input required to transfer the perturbed trajectory to precisely known or desired values at specific time points, i.e., the interim points. Depending on the type of decision variables to adjust, parameter correction and control function correction methods are developed. These incremental correction methods can be utilised as a means to compensate for the prediction errors of pre-trained neural networks in real-time applications where high accuracy of the prediction of dynamical systems at prescribed time points is imperative. In this regard, the online update approach can be useful for enhancing the overall targeting accuracy of finite-horizon control subject to point constraints using a neural policy. A numerical example demonstrates the effectiveness of the proposed approach in an application to a powered descent problem at Mars.
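    At the heart of the method is a linear solve: given the sensitivity (Jacobian) of the interim-point outputs with respect to the adjustable decision variables, obtained from the linearized dynamics, a minimum-norm corrective input cancels the predicted constraint residual. A hedged numpy sketch (the Jacobian computation itself is problem-specific and omitted here):
        import numpy as np

        def corrective_input(J, residual):
            # Solve J @ du = -residual in the least-squares, minimum-norm sense.
            return -np.linalg.pinv(J) @ residual

        # Usage: u_new = u_baseline + corrective_input(J, y_predicted - y_desired)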
    Representation Learning for Appliance Recognition: A Comparison to Classical Machine Learning. (arXiv:2209.03759v1 [eess.SP])
    Non-intrusive load monitoring (NILM) aims at energy consumption and appliance state information retrieval from aggregated consumption measurements, with the help of signal processing and machine learning algorithms. Representation learning with deep neural networks has been successfully applied in several related disciplines. The main advantage of representation learning lies in replacing an expert-driven, hand-crafted feature extraction with hierarchical learning from many representations in raw data format. In this paper, we show how the NILM processing chain can be improved, reduced in complexity, and alternatively designed with recent deep learning algorithms. On the basis of an event-based appliance recognition approach, we evaluate seven different classification models: a classical machine learning approach based on a hand-crafted feature extraction, three different deep neural network architectures for automated feature extraction on raw waveform data, as well as three baseline approaches for raw data processing. We evaluate all approaches on two large-scale energy consumption datasets with more than 50,000 events from 44 appliances. We show that, with the use of deep learning, we are able to reach and surpass the performance of the state-of-the-art classical machine learning approach for appliance recognition, with an F-score of 0.75 and 0.86 compared to 0.69 and 0.87 for the classical approach.
    Fact-Saboteurs: A Taxonomy of Evidence Manipulation Attacks against Fact-Verification Systems. (arXiv:2209.03755v1 [cs.CR])
    Mis- and disinformation are now a substantial global threat to our security and safety. To cope with the scale of online misinformation, one viable solution is to automate the fact-checking of claims by retrieving and verifying against relevant evidence. While major recent advances have been achieved in pushing forward automatic fact-verification, a comprehensive evaluation of the possible attack vectors against such systems is still lacking. In particular, the automated fact-verification process might be vulnerable to the exact disinformation campaigns it is trying to combat. In this work, we assume an adversary that automatically tampers with the online evidence in order to disrupt the fact-checking model, via camouflaging the relevant evidence or planting a misleading one. We first propose an exploratory taxonomy that spans these two targets and the different threat model dimensions. Guided by this, we design and propose several potential attack methods. We show that it is possible to subtly modify claim-salient snippets in the evidence, in addition to generating diverse and claim-aligned evidence. As a result, the fact-checking performance is severely degraded under many different permutations of the taxonomy's dimensions. The attacks are also robust against post-hoc modifications of the claim. Our analysis further hints at potential limitations in models' inference when faced with contradicting evidence. We emphasize that these attacks can have harmful implications on the inspectable and human-in-the-loop usage scenarios of such models, and we conclude by discussing challenges and directions for future defenses.
    An Empirical Evaluation of Posterior Sampling for Constrained Reinforcement Learning. (arXiv:2209.03596v1 [cs.LG])
    We study a posterior sampling approach to efficient exploration in constrained reinforcement learning. In contrast to existing algorithms, we propose two simple algorithms that are statistically more efficient, simpler to implement, and computationally cheaper. The first algorithm is based on a linear formulation of CMDPs, and the second algorithm leverages the saddle-point formulation of CMDPs. Our empirical results demonstrate that, despite its simplicity, posterior sampling achieves state-of-the-art performance and, in some cases, significantly outperforms optimistic algorithms.
    Losing momentum in continuous-time stochastic optimisation. (arXiv:2209.03705v1 [math.OC])
    The training of deep neural networks and other modern machine learning models usually consists in solving non-convex optimisation problems that are high-dimensional and subject to large-scale data. Here, momentum-based stochastic optimisation algorithms have become especially popular in recent years. The stochasticity arises from data subsampling, which reduces the computational cost. Moreover, both momentum and stochasticity are supposed to help the algorithm overcome local minimisers and, hopefully, converge globally. Theoretically, this combination of stochasticity and momentum is poorly understood. In this work, we propose and analyse a continuous-time model for stochastic gradient descent with momentum. This model is a piecewise-deterministic Markov process that represents the particle movement by an underdamped dynamical system and the data subsampling through a stochastic switching of the dynamical system. In our analysis, we investigate longtime limits, the subsampling-to-no-subsampling limit, and the momentum-to-no-momentum limit. We are particularly interested in the case of reducing the momentum over time: intuitively, the momentum helps to overcome local minimisers in the initial phase of the algorithm but prohibits fast convergence to a global minimiser later. Under convexity assumptions, we show convergence of our dynamical system to the global minimiser when reducing momentum over time and letting the subsampling rate go to infinity. We then propose a stable, symplectic discretisation scheme to construct an algorithm from our continuous-time dynamical system. In numerical experiments, we study our discretisation scheme in convex and non-convex test problems. Additionally, we train a convolutional neural network to solve the CIFAR-10 image classification problem. Here, our algorithm reaches competitive results compared to stochastic gradient descent with momentum.
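    A representative form of such a piecewise-deterministic model (our notation, not necessarily the paper's) couples an underdamped system with a randomly switching data subsample:
        \begin{aligned}
        \dot{x}_t &= v_t, \\
        \dot{v}_t &= -\mu(t)\, v_t - \nabla f_{i(t)}(x_t),
        \end{aligned}
    where $f_i$ is the loss on data subsample $i$, the index process $i(t)$ jumps between subsamples at random switching times, and letting the friction $\mu(t)$ grow over time corresponds to reducing the momentum.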
    Analyzing the Effect of Sampling in GNNs on Individual Fairness. (arXiv:2209.03904v1 [cs.LG])
    Graph neural network (GNN) based methods have saturated the field of recommender systems. The gains of these systems have been significant, showing the advantages of interpreting data through a network structure. However, despite the noticeable benefits of using graph structures in recommendation tasks, this representational form has also bred new challenges which exacerbate the complexity of mitigating algorithmic bias. When GNNs are integrated into downstream tasks, such as recommendation, bias mitigation can become even more difficult. Furthermore, the intractability of applying existing methods of fairness promotion to large, real-world datasets places even more serious constraints on mitigation attempts. Our work sets out to fill in this gap by taking an existing method for promoting individual fairness on graphs and extending it to support mini-batch, or sub-sample based, training of a GNN, thus laying the groundwork for applying this method to a downstream recommendation task. We evaluate two popular GNN methods: Graph Convolutional Network (GCN), which trains on the entire graph, and GraphSAGE, which uses probabilistic random walks to create subgraphs for mini-batch training, and assess the effects of sub-sampling on individual fairness. We implement an individual fairness notion called \textit{REDRESS}, proposed by Dong et al., which uses rank optimization to learn individually fair node, or item, embeddings. We empirically show on two real-world datasets that GraphSAGE is able to achieve not just comparable accuracy but also improved fairness compared with the GCN model. These findings have consequential ramifications for individual fairness promotion, for GNNs, and, downstream, for recommender systems, showing that mini-batch training facilitates individual fairness promotion by allowing local nuance to guide the process of fairness promotion in representation learning.
    Hierarchical Graph Pooling is an Effective Citywide Traffic Condition Prediction Model. (arXiv:2209.03629v1 [cs.LG])
    Accurate traffic condition prediction provides a solid foundation for vehicle-environment coordination and traffic control tasks. Because of the complexity of road network data in spatial distribution and the diversity of deep learning methods, it is challenging to effectively define traffic data and adequately capture the complex spatial nonlinear features in the data. This paper applies two hierarchical graph pooling approaches to the traffic prediction task to reduce graph information redundancy. First, this paper verifies the effectiveness of hierarchical graph pooling methods in traffic prediction tasks, contrasting them with other baselines on predictive performance. Second, two mainstream hierarchical graph pooling methods, node clustering pooling and node drop pooling, are applied to analyze their advantages and weaknesses in traffic prediction. Finally, for the mentioned graph neural networks, this paper compares the predictive effects of different graph network inputs on traffic prediction accuracy. Efficient ways of defining graph networks are analyzed and summarized.
    Application of image-to-image translation in improving pedestrian detection. (arXiv:2209.03625v1 [cs.CV])
    The lack of effective target regions makes it difficult to perform several visual functions in low-intensity light, including pedestrian recognition and image-to-image translation. In this situation, the accumulation of high-quality information through the combined use of infrared and visible images makes it possible to detect pedestrians even in low light. In this study, we use advanced deep learning models, namely pix2pixGAN and YOLOv7, on the LLVIP dataset, which contains visible-infrared image pairs for low-light vision. The dataset contains 33,672 images, most of which were captured in dark scenes and are tightly synchronized in time and location.
    Predict+Optimize for Packing and Covering LPs with Unknown Parameters in Constraints. (arXiv:2209.03668v1 [cs.AI])
    Predict+Optimize is a recently proposed framework which combines machine learning and constrained optimization, tackling optimization problems that contain parameters that are unknown at solving time. The goal is to predict the unknown parameters and use the estimates to solve for an estimated optimal solution to the optimization problem. However, all prior works have focused on the case where unknown parameters appear only in the optimization objective and not the constraints, for the simple reason that if the constraints were not known exactly, the estimated optimal solution might not even be feasible under the true parameters. The contributions of this paper are two-fold. First, we propose a novel and practically relevant framework for the Predict+Optimize setting, but with unknown parameters in both the objective and the constraints. We introduce the notion of a correction function, and an additional penalty term in the loss function, modelling practical scenarios where an estimated optimal solution can be modified into a feasible solution after the true parameters are revealed, but at an additional cost. Second, we propose a corresponding algorithmic approach for our framework, which handles all packing and covering linear programs. Our approach is inspired by the prior work of Mandi and Guns, though with crucial modifications and re-derivations for our very different setting. Experimentation demonstrates the superior empirical performance of our method over classical approaches.
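    The following sketch illustrates, under stated assumptions, the framework's central quantity for a packing LP with unknown constraint parameters: the loss combines the regret of the corrected solution with a penalty on the correction itself. The uniform-scaling correction function and the penalty weight are simple illustrative choices, not necessarily those of the paper.

    ```python
    import numpy as np
    from scipy.optimize import linprog

    # Packing LP: max c^T x  s.t.  A x <= b, x >= 0, with b predicted.
    def solve_packing(c, A, b):
        res = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * len(c))
        return res.x

    def posthoc_regret(c, A, b_pred, b_true, penalty=0.1):
        x_hat = solve_packing(c, A, b_pred)            # act on the prediction
        used = A @ x_hat
        # Correction: scale the solution down until feasible under the true b
        # (always possible for packing constraints); one simple choice.
        scale = min(1.0, np.min(b_true / np.maximum(used, 1e-12)))
        x_corr = scale * x_hat
        x_star = solve_packing(c, A, b_true)           # hindsight optimum
        return (c @ x_star - c @ x_corr) + penalty * (c @ (x_hat - x_corr))

    c = np.array([3.0, 2.0])
    A = np.array([[1.0, 1.0], [2.0, 1.0]])
    print(posthoc_regret(c, A, b_pred=np.array([5.0, 7.0]), b_true=np.array([4.0, 6.0])))
    ```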
    Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization. (arXiv:2002.11860v6 [math.OC] UPDATED)
    We propose a novel Stochastic Frank-Wolfe (a.k.a. conditional gradient) algorithm for constrained smooth finite-sum minimization with a generalized linear prediction/structure. This class of problems includes empirical risk minimization with sparse, low-rank, or other structured constraints. The proposed method is simple to implement, does not require step-size tuning, and has a constant per-iteration cost that is independent of the dataset size. Furthermore, as a byproduct of the method we obtain a stochastic estimator of the Frank-Wolfe gap that can be used as a stopping criterion. Depending on the setting, the proposed method matches or improves on the best computational guarantees for Stochastic Frank-Wolfe algorithms. Benchmarks on several datasets highlight different regimes in which the proposed method exhibits a faster empirical convergence than related methods. Finally, we provide an implementation of all considered methods in an open-source package.
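    For orientation, below is a textbook stochastic Frank-Wolfe iteration over an l1-ball constraint, including the stochastic Frank-Wolfe gap mentioned above as a stopping criterion. It uses the classic 2/(t+2) step size and a plain mini-batch gradient, whereas the proposed method avoids step-size tuning and keeps a constant per-iteration cost, so treat this only as the general template.

    ```python
    import numpy as np

    # Linear minimization oracle for the l1 ball of radius r: the minimizer
    # of <g, s> is a signed vertex along the largest-|g| coordinate.
    def lmo_l1(g, r):
        s = np.zeros_like(g)
        i = np.argmax(np.abs(g))
        s[i] = -r * np.sign(g[i])
        return s

    def stochastic_fw(grad_i, n, dim, r=1.0, steps=2000, batch=8,
                      rng=np.random.default_rng(0)):
        x = np.zeros(dim)
        for t in range(steps):
            idx = rng.integers(n, size=batch)
            g = np.mean([grad_i(i, x) for i in idx], axis=0)  # stochastic gradient
            s = lmo_l1(g, r)
            gap = g @ (x - s)         # stochastic Frank-Wolfe gap (stopping signal)
            x = x + 2.0 / (t + 2) * (s - x)
        return x

    # Toy usage: least squares (1/n) sum_i 0.5 * (a_i @ x - y_i)^2.
    rng = np.random.default_rng(1)
    A, y = rng.normal(size=(200, 10)), rng.normal(size=200)
    print(stochastic_fw(lambda i, x: (A[i] @ x - y[i]) * A[i], n=200, dim=10))
    ```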
    CGAN-ECT: Tomography Image Reconstruction from Electrical Capacitance Measurements Using CGANs. (arXiv:2209.03737v1 [eess.IV])
    Due to the rapid growth of Electrical Capacitance Tomography (ECT) applications in several industrial fields, there is a crucial need for developing high-quality, yet fast, methodologies for image reconstruction from raw capacitance measurements. Deep learning, as an effective non-linear mapping tool for complicated functions, has been widely adopted in many fields, including electrical tomography. In this paper, we propose a Conditional Generative Adversarial Network (CGAN) model for reconstructing ECT images from capacitance measurements. The initial image of the CGAN model is constructed from the capacitance measurements. To our knowledge, this is the first work to represent the capacitance measurements in image form. We have created a new massive ECT dataset of 320K synthetic image-measurement pairs for training and testing the proposed model. The feasibility and generalization ability of the proposed CGAN-ECT model are evaluated using the test dataset, contaminated data, and flow patterns that were not exposed to the model during the training phase. The evaluation results show that the proposed CGAN-ECT model can efficiently create more accurate ECT images than traditional and other deep learning-based image reconstruction algorithms. CGAN-ECT achieved an average image correlation coefficient of more than 99.3% and an average relative image error of about 0.07.
    Simpler is better: Multilevel Abstraction with Graph Convolutional Recurrent Neural Network Cells for Traffic Prediction. (arXiv:2209.03858v1 [cs.LG])
    In recent years, graph neural networks (GNNs) combined with variants of recurrent neural networks (RNNs) have reached state-of-the-art performance in spatiotemporal forecasting tasks. This is particularly the case for traffic forecasting, where GNN models use the graph structure of road networks to account for spatial correlation between links and nodes. Recent solutions either rely on complex graph operations or avoid predefined graphs. This paper proposes a new sequence-to-sequence architecture to extract the spatiotemporal correlation at multiple levels of abstraction using GNN-RNN cells with a sparse architecture to decrease training time compared to more complex designs. Encoding the same input sequence through multiple encoders, with an incremental increase in encoder layers, enables the network to learn general and detailed information through multilevel abstraction. We further present a new benchmark dataset of street-level segment traffic data from Montreal, Canada. Unlike highways, urban road segments are cyclic and characterized by complicated spatial dependencies. Experimental results on the METR-LA benchmark highway and our MSLTD street-level segment datasets demonstrate that our model improves performance by more than 7% for one-hour prediction compared to the baseline methods while reducing computing resource requirements by more than half compared to other competing methods.  ( 2 min )
    TAG: Learning Circuit Spatial Embedding From Layouts. (arXiv:2209.03465v1 [cs.AR])
    Analog and mixed-signal (AMS) circuit designs still rely on human design expertise. Machine learning has been assisting circuit design automation by replacing human experience with artificial intelligence. This paper presents TAG, a new paradigm for learning circuit representations from layouts leveraging text, self-attention, and graphs. The embedding network model learns spatial information without manual labeling. We introduce text embedding and a self-attention mechanism to AMS circuit learning. Experimental results demonstrate the ability to predict layout distances between instances with industrial FinFET technology benchmarks. The effectiveness of the circuit representation is verified by showing its transferability to three other learning tasks with limited data in the case studies: layout matching prediction, wirelength estimation, and net parasitic capacitance prediction.
    Geolocation of Cultural Heritage using Multi-View Knowledge Graph Embedding. (arXiv:2209.03638v1 [cs.LG])
    Knowledge Graphs (KGs) have proven to be a reliable way of structuring data. They can provide a rich source of contextual information about cultural heritage collections. However, cultural heritage KGs are far from being complete. They are often missing important attributes such as geographical location, especially for sculptures and mobile or indoor entities such as paintings. In this paper, we first present a framework for ingesting knowledge about tangible cultural heritage entities from various data sources and their connected multi-hop knowledge into a geolocalized KG. Secondly, we propose a multi-view learning model for estimating the relative distance between a given pair of cultural heritage entities, based on the geographical as well as the knowledge connections of the entities.
    Impact of dataset size and long-term ECoG-based BCI usage on deep learning decoders performance. (arXiv:2209.03789v1 [eess.SP])
    In brain-computer interface (BCI) research, recording data is time-consuming and expensive, which limits access to big datasets. This may influence the BCI system performance as machine learning methods depend strongly on the training dataset size. Important questions arise: taking into account neuronal signal characteristics (e.g., non-stationarity), can we achieve higher decoding performance with more data to train decoders? What is the perspective for further improvement with time in the case of long-term BCI studies? In this study, we investigated the impact of long-term recordings on motor imagery decoding from two main perspectives: model requirements regarding dataset size and potential for patient adaptation. We evaluated the multilinear model and two deep learning (DL) models on a long-term BCI and Tetraplegia (NCT02550522) clinical trial dataset containing 43 sessions of ECoG recordings performed with a tetraplegic patient. In the experiment, a participant executed 3D virtual hand translation using motor imagery patterns. We designed multiple computational experiments in which training datasets were increased or translated to investigate the relationship between models' performance and different factors influencing recordings. Our analysis showed that adding more data to the training dataset may not instantly increase performance for datasets already containing 40 minutes of the signal. DL decoders showed similar requirements regarding the dataset size compared to the multilinear model while demonstrating higher decoding performance. Moreover, high decoding performance was obtained with relatively small datasets recorded later in the experiment, suggesting improvement of motor imagery patterns and patient adaptation. Finally, we proposed UMAP embeddings and local intrinsic dimensionality as a way to visualize the data and potentially evaluate data quality.
    A Survey on Large-Population Systems and Scalable Multi-Agent Reinforcement Learning. (arXiv:2209.03859v1 [cs.MA])
    The analysis and control of large-population systems is of great interest to diverse areas of research and engineering, ranging from epidemiology over robotic swarms to economics and finance. An increasingly popular and effective approach to realizing sequential decision-making in multi-agent systems is through multi-agent reinforcement learning, as it allows for an automatic and model-free analysis of highly complex systems. However, the key issue of scalability complicates the design of control and reinforcement learning algorithms particularly in systems with large populations of agents. While reinforcement learning has found resounding empirical success in many scenarios with few agents, problems with many agents quickly become intractable and necessitate special consideration. In this survey, we will shed light on current approaches to tractably understanding and analyzing large-population systems, both through multi-agent reinforcement learning and through adjacent areas of research such as mean-field games, collective intelligence, or complex network theory. These classically independent subject areas offer a variety of approaches to understanding or modeling large-population systems, which may be of great use for the formulation of tractable MARL algorithms in the future. Finally, we survey potential areas of application for large-scale control and identify fruitful future applications of learning algorithms in practical systems. We hope that our survey could provide insight and future directions to junior and senior researchers in theoretical and applied sciences alike.
    Convolutional Neural Network (CNN) to reduce construction loss in JPEG compression. (arXiv:2209.03475v1 [eess.IV])
    In recent decades, digital image processing has gained enormous popularity. Consequently, a number of data compression strategies have been put forth, with the goal of minimizing the amount of information required to represent images. Among them, JPEG compression is one of the most popular methods and has been widely applied in multimedia and digital applications. The periodic nature of the DFT makes it impossible to meet the periodicity condition at an image's opposing edges without producing severe artifacts, which lowers the image's perceptual visual quality. On the other hand, deep learning has recently achieved outstanding results for applications such as speech recognition, image reduction, and natural language processing. Convolutional Neural Networks (CNNs) have received more attention than most other types of deep neural networks. The use of convolution in feature extraction results in a less redundant feature map and a smaller dataset, both of which are crucial for image compression. In this work, an effective image compression method is proposed using autoencoders. The study's findings revealed a number of important trends suggesting that better reconstruction along with good compression can be achieved using autoencoders.  ( 2 min )
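    As a minimal illustration of the autoencoder approach described above, the sketch below defines a small convolutional autoencoder trained with a reconstruction (MSE) loss; the layer sizes, strides, and input shape are illustrative assumptions, not the paper's architecture.

    ```python
    import torch
    import torch.nn as nn

    class ConvAE(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(                # 1x28x28 -> 16x7x7 code
                nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(                # code -> 1x28x28
                nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(8, 1, 4, stride=2, padding=1), nn.Sigmoid(),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = ConvAE()
    x = torch.rand(4, 1, 28, 28)                 # stand-in batch of images
    loss = nn.functional.mse_loss(model(x), x)   # reconstruction loss to minimize
    ```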
    Lightweight Long-Range Generative Adversarial Networks. (arXiv:2209.03793v1 [cs.CV])
    In this paper, we introduce novel lightweight generative adversarial networks, which can effectively capture long-range dependencies in the image generation process and produce high-quality results with a much simpler architecture. To achieve this, we first introduce a long-range module, allowing the network to dynamically adjust the number of focused sampling pixels and to augment sampling locations. Thus, it can break the limitation of the fixed geometric structure of the convolution operator and capture long-range dependencies in both the spatial and channel-wise directions. Also, the proposed long-range module can highlight negative relations between pixels, working as a regularizer to stabilize training. Furthermore, we propose a new generation strategy through which we introduce metadata into the image generation process to provide basic information about target images, which can stabilize and speed up the training process. Our novel long-range module introduces only a few additional parameters and is easily inserted into existing models to capture long-range dependencies. Extensive experiments demonstrate the competitive performance of our method with a lightweight architecture.  ( 2 min )
    Physics-Guided Adversarial Machine Learning for Aircraft Systems Simulation. (arXiv:2209.03431v1 [cs.LG])
    In the context of aircraft system performance assessment, deep learning technologies make it possible to quickly infer models from experimental measurements, with less detailed system knowledge than is usually required by physics-based modeling. However, this inexpensive model development also comes with new challenges regarding model trustworthiness. This work presents a novel approach, physics-guided adversarial machine learning (ML), that improves confidence in the physics consistency of the model. The approach first performs a physics-guided adversarial testing phase to search for test inputs that reveal behavioral system inconsistencies while still falling within the range of foreseeable operational conditions. It then proceeds with physics-informed adversarial training to teach the model the system-related physics domain foreknowledge by iteratively reducing the unwanted output deviations on the previously uncovered counterexamples. Empirical evaluation on two aircraft system performance models shows the effectiveness of our adversarial ML approach in exposing physical inconsistencies of both models and in improving their propensity to be consistent with physics domain knowledge.
    A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics. (arXiv:2201.02025v3 [cs.LG] UPDATED)
    A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics is proposed and validated using high-temperature auto-ignitions, perfectly stirred reactors (PSR), and one-dimensional freely propagating flames of n-heptane/air mixtures. The mechanism reduction is modeled as an optimization problem on Boolean space, where a Boolean vector, each entry corresponding to a species, represents a reduced mechanism. The optimization goal is to minimize the reduced mechanism size given the error tolerance of a group of pre-selected benchmark quantities. The key idea of DeePMR is to employ a deep neural network (DNN) to formulate the objective function in the optimization problem. In order to explore the high-dimensional Boolean space efficiently, an iterative DNN-assisted data sampling and DNN training procedure is implemented. The results show that DNN assistance improves sampling efficiency significantly, selecting only $10^5$ samples out of $10^{34}$ possible samples for the DNN to achieve sufficient accuracy. The results demonstrate the capability of the DNN to recognize key species and reasonably predict reduced mechanism performance. The well-trained DNN guarantees the optimal reduced mechanism by solving an inverse optimization problem. By comparing ignition delay times, laminar flame speeds, and temperatures in PSRs, the resulting skeletal mechanism has fewer species (45 species) but the same level of accuracy as the skeletal mechanism (56 species) obtained by the Path Flux Analysis (PFA) method. In addition, the skeletal mechanism can be further reduced to 28 species if only atmospheric, near-stoichiometric conditions (equivalence ratio between 0.6 and 1.2) are considered. DeePMR provides an innovative way to perform model reduction and demonstrates the great potential of data-driven methods in the combustion area.
    AST-GIN: Attribute-Augmented Spatial-Temporal Graph Informer Network for Electric Vehicle Charging Station Availability Forecasting. (arXiv:2209.03356v1 [cs.LG])
    Electric Vehicle (EV) charging demand and charging station availability forecasting is one of the challenges in the intelligent transportation system. With accurate EV station availability prediction, suitable charging behaviors can be scheduled in advance to relieve range anxiety. Many existing deep learning methods have been proposed to address this issue; however, due to the complex road network structure and diverse external factors, such as points of interest (POIs) and weather effects, many commonly used algorithms can only extract historical usage information without considering the comprehensive influence of external factors. To enhance prediction accuracy and interpretability, the Attribute-Augmented Spatial-Temporal Graph Informer (AST-GIN) structure is proposed in this study by combining the Graph Convolutional Network (GCN) layer and the Informer layer to extract both the external and internal spatial-temporal dependence of relevant transportation data. The external factors are modeled as dynamic attributes by the attribute-augmented encoder for training. The AST-GIN model is tested on data collected in Dundee City, and experimental results show the effectiveness of our model in accounting for external factors over various horizon settings compared with other baselines.
    SmOOD: Smoothness-based Out-of-Distribution Detection Approach for Surrogate Neural Networks in Aircraft Design. (arXiv:2209.03438v1 [cs.LG])
    The aircraft industry is constantly striving for more efficient design optimization methods in terms of human effort, computation time, and resource consumption. Hybrid surrogate optimization maintains high result quality while providing rapid design assessments when both the surrogate model and the switch mechanism for eventually transitioning to the high-fidelity (HF) model are calibrated properly. Feedforward neural networks (FNNs) can capture highly nonlinear input-output mappings, yielding efficient surrogates for aircraft performance factors. However, FNNs often fail to generalize over out-of-distribution (OOD) samples, which hinders their adoption in critical aircraft design optimization. Through SmOOD, our smoothness-based out-of-distribution detection approach, we propose to codesign a model-dependent OOD indicator with the optimized FNN surrogate, to produce a trustworthy surrogate model with selective but credible predictions. Unlike conventional uncertainty-grounded methods, SmOOD exploits inherent smoothness properties of the HF simulations to effectively expose OODs by revealing their suspicious sensitivities, thereby avoiding over-confident uncertainty estimates on OOD samples. By using SmOOD, only high-risk OOD inputs are forwarded to the HF model for re-evaluation, leading to more accurate results at a low overhead cost. Three aircraft performance models are investigated. Results show that FNN-based surrogates outperform their Gaussian Process counterparts in terms of predictive performance. Moreover, SmOOD covers on average 85% of actual OODs across all the study cases. When SmOOD and FNN surrogates are deployed in hybrid surrogate optimization settings, they reduce the error rate by 34.65% and speed up computation by a factor of 58.36.
    A simple approach for quantizing neural networks. (arXiv:2209.03487v1 [cs.LG])
    In this short note, we propose a new method for quantizing the weights of a fully trained neural network. A simple deterministic pre-processing step allows us to quantize network layers via memoryless scalar quantization while preserving the network performance on given training data. On one hand, the computational complexity of this pre-processing slightly exceeds that of state-of-the-art algorithms in the literature. On the other hand, our approach does not require any hyper-parameter tuning and, in contrast to previous methods, allows a plain analysis. We provide rigorous theoretical guarantees in the case of quantizing single network layers and show that the relative error decays with the number of parameters in the network if the training data behaves well, e.g., if it is sampled from suitable random distributions. The developed method also readily allows the quantization of deep networks by consecutive application to single layers.
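    A sketch of the memoryless scalar quantization (MSQ) step on a uniform grid follows. The paper's contribution is a deterministic pre-processing step applied before MSQ, which is not reproduced here; the bit width and weight shape are illustrative.

    ```python
    import numpy as np

    # Memoryless scalar quantization: round each weight independently to the
    # nearest point of a uniform grid with 2^bits levels spanning [-w_max, w_max].
    def msq(W, bits=4):
        w_max = np.max(np.abs(W))
        step = 2 * w_max / (2 ** bits - 1)
        return np.round(W / step) * step

    W = np.random.default_rng(0).normal(size=(256, 128)) / 16
    W_q = msq(W)
    print(np.linalg.norm(W - W_q) / np.linalg.norm(W))  # relative quantization error
    ```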
    Reward Delay Attacks on Deep Reinforcement Learning. (arXiv:2209.03540v1 [cs.LG])
    Most reinforcement learning algorithms implicitly assume strong synchrony. We present novel attacks targeting Q-learning that exploit a vulnerability entailed by this assumption by delaying the reward signal for a limited time period. We consider two types of attack goals: targeted attacks, which aim to cause a target policy to be learned, and untargeted attacks, which simply aim to induce a policy with a low reward. We evaluate the efficacy of the proposed attacks through a series of experiments. Our first observation is that reward-delay attacks are extremely effective when the goal is simply to minimize reward. Indeed, we find that even naive baseline reward-delay attacks are highly successful in minimizing the reward. Targeted attacks, on the other hand, are more challenging, although we nevertheless demonstrate that the proposed approaches remain highly effective at achieving the attacker's targets. In addition, we introduce a second threat model that captures a minimal mitigation ensuring that rewards cannot be used out of sequence. We find that this mitigation remains insufficient to ensure robustness to attacks that delay rewards but preserve their order.
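    A minimal sketch of the threat model: the adversary holds each reward in a FIFO buffer for a fixed number of steps, so a synchrony-assuming learner credits rewards to the wrong state-action pairs. The class and usage lines are illustrative, not the paper's implementation.

    ```python
    from collections import deque

    class RewardDelayer:
        """Untargeted reward-delay attack: release rewards `delay` steps late."""
        def __init__(self, delay):
            self.buf = deque([0.0] * delay)  # rewards currently in flight
        def __call__(self, r):
            self.buf.append(r)               # enqueue the true reward
            return self.buf.popleft()        # emit the reward from `delay` steps ago

    # Hypothetical use inside a tabular Q-learning loop:
    #   r_seen = delayer(r_true)
    #   Q[s, a] += alpha * (r_seen + gamma * Q[s_next].max() - Q[s, a])
    ```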
    A Novel Semi-supervised Meta Learning Method for Subject-transfer Brain-computer Interface. (arXiv:2209.03785v1 [eess.SP])
    Brain-computer interface (BCI) provides a direct communication pathway between the human brain and external devices. Before a new subject can use a BCI, a calibration procedure is usually required, because inter- and intra-subject variances are so large that models trained on existing subjects perform poorly on new subjects. Therefore, an effective subject-transfer and calibration method is essential. In this paper, we propose a semi-supervised meta learning (SSML) method for subject-transfer learning in BCIs. The proposed SSML first learns a meta model with the existing subjects, then fine-tunes the model in a semi-supervised learning manner, i.e., using few labeled and many unlabeled samples of the target subject for calibration. This is significant for BCI applications where labeled data are scarce or expensive while unlabeled data are readily available. To verify the SSML method, three different BCI paradigms are tested: 1) event-related potential detection; 2) emotion recognition; and 3) sleep staging. The SSML achieved significant improvements of over 15% on the first two paradigms and 4.9% on the third. The experimental results demonstrate the effectiveness and potential of the SSML method in BCI applications.
    Black-Box Audits for Group Distribution Shifts. (arXiv:2209.03620v1 [cs.LG])
    When a model informs decisions about people, distribution shifts can create undue disparities. However, it is hard for external entities to check for distribution shift, as the model and its training set are often proprietary. In this paper, we introduce and study a black-box auditing method to detect cases of distribution shift that lead to a performance disparity of the model across demographic groups. By extending techniques used in membership and property inference attacks -- which are designed to expose private information from learned models -- we demonstrate that an external auditor can gain the information needed to identify these distribution shifts solely by querying the model. Our experimental results on real-world datasets show that this approach is effective, achieving 80--100% AUC-ROC in detecting shifts involving the underrepresentation of a demographic group in the training set. Researchers and investigative journalists can use our tools to perform non-collaborative audits of proprietary models and expose cases of underrepresentation in the training datasets.
    Knowledge Based Template Machine Translation In Low-Resource Setting. (arXiv:2209.03554v1 [cs.CL])
    Incorporating tagging into neural machine translation (NMT) systems has shown promising results in helping translate rare words such as named entities (NEs). However, translating NEs in low-resource settings remains a challenge. In this work, we investigate the effect of using tags and NE hypernyms from knowledge graphs (KGs) in parallel corpora under different levels of resource conditions. We find that the tag-and-copy mechanism (tag the NEs in the source sentence and copy them to the target sentence) improves translation in high-resource settings only. Introducing copying also results in polarizing effects in translating different parts of speech (POS). Interestingly, we find that copy accuracy for hypernyms is consistently higher than that of entities. As a way of avoiding "hard" copying and utilizing hypernyms in bootstrapping rare entities, we introduce a "soft" tagging mechanism and find consistent improvement in high- and low-resource settings.
    Too Fine or Too Coarse? The Goldilocks Composition of Data Complexity for Robust Left-Right Eye-Tracking Classifiers. (arXiv:2209.03761v1 [eess.SP])
    The differences in distributional patterns between benchmark data and real-world data have been one of the main challenges of using electroencephalogram (EEG) signals for eye-tracking (ET) classification. Therefore, increasing the robustness of machine learning models in predicting eye-tracking positions from EEG data is integral for both research and consumer use. Previously, we compared the performance of classifiers trained solely on finer-grain data to those trained solely on coarse-grain data. Results indicated that despite the overall improvement in robustness, the performance of the fine-grain trained models decreased, compared to coarse-grain trained models, when the testing and training sets contained the same distributional patterns \cite{vectorbased}. This paper aims to address this case by training models using datasets of mixed data complexity to determine the ideal distribution of fine- and coarse-grain data. We train machine learning models utilizing a mixed dataset composed of both fine- and coarse-grain data and then compare the accuracies to models trained using solely fine- or coarse-grain data. For our purposes, finer-grain data refers to data collected using more complex methods whereas coarser-grain data refers to data collected using simpler methods. We apply covariate distributional shifts to test the susceptibility of each training set. Our results indicate that the optimal training dataset for EEG-ET classification is not composed solely of fine- or coarse-grain data, but rather a mix of the two, leaning towards finer-grain.
    Distilling Deep RL Models Into Interpretable Neuro-Fuzzy Systems. (arXiv:2209.03357v1 [cs.LG])
    Deep Reinforcement Learning uses a deep neural network to encode a policy, which achieves very good performance in a wide range of applications but is widely regarded as a black box model. A more interpretable alternative to deep networks is given by neuro-fuzzy controllers. Unfortunately, neuro-fuzzy controllers often need a large number of rules to solve relatively simple tasks, making them difficult to interpret. In this work, we present an algorithm to distill the policy from a deep Q-network into a compact neuro-fuzzy controller. This allows us to train compact neuro-fuzzy controllers through distillation to solve tasks that they are unable to solve directly, combining the flexibility of deep reinforcement learning and the interpretability of compact rule bases. We demonstrate the algorithm on three well-known environments from OpenAI Gym, where we nearly match the performance of a DQN agent using only 2 to 6 fuzzy rules.
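    The distillation step can be sketched as a regression of the student onto the teacher's Q-values over visited states, as below. The student here is a generic differentiable module standing in for the neuro-fuzzy controller, and the loop is an illustrative assumption rather than the paper's algorithm.

    ```python
    import torch
    import torch.nn as nn

    def distill(teacher, student, states, epochs=50, lr=1e-3):
        opt = torch.optim.Adam(student.parameters(), lr=lr)
        for _ in range(epochs):
            with torch.no_grad():
                target_q = teacher(states)   # teacher DQN Q-values as soft targets
            loss = nn.functional.mse_loss(student(states), target_q)
            opt.zero_grad()
            loss.backward()
            opt.step()
        return student
    ```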
    Aerial View Goal Localization with Reinforcement Learning. (arXiv:2209.03694v1 [cs.CV])
    With an increased amount and availability of unmanned aerial vehicles (UAVs) and other remote sensing devices (e.g. satellites), we have recently seen a vast increase in computer vision methods for aerial view data. One application of such technologies is within search-and-rescue (SAR), where the task is to localize and assist one or several people who are missing, for example after a natural disaster. In many cases the rough location may be known and a UAV can be deployed to explore a given, confined area to precisely localize the missing people. Due to time and battery constraints it is often critical that localization is performed as efficiently as possible. In this work, we approach this type of problem by abstracting it as an aerial view goal localization task in a framework that emulates a SAR-like setup without requiring access to actual UAVs. In this framework, an agent operates on top of an aerial image (proxy for a search area) and is tasked with localizing a goal that is described in terms of visual cues. To further mimic the situation on an actual UAV, the agent is not able to observe the search area in its entirety, not even at low resolution, and thus it has to operate solely based on partial glimpses when navigating towards the goal. To tackle this task, we propose AiRLoc, a reinforcement learning (RL)-based model that decouples exploration (searching for distant goals) and exploitation (localizing nearby goals). Extensive evaluations show that AiRLoc outperforms heuristic search methods as well as alternative learnable approaches. We also conduct a proof-of-concept study which indicates that the learnable methods outperform humans on average. Code has been made publicly available: https://github.com/aleksispi/airloc.
    A Framework for Evaluating Privacy-Utility Trade-off in Vertical Federated Learning. (arXiv:2209.03885v1 [cs.LG])
    Federated learning (FL) has emerged as a practical solution to tackle data silo issues without compromising user privacy. One of its variants, vertical federated learning (VFL), has recently gained increasing attention as VFL matches enterprises' demands of leveraging more valuable features to build better machine learning models while preserving user privacy. Current works in VFL concentrate on developing a specific protection or attack mechanism for a particular VFL algorithm. In this work, we propose an evaluation framework that formulates the privacy-utility evaluation problem. We then use this framework as a guide to comprehensively evaluate a broad range of protection mechanisms against most of the state-of-the-art privacy attacks for three widely deployed VFL algorithms. These evaluations may help FL practitioners select appropriate protection mechanisms given specific requirements. Our evaluation results demonstrate that the model inversion and most of the label inference attacks can be thwarted by existing protection mechanisms, while the model completion (MC) attack is difficult to prevent, which calls for more advanced MC-targeted protection mechanisms. Based on our evaluation results, we offer concrete advice on improving the privacy-preserving capability of VFL systems.
    Self-Supervised Multimodal Fusion Transformer for Passive Activity Recognition. (arXiv:2209.03765v1 [eess.SP])
    The pervasiveness of Wi-Fi signals provides significant opportunities for human sensing and activity recognition in fields such as healthcare. The sensors most commonly used for passive Wi-Fi sensing are based on passive Wi-Fi radar (PWR) and channel state information (CSI) data; however, current systems do not effectively exploit the information acquired through multiple sensors to recognise the different activities. In this paper, we explore new properties of the Transformer architecture for multimodal sensor fusion. We study different signal processing techniques to extract multiple image-based features from PWR and CSI data, such as spectrograms, scalograms, and Markov transition fields (MTF). We first propose the Fusion Transformer, an attention-based model for multimodal and multi-sensor fusion. Experimental results show that our Fusion Transformer approach can achieve competitive results compared to a ResNet architecture but with far fewer resources. To further improve our model, we propose a simple and effective framework for multimodal and multi-sensor self-supervised learning (SSL). The self-supervised Fusion Transformer outperforms the baselines, achieving an F1-score of 95.9%. Finally, we show how this approach significantly outperforms the others when trained with as little as 1% (2 minutes) to 20% (40 minutes) of labelled training data.  ( 3 min )
    What and How of Machine Learning Transparency: Building Bespoke Explainability Tools with Interoperable Algorithmic Components. (arXiv:2209.03813v1 [cs.LG])
    Explainability techniques for data-driven predictive models based on artificial intelligence and machine learning algorithms allow us to better understand the operation of such systems and help to hold them accountable. New transparency approaches are developed at breakneck speed, enabling us to peek inside these black boxes and interpret their decisions. Many of these techniques are introduced as monolithic tools, giving the impression of one-size-fits-all and end-to-end algorithms with limited customisability. Nevertheless, such approaches are often composed of multiple interchangeable modules that need to be tuned to the problem at hand to produce meaningful explanations. This paper introduces a collection of hands-on training materials -- slides, video recordings and Jupyter Notebooks -- that provide guidance through the process of building and evaluating bespoke modular surrogate explainers for tabular data. These resources cover the three core building blocks of this technique: interpretable representation composition, data sampling and explanation generation.
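    As a hedged sketch of the modular recipe, the function below wires together the three interchangeable blocks named above: an interpretable (binarized) representation, local data sampling, and a weighted linear surrogate for explanation generation. The specific choices (Gaussian sampling, median binarization, ridge surrogate) are illustrative, not necessarily those of the training materials.

    ```python
    import numpy as np
    from sklearn.linear_model import Ridge

    def explain(black_box, x, n=500, sigma=1.0, rng=np.random.default_rng(0)):
        X = x + rng.normal(0.0, sigma, size=(n, x.size))   # (2) sample around x
        Z = (X > np.median(X, axis=0)).astype(float)       # (1) interpretable binary rep.
        w = np.exp(-np.sum((X - x) ** 2, axis=1) / sigma ** 2)  # locality weights
        y = black_box(X)
        surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=w)  # (3) explanation gen.
        return surrogate.coef_                             # per-feature attributions

    # Hypothetical usage with any vectorized predictor:
    # attributions = explain(model.predict, x_instance)
    ```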
    Improved Robust Algorithms for Learning with Discriminative Feature Feedback. (arXiv:2209.03753v1 [cs.LG])
    Discriminative Feature Feedback is a setting proposed by Dasgupta et al. (2018), which provides a protocol for interactive learning based on feature explanations that are provided by a human teacher. The features distinguish between the labels of pairs of possibly similar instances. That work has shown that learning in this model can have considerable statistical and computational advantages over learning in standard label-based interactive learning models. In this work, we provide new robust interactive learning algorithms for the Discriminative Feature Feedback model, with mistake bounds that are significantly lower than those of previous robust algorithms for this setting. In the adversarial setting, we reduce the dependence on the number of protocol exceptions from quadratic to linear. In addition, we provide an algorithm for a slightly more restricted model, which obtains an even smaller mistake bound for large models with many exceptions. In the stochastic setting, we provide the first algorithm that converges to the exception rate with a polynomial sample complexity. Our algorithm and analysis for the stochastic setting involve a new construction that we call Feature Influence, which may be of wider applicability.  ( 2 min )
    Developing a multi-variate prediction model for the detection of COVID-19 from Crowd-sourced Respiratory Voice Data. (arXiv:2209.03727v1 [cs.SD])
    COVID-19 has affected more than 223 countries worldwide. There is a pressing need for non-invasive, low-cost, and highly scalable solutions to detect COVID-19, especially in low-resource countries where PCR testing is not ubiquitously available. Our aim is to develop a deep learning model that identifies COVID-19 using voice data (voice recordings and a short questionnaire) spontaneously provided by the general population via their personal devices. The novelty of this work is in the development of a deep learning model for the identification of COVID-19 patients from voice recordings. Methods: We used the Cambridge University dataset consisting of 893 audio samples, crowd-sourced from 4352 participants who used the COVID-19 Sounds app. Voice features were extracted using Mel-spectrogram analysis. Based on the voice data, we developed deep learning classification models to detect positive COVID-19 cases, including Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models. We compared their predictive power to baseline classification models, namely Logistic Regression and Support Vector Machine. Results: The LSTM based on Mel-frequency cepstral coefficient (MFCC) features achieved the highest accuracy (89%), with a sensitivity and specificity of 89% each. The results achieved with the proposed model suggest a significant improvement in the prediction accuracy of COVID-19 diagnosis compared to the results obtained in the state of the art. Conclusion: Deep learning can detect subtle changes in the voice of COVID-19 patients with promising results. As an addition to the current testing techniques, this model may aid health professionals in the fast diagnosis and tracing of COVID-19 cases using simple voice analysis.
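    A minimal sketch of the MFCC-plus-LSTM pipeline follows. The file name, sampling rate, layer sizes, and single-logit head are illustrative assumptions; the paper's exact feature configuration and architecture may differ.

    ```python
    import librosa
    import torch
    import torch.nn as nn

    # Extract MFCC features from a (hypothetical) voice recording.
    y, sr = librosa.load("recording.wav", sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)          # (13, T)
    x = torch.tensor(mfcc.T, dtype=torch.float32).unsqueeze(0)  # (1, T, 13)

    class VoiceLSTM(nn.Module):
        def __init__(self):
            super().__init__()
            self.lstm = nn.LSTM(input_size=13, hidden_size=64, batch_first=True)
            self.head = nn.Linear(64, 1)
        def forward(self, x):
            _, (h, _) = self.lstm(x)
            return torch.sigmoid(self.head(h[-1]))  # P(COVID-positive)

    print(VoiceLSTM()(x))
    ```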
    Learning-based and unrolled motion-compensated reconstruction for cardiac MR CINE imaging. (arXiv:2209.03671v1 [eess.IV])
    Motion-compensated MR reconstruction (MCMR) is a powerful concept with considerable potential, consisting of two coupled sub-problems: motion estimation, assuming a known image, and image reconstruction, assuming known motion. In this work, we propose a learning-based self-supervised framework for MCMR to efficiently deal with non-rigid motion corruption in cardiac MR imaging. Contrary to conventional MCMR methods, in which the motion is estimated prior to reconstruction and remains unchanged during the iterative optimization process, we introduce a dynamic motion estimation process and embed it into the unrolled optimization. We establish a cardiac motion estimation network that leverages temporal information via a group-wise registration approach, and carry out a joint optimization between the motion estimation and reconstruction. Experiments on 40 acquired 2D cardiac MR CINE datasets demonstrate that the proposed unrolled MCMR framework can reconstruct high-quality MR images at high acceleration rates where other state-of-the-art methods fail. We also show that the joint optimization mechanism is mutually beneficial for both sub-tasks, i.e., motion estimation and image reconstruction, especially when the MR image is highly undersampled.  ( 2 min )
    SE(3)-DiffusionFields: Learning cost functions for joint grasp and motion optimization through diffusion. (arXiv:2209.03855v1 [cs.RO])
    Multi-objective high-dimensional motion optimization problems are ubiquitous in robotics and highly benefit from informative gradients. To this end, we require all cost functions to be differentiable. We propose learning task-space, data-driven cost functions as diffusion models. Diffusion models represent expressive multimodal distributions and exhibit proper gradients over the entire space. We exploit these properties for motion optimization by integrating the learned cost functions with other potentially learned or hand-tuned costs in a single objective function and optimize all of them jointly by gradient descent. We showcase the benefits of joint optimization in a set of complex grasp and motion planning problems and compare against hierarchical approaches that decouple grasp selection from motion optimization.  ( 2 min )
    Towards Multidimensional Textural Perception and Classification Through Whisker. (arXiv:2209.03750v1 [eess.SP])
    Texture-based studies and designs have recently come into focus. Whisker-based multidimensional surface texture data is missing from the literature. Such data is critical for robotics and machine perception algorithms in the classification and regression of textural surfaces. In this study, we present a novel sensor design to acquire multidimensional texture information. The surface texture's roughness and hardness were measured experimentally using sweeping and dabbing. Three machine learning models (SVM, RF, and MLP) showed excellent classification accuracy for the roughness and hardness of surface textures. We show that the combination of pressure and accelerometer data, collected from a standard machined specimen using the whisker sensor, improves classification accuracy. Further, we experimentally validate that the sensor can classify textures with roughness depths as low as $2.5\mu m$ at an accuracy of $90\%$ or more and segregate materials based on their roughness and hardness. We present a novel metric to consider while designing a whisker sensor to guarantee the quality of texture data acquisition beforehand. The machine learning model performance was validated against data collected by a laser sensor on the same set of surface textures. As part of our work, we are releasing two-dimensional texture data (roughness and hardness) to the research community.
    Tag-Aware Document Representation for Research Paper Recommendation. (arXiv:2209.03660v1 [cs.IR])
    Finding online research papers relevant to one's interests is very challenging due to the increasing number of publications. Therefore, personalized research paper recommendation has become a significant and timely research topic. Collaborative filtering is a successful recommendation approach, which exploits the ratings given to items by users as a source of information for learning to make accurate recommendations. However, ratings are often very sparse, as in the research paper domain, due to the huge number of publications growing every year. Therefore, more attention has been drawn to hybrid methods that consider both ratings and content information. Nevertheless, most of the hybrid recommendation approaches based on text embedding have utilized bag-of-words techniques, which ignore word order and semantic meaning. In this paper, we propose a hybrid approach that leverages deep semantic representation of research papers based on social tags assigned by users. The experimental evaluation is performed on CiteULike, a real and publicly available dataset. The obtained findings show that the proposed model is effective in recommending research papers even when the rating data is very sparse.
    Securing the Spike: On the Transferabilty and Security of Spiking Neural Networks to Adversarial Examples. (arXiv:2209.03358v1 [cs.NE])
    Spiking neural networks (SNNs) have attracted much attention for their high energy efficiency and for recent advances in their classification performance. However, unlike traditional deep learning approaches, the analysis and study of the robustness of SNNs to adversarial examples remains relatively underdeveloped. In this work we advance the field of adversarial machine learning through experimentation and analyses of three important SNN security attributes. First, we show that successful white-box adversarial attacks on SNNs are highly dependent on the underlying surrogate gradient technique. Second, we analyze the transferability of adversarial examples generated by SNNs and other state-of-the-art architectures like Vision Transformers and Big Transfer CNNs. We demonstrate that SNNs are not often deceived by adversarial examples generated by Vision Transformers and certain types of CNNs. Lastly, we develop a novel white-box attack that generates adversarial examples capable of fooling both SNN models and non-SNN models simultaneously. Our experiments and analyses are broad and rigorous covering two datasets (CIFAR-10 and CIFAR-100), five different white-box attacks and twelve different classifier models.
    Kernel-Segregated Transpose Convolution Operation. (arXiv:2209.03704v1 [cs.LG])
    Transpose convolution has shown prominence in many deep learning applications. However, transpose convolution layers are computationally intensive because the feature map is enlarged by inserting zeros after each element in each row and column. Thus, the convolution operation on the expanded input feature map leads to poor utilization of hardware resources. The main reason for the unnecessary multiplication operations is the zeros at predefined positions in the input feature map. We propose an algorithmic-level optimization technique for effective transpose convolution implementation to solve these problems. Based on kernel activations, we segregate the original kernel into four sub-kernels. This scheme reduces memory requirements and unnecessary multiplications. Our proposed method achieved $3.09\times$ ($3.02\times$) faster computation on a Titan X GPU (Intel dual-core CPU) with a flower dataset from the Kaggle website. Furthermore, the proposed optimization method can be generalized to existing devices without additional hardware requirements. A simple deep learning model containing one transpose convolution layer was used to evaluate the optimization method. It showed $2.2\times$ faster training on the MNIST dataset with an Intel dual-core CPU compared with the conventional implementation.
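    The sketch below demonstrates the inefficiency the method targets: a stride-2 transpose convolution equals an ordinary convolution over a zero-stuffed input, so most multiply operands are zeros. The kernel segregation into four sub-kernels avoids this; the code shows only the equivalent zero-insertion view, with illustrative tensor sizes.

    ```python
    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 1, 4, 4)
    w = torch.randn(1, 1, 3, 3)

    # Reference: built-in transpose convolution with stride 2.
    ref = F.conv_transpose2d(x, w, stride=2)

    # Equivalent view: insert zeros between input elements, then run an
    # ordinary convolution with the spatially flipped kernel.
    up = torch.zeros(1, 1, 7, 7)
    up[..., ::2, ::2] = x
    manual = F.conv2d(up, w.flip(-1).flip(-2), padding=2)

    print(torch.allclose(ref, manual, atol=1e-6))  # True
    print(1 - x.numel() / up.numel())              # ~0.67 of operands are zeros
    ```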
    CAP: instance complexity-aware network pruning. (arXiv:2209.03534v1 [cs.LG])
    Existing differentiable channel pruning methods often attach scaling factors or masks behind channels to prune filters with less importance, and assume uniform contribution of input samples to filter importance. In particular, the effects of instance complexity on pruning performance have not yet been fully investigated. In this paper, we propose a simple yet effective differentiable network pruning method, CAP, based on instance complexity-aware filter importance scores. We define an instance complexity related weight for each sample by giving higher weights to hard samples, and measure the weighted sum of sample-specific soft masks to model the non-uniform contribution of different inputs, which encourages hard samples to dominate the pruning process and the model performance to be well preserved. In addition, we introduce a new regularizer to encourage polarization of the masks, such that a sweet spot can be easily found to identify the filters to be pruned. Performance evaluations on various network architectures and datasets demonstrate that CAP has advantages over the state of the art in pruning large networks. For instance, CAP improves the accuracy of ResNet56 on the CIFAR-10 dataset by 0.33% after removing 65.64% of FLOPs, and prunes 87.75% of the FLOPs of ResNet50 on the ImageNet dataset with only a 0.89% Top-1 accuracy loss.  ( 2 min )
    Deep Learning-Based Automatic Diagnosis System for Developmental Dysplasia of the Hip. (arXiv:2209.03440v1 [eess.IV])
    As the first-line diagnostic imaging modality, radiography plays an essential role in the early detection of developmental dysplasia of the hip (DDH). Clinically, the diagnosis of DDH relies on manual measurements and subjective evaluation of different anatomical features from pelvic radiographs. This process is inefficient and error-prone and requires years of clinical experience. In this study, we propose a deep learning-based system that automatically detects 14 keypoints from a radiograph, measures three anatomical angles (center-edge, Tönnis, and Sharp angles), and classifies DDH hips as grades I-IV based on the Crowe criteria. Moreover, a novel data-driven scoring system is proposed to quantitatively integrate the information from the three angles for DDH diagnosis. The proposed keypoint detection model achieved a mean (95% confidence interval [CI]) average precision of 0.807 (0.804-0.810). The mean (95% CI) intraclass correlation coefficients between the center-edge, Tönnis, and Sharp angles measured by the proposed model and the ground truth were 0.957 (0.952-0.962), 0.947 (0.941-0.953), and 0.953 (0.947-0.960), respectively, which were significantly higher than those of experienced orthopedic surgeons (p<0.0001). In addition, the mean (95% CI) test diagnostic agreement (Cohen's kappa) obtained using the proposed scoring system was 0.84 (0.83-0.85), which was significantly higher than that obtained from diagnostic criteria for individual angles (0.76 [0.75-0.77]) and orthopedists (0.71 [0.63-0.79]). To the best of our knowledge, this is the first study for objective DDH diagnosis that leverages deep learning keypoint detection and integrates different anatomical measurements, which can provide reliable and explainable support for clinical decision-making.  ( 3 min )
    Learned Image Compression with Generalized Octave Convolution and Cross-Resolution Parameter Estimation. (arXiv:2209.03353v1 [eess.IV])
    The application of the context-adaptive entropy model significantly improves the rate-distortion (R-D) performance, in which hyperpriors and autoregressive models are jointly utilized to effectively capture the spatial redundancy of the latent representations. However, the latent representations still contain some spatial correlations. In addition, these methods based on the context-adaptive entropy model cannot be accelerated in the decoding process by parallel computing devices, e.g., FPGAs or GPUs. To alleviate these limitations, we propose a learned multi-resolution image compression framework, which exploits the recently developed octave convolutions to factorize the latent representations into high-resolution (HR) and low-resolution (LR) parts, similar to a wavelet transform, thereby further improving the R-D performance. To speed up decoding, our scheme does not use a context-adaptive entropy model. Instead, we exploit an additional hyper layer, including a hyper encoder and hyper decoder, to further remove the spatial redundancy of the latent representation. Moreover, cross-resolution parameter estimation (CRPE) is introduced into the proposed framework to enhance the flow of information and further improve the rate-distortion performance. An additional information-fidelity loss is added to the total loss function to adjust the contribution of the LR part to the final bit stream. Experimental results show that our method reduces the decoding time by approximately 73.35% and 93.44%, respectively, compared with state-of-the-art learned image compression methods, and the R-D performance is still better than H.266/VVC (4:2:0) and some learning-based methods on both PSNR and MS-SSIM metrics across a wide range of bit rates.  ( 3 min )
    A Survey of Neural Trees. (arXiv:2209.03415v1 [cs.LG])
    Neural networks (NNs) and decision trees (DTs) are both popular models of machine learning, yet they come with mutually exclusive advantages and limitations. To bring the best of the two worlds together, a variety of approaches have been proposed to integrate NNs and DTs explicitly or implicitly. In this survey, these approaches are organized into a school which we term neural trees (NTs). This survey aims to present a comprehensive review of NTs and attempts to identify how they enhance model interpretability. We first propose a thorough taxonomy of NTs that expresses the gradual integration and co-evolution of NNs and DTs. Afterward, we analyze NTs in terms of their interpretability and performance, and suggest possible solutions to the remaining challenges. Finally, this survey concludes with a discussion about other considerations, such as conditional computation, and promising directions for this field. A list of papers reviewed in this survey, along with their corresponding codes, is available at: https://github.com/zju-vipa/awesome-neural-trees  ( 2 min )
    The (Un)Scalability of Heuristic Approximators for NP-Hard Search Problems. (arXiv:2209.03393v1 [cs.AI])
    The A* algorithm is commonly used to solve NP-hard combinatorial optimization problems. When provided with an accurate heuristic function, A* can solve such problems in time complexity that is polynomial in the solution depth. This fact implies that accurate heuristic approximation for many such problems is also NP-hard. In this context, we examine a line of recent publications that propose the use of deep neural networks for heuristic approximation. We assert that these works suffer from inherent scalability limitations since -- under the assumption that P$\ne$NP -- such approaches result in either (a) network sizes that scale exponentially in the instance sizes or (b) heuristic approximation accuracy that scales inversely with the instance sizes. Our claim is supported by experimental results for three representative NP-hard search problems that show that fitting deep neural networks accurately to heuristic functions necessitates network sizes that scale exponentially with the instance size.  ( 2 min )
    Blessing of Class Diversity in Pre-training. (arXiv:2209.03447v1 [cs.LG])
    This paper presents a new statistical analysis aiming to explain the recent superior achievements of pre-training techniques in natural language processing (NLP). We prove that when the classes of the pre-training task (e.g., different words in the masked language model task) are sufficiently diverse, in the sense that the least singular value of the last linear layer in pre-training (denoted as $\tilde{\nu}$) is large, then pre-training can significantly improve the sample efficiency of downstream tasks. Specifically, we show the transfer learning excess risk enjoys an $O\left(\frac{1}{\tilde{\nu} \sqrt{n}}\right)$ rate, in contrast to the $O\left(\frac{1}{\sqrt{m}}\right)$ rate in standard supervised learning. Here, $n$ is the number of pre-training data and $m$ is the number of data in the downstream task, and typically $n \gg m$. Our proof relies on a vector-form Rademacher complexity chain rule for disassembling composite function classes and a modified self-concordance condition. These techniques can be of independent interest.  ( 2 min )
    AILAB-Udine@SMM4H 22: Limits of Transformers and BERT Ensembles. (arXiv:2209.03452v1 [cs.CL])
    This paper describes the models developed by the AILAB-Udine team for the SMM4H 22 Shared Task. We explored the limits of Transformer-based models on text classification, entity extraction and entity normalization, tackling Tasks 1, 2, 5, 6 and 10. The main takeaways from participating in the different tasks are the overwhelmingly positive effect of combining different architectures through ensemble learning, and the great potential of generative models for term normalization.  ( 2 min )
    Bispectral Neural Networks. (arXiv:2209.03416v1 [cs.LG])
    We present a novel machine learning architecture, Bispectral Neural Networks (BNNs), for learning representations of data that are invariant to the actions of groups on the space over which a signal is defined. The model incorporates the ansatz of the bispectrum, an analytically defined group invariant that is complete--that is, it preserves all signal structure while removing only the variation due to group actions. Here, we demonstrate that BNNs are able to discover arbitrary commutative group structure in data, with the trained models learning the irreducible representations of the groups, which allows for the recovery of the group Cayley tables. Remarkably, trained networks learn to approximate bispectra on these groups, and thus possess the robustness, completeness, and generality of the analytical object.  ( 2 min )
    CLaCLab at SocialDisNER: Using Medical Gazetteers for Named-Entity Recognition of Disease Mentions in Spanish Tweets. (arXiv:2209.03528v1 [cs.CL])
    This paper summarizes the CLaC submission for SMM4H 2022 Task 10, which concerns the recognition of diseases mentioned in Spanish tweets. Before classification, we encode each token with a transformer encoder using features from Multilingual RoBERTa Large, a UMLS gazetteer, and a DISTEMIST gazetteer, among others. We obtain a strict F1 score of 0.869, compared with a competition mean of 0.675, a standard deviation of 0.245, and a median of 0.761.  ( 2 min )
    Generative Adversarial Super-Resolution at the Edge with Knowledge Distillation. (arXiv:2209.03355v1 [eess.IV])
    Single-Image Super-Resolution can support robotic tasks in environments where a reliable visual stream is required to monitor the mission, handle teleoperation or study relevant visual details. In this work, we propose an efficient Generative Adversarial Network model for real-time Super-Resolution. We adopt a tailored architecture of the original SRGAN and model quantization to boost the execution on CPU and Edge TPU devices, achieving up to 200 fps inference. We further optimize our model by distilling its knowledge to a smaller version of the network and obtain remarkable improvements compared to the standard training approach. Our experiments show that our fast and lightweight model preserves considerably satisfying image quality compared to heavier state-of-the-art models. Finally, we conduct experiments on image transmission with bandwidth degradation to highlight the advantages of the proposed system for mobile robotic applications.  ( 2 min )
    Causal discovery for time series with latent confounders. (arXiv:2209.03427v1 [stat.ML])
    Reconstructing the causal relationships behind the phenomena we observe is a fundamental challenge in all areas of science. Discovering causal relationships through experiments is often infeasible, unethical, or expensive in complex systems. However, increases in computational power allow us to process the ever-growing amount of data that modern science generates, leading to an emerging interest in the causal discovery problem from observational data. This work evaluates the LPCMCI algorithm, which aims to find generators compatible with a multi-dimensional, highly autocorrelated time series while some variables are unobserved. We find that LPCMCI performs much better than a random baseline that mimics having no knowledge, but is still far from optimal detection. Furthermore, LPCMCI performs best on auto-dependencies, then contemporaneous dependencies, and struggles most with lagged dependencies. The source code of this project is available online.  ( 2 min )
    Machine Learning Sensors for Diagnosis of COVID-19 Disease Using Routine Blood Values for Internet of Things Application. (arXiv:2209.03522v1 [cs.LG])
    Healthcare digitalization needs effective methods of human sensorics, where various parameters of the human body are instantly monitored in everyday life and connected to the Internet of Things (IoT). In particular, Machine Learning (ML) sensors for the prompt diagnosis of COVID-19 are an important case for IoT applications in healthcare and Ambient Assisted Living (AAL). Determining COVID-19 infection status with various diagnostic tests and imaging results is costly and time-consuming. The aim of this study is to provide a fast, reliable and economical alternative tool for the diagnosis of COVID-19 based on Routine Blood Values (RBV) measured at admission. The dataset of the study consists of a total of 5296 patients, with equal numbers of negative and positive COVID-19 test results, and 51 routine blood values. In this study, 13 popular machine learning classifiers and the LogNNet neural network model were examined. The most successful classifier in terms of time and accuracy in detecting the disease was the Histogram-based Gradient Boosting (HGB) classifier. The HGB classifier identified the 11 most important features (LDL, Cholesterol, HDL-C, MCHC, Triglyceride, Amylase, UA, LDH, CK-MB, ALP and MCH) to detect the disease with 100% accuracy and a learning time of 6.39 s. In addition, the importance of single, double and triple combinations of these features in the diagnosis of the disease is discussed. We propose using these 11 features and their combinations as important biomarkers for ML sensors in the diagnosis of the disease, supporting edge computing on Arduino and cloud IoT services.  ( 3 min )
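    As a sketch of how such a pipeline could look in practice, the snippet below trains scikit-learn's HistGradientBoostingClassifier on the 11 features named in the abstract. The CSV file name and label column are hypothetical placeholders; this is an illustration, not the authors' code.

    ```python
    # Hypothetical sketch: HGB classifier on the 11 routine-blood-value features
    # named in the abstract. The file and label-column names are assumptions.
    import pandas as pd
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    features = ["LDL", "Cholesterol", "HDL-C", "MCHC", "Triglyceride",
                "Amylase", "UA", "LDH", "CK-MB", "ALP", "MCH"]
    df = pd.read_csv("routine_blood_values.csv")  # hypothetical dataset file

    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df["covid_positive"], test_size=0.2, random_state=0)

    clf = HistGradientBoostingClassifier(random_state=0)
    clf.fit(X_train, y_train)
    print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
    ```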
    A Greedy Algorithm for Building Compact Binary Activated Neural Networks. (arXiv:2209.03450v1 [cs.LG])
    We study binary activated neural networks in the context of regression tasks, provide guarantees on the expressiveness of these particular networks, and propose a greedy algorithm for building such networks. Aiming for predictors with small resource needs, the greedy approach does not require fixing an architecture for the network in advance: the network is built one layer at a time, one neuron at a time, leading to predictors that are not needlessly wide or deep for a given task. Similarly to boosting algorithms, our approach guarantees a training loss reduction every time a neuron is added to a layer. This greatly differs from most training schemes for binary activated neural networks, which rely on stochastic gradient descent (circumventing the 0-almost-everywhere derivative problem of the binary activation function by surrogates such as the straight-through estimator or continuous binarization). We show that our method provides compact and sparse predictors while obtaining performance similar to state-of-the-art methods for training binary activated networks.  ( 2 min )
    Beyond Random Split for Assessing Statistical Model Performance. (arXiv:2209.03346v1 [cs.LG])
    Even though a random train/test split of the dataset is common practice, it may not always be the best approach for estimating generalization performance. In fact, the usual machine learning methodology can sometimes overestimate the generalization error when a dataset is not representative or when rare and elusive examples are a fundamental aspect of the detection problem. In the present work, we analyze strategies based on the predictors' variability for splitting data into training and testing sets. Such strategies aim to guarantee the inclusion of rare or unusual examples with a minimal loss of the population's representativeness, and to provide a more accurate estimate of the generalization error when the dataset is not representative. Two baseline classifiers based on decision trees were used to test the four splitting strategies considered. Both classifiers were applied to CTU19, a low-representativeness dataset for a network security detection problem. Preliminary results showed the importance of applying the three alternative strategies, in addition to the Monte Carlo splitting strategy, in order to get a more accurate error estimate under different but feasible scenarios.  ( 2 min )
    Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions. (arXiv:2209.03430v1 [cs.LG])
    Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper is designed to provide an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining two key principles of modality heterogeneity and interconnections that have driven subsequent innovations, and propose a taxonomy of 6 core technical challenges: representation, alignment, reasoning, generation, transference, and quantification covering historical and recent trends. Recent technical achievements will be presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy.  ( 3 min )
    Higher-order Clustering and Pooling for Graph Neural Networks. (arXiv:2209.03473v1 [cs.LG])
    Graph Neural Networks achieve state-of-the-art performance on a plethora of graph classification tasks, especially due to pooling operators, which aggregate learned node embeddings hierarchically into a final graph representation. However, these operators have not only been questioned by recent work showing on-par performance with random pooling, but they also completely ignore higher-order connectivity patterns. To tackle this issue, we propose HoscPool, a clustering-based graph pooling operator that captures higher-order information hierarchically, leading to richer graph representations. In fact, we learn a probabilistic cluster assignment matrix end-to-end by minimising relaxed formulations of motif spectral clustering in our objective function, and we then extend it to a pooling operator. We evaluate HoscPool on graph classification tasks and its clustering component on graphs with ground-truth community structure, achieving the best performance. Lastly, we provide a deep empirical analysis of pooling operators' inner functioning.  ( 2 min )
    A Survey on Automated Diagnosis of Alzheimer's Disease Using Optical Coherence Tomography and Angiography. (arXiv:2209.03354v1 [eess.IV])
    Retinal optical coherence tomography (OCT) and optical coherence tomography angiography (OCTA) are promising tools for the (early) diagnosis of Alzheimer's disease (AD). These non-invasive imaging techniques are cost-effective and more accessible than alternative neuroimaging tools. However, interpreting and classifying multi-slice scans produced by OCT devices is time-consuming and challenging even for trained practitioners. There are surveys on machine learning and deep learning approaches concerning the automated analysis of OCT scans for various diseases such as glaucoma. However, the current literature lacks an extensive survey on the diagnosis of Alzheimer's disease or cognitive impairment using OCT or OCTA. This has motivated us to do a comprehensive survey aimed at machine/deep learning scientists or practitioners who require an introduction to the problem. The paper contains 1) an introduction to the medical background of Alzheimer's Disease and Cognitive Impairment and their diagnosis using OCT and OCTA imaging modalities, 2) a review of various technical proposals for the problem and the sub-problems from an automated analysis perspective, 3) a systematic review of the recent deep learning studies and available OCT/OCTA datasets directly aimed at the diagnosis of Alzheimer's Disease and Cognitive Impairment. For the latter, we used Publish or Perish Software to search for the relevant studies from various sources such as Scopus, PubMed, and Web of Science. We followed the PRISMA approach to screen an initial pool of 3073 references and determined ten relevant studies (N=10, out of 3073) that directly targeted AD diagnosis. We identified the lack of open OCT/OCTA datasets (about Alzheimer's disease) as the main issue that is impeding the progress in the field.  ( 3 min )
    A hybrid Bayesian network for medical device risk assessment and management. (arXiv:2209.03352v1 [cs.LG])
    ISO 14971 is the primary standard used for medical device risk management. While it specifies the requirements for medical device risk management, it does not specify a particular method for performing risk management. Hence, medical device manufacturers are free to develop or use any appropriate methods for managing the risk of medical devices. The most commonly used methods, such as Fault Tree Analysis (FTA), are unable to provide a reasonable basis for computing risk estimates when there are limited or no historical data available or where there is second-order uncertainty about the data. In this paper, we present a novel method for medical device risk management using hybrid Bayesian networks (BNs) that resolves the limitations of classical methods such as FTA and incorporates relevant factors affecting the risk of medical devices. The proposed BN method is generic but can be instantiated on a system-by-system basis, and we apply it to a Defibrillator device to demonstrate the process involved for medical device risk management during production and post-production. The example is validated against real-world data.  ( 2 min )
    Peer to Peer Learning Platform Optimized With Machine Learning. (arXiv:2209.03489v1 [cs.CY])
    HELM Learning (Helping Everyone Learn More) is the first online peer-to-peer learning platform which allows students (typically middle-to-high school students) to teach classes and students (typically elementary-to-middle school students) to learn from classes for free. This class structure (peer-to-peer learning) has been proven effective for learning, as it promotes teamwork and collaboration and enables active learning. HELM is a unique platform as it provides an easy process for students to create, teach and learn topics in a structured, peer-to-peer environment. Since HELM was created in April 2020, it has attracted over 4,000 student sign-ups and 80 teachers across 4 continents. HELM has grown from a simple website-and-Google-Form platform to a backend system coded in Python, SQL, JavaScript and HTML, hosted on an AWS service. This not only makes it easier for students to sign up (their information is saved in an SQL database, so they can sign up for classes without re-entering it, and they receive automated emails about their classes), but also makes it easier for teachers to teach (supplemental processes such as creating Zoom links and class recording folders and sending emails to students are done automatically). In addition, HELM has a machine learning recommendation algorithm which suggests classes and subjects students would enjoy, based on the classes a student has previously taken. This has made it easier for students to sign up for classes they are interested in.  ( 3 min )
    Implicit Full Waveform Inversion with Deep Neural Representation. (arXiv:2209.03525v1 [physics.geo-ph])
    Full waveform inversion (FWI) is commonly regarded as the state-of-the-art approach for imaging subsurface structures and physical parameters; however, its implementation usually faces great challenges, such as building a good initial model to escape from local minima and evaluating the uncertainty of inversion results. In this paper, we propose the implicit full waveform inversion (IFWI) algorithm using continuously and implicitly defined deep neural representations. Compared to FWI, which is sensitive to the initial model, IFWI benefits from the increased degrees of freedom of deep learning optimization, allowing it to start from a random initialization, which greatly reduces the risk of non-uniqueness and of being trapped in local minima. Both theoretical and experimental analyses indicate that, given a random initial model, IFWI is able to converge to the global minimum and produce a high-resolution image of the subsurface with fine structures. In addition, uncertainty analysis of IFWI can easily be performed by approximating Bayesian inference with various deep learning approaches, which we analyze in this paper by adding dropout neurons. Furthermore, IFWI has a certain degree of robustness and strong generalization ability, as exemplified in experiments on various 2D geological models. With a proper setup, IFWI is also well suited for multi-scale joint geophysical inversion.  ( 2 min )
  • Open

    Quantum Sparse Coding. (arXiv:2209.03788v1 [quant-ph])
    The ultimate goal of any sparse coding method is to accurately recover an unknown sparse vector from a few noisy linear measurements. Unfortunately, this estimation problem is NP-hard in general, and it is therefore always approached with an approximation method, such as lasso or orthogonal matching pursuit, thus trading off accuracy for lower computational complexity. In this paper, we develop a quantum-inspired algorithm for sparse coding, with the premise that the emergence of quantum computers and Ising machines can potentially lead to more accurate estimations compared to classical approximation methods. To this end, we formulate the most general sparse coding problem as a quadratic unconstrained binary optimization (QUBO) task, which can be efficiently minimized using quantum technology. To arrive at a QUBO model that is also efficient in terms of the number of spins (space complexity), we separate our analysis into three different scenarios, defined by the number of bits required to express the underlying sparse vector: binary, 2-bit, and a general fixed-point representation. We conduct numerical experiments with simulated data on LightSolver's quantum-inspired digital platform to verify the correctness of our QUBO formulation and to demonstrate its advantage over baseline methods.  ( 2 min )
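    To make the QUBO reduction concrete, here is a sketch of the simplest (binary) case; this is a standard derivation and an assumption about, not a quote of, the paper's exact formulation. For $x \in \{0,1\}^n$ we have $x_i^2 = x_i$, so the least-squares objective with a sparsity penalty $\lambda \sum_i x_i$ collapses to a quadratic form over binary variables:

    ```latex
    % Binary sparse coding as a QUBO (sketch; lambda is a sparsity penalty).
    \begin{align*}
    \|y - Ax\|_2^2 + \lambda \textstyle\sum_i x_i
      &= y^\top y - 2\, y^\top A x + x^\top A^\top A\, x + \lambda \mathbf{1}^\top x \\
      &= \mathrm{const} + x^\top Q\, x,
    \qquad
    Q_{ij} =
    \begin{cases}
    (A^\top A)_{ii} - 2\,(A^\top y)_i + \lambda, & i = j,\\[2pt]
    (A^\top A)_{ij}, & i \neq j.
    \end{cases}
    \end{align*}
    ```

    Minimizing $x^\top Q x$ over $x \in \{0,1\}^n$ is exactly the form that Ising machines and quantum annealers accept.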
    Improved Robust Algorithms for Learning with Discriminative Feature Feedback. (arXiv:2209.03753v1 [cs.LG])
    Discriminative Feature Feedback is a setting proposed by Dasgupta et al. (2018), which provides a protocol for interactive learning based on feature explanations provided by a human teacher. The features distinguish between the labels of pairs of possibly similar instances. That work showed that learning in this model can have considerable statistical and computational advantages over learning in standard label-based interactive learning models. In this work, we provide new robust interactive learning algorithms for the Discriminative Feature Feedback model, with mistake bounds that are significantly lower than those of previous robust algorithms for this setting. In the adversarial setting, we reduce the dependence on the number of protocol exceptions from quadratic to linear. In addition, we provide an algorithm for a slightly more restricted model, which obtains an even smaller mistake bound for large models with many exceptions. In the stochastic setting, we provide the first algorithm that converges to the exception rate with a polynomial sample complexity. Our algorithm and analysis for the stochastic setting involve a new construction that we call Feature Influence, which may be of wider applicability.
    Learning polytopes with fixed facet directions. (arXiv:2201.03419v3 [math.MG] UPDATED)
    We consider the task of reconstructing polytopes with fixed facet directions from finitely many support function evaluations. We show that for a fixed simplicial normal fan the least-squares estimate is given by a convex quadratic program. We study the geometry of the solution set and give a combinatorial characterization for the uniqueness of the reconstruction in this case. We provide an algorithm that, under mild assumptions, converges to the unknown input shape as the number of noisy support function evaluations increases. We also discuss limitations of our results if the restriction on the normal fan is removed.
    Sparsity in long-time control of neural ODEs. (arXiv:2102.13566v3 [cs.LG] UPDATED)
    We consider the neural ODE and optimal control perspective of supervised learning, with $\ell^1$-control penalties, where rather than only minimizing a final cost (the \emph{empirical risk}) for the state, we integrate this cost over the entire time horizon. We prove that any optimal control (for this cost) vanishes beyond some positive stopping time. When seen in the discrete-time context, this result entails an \emph{ordered} sparsity pattern for the parameters of the associated residual neural network: ordered in the sense that these parameters are all $0$ beyond a certain layer. Furthermore, we provide a polynomial stability estimate for the empirical risk with respect to the time horizon. This can be seen as a \emph{turnpike property}, for nonsmooth dynamics and functionals with $\ell^1$-penalties, and without any smallness assumptions on the data, both of which are new in the literature.
    Bayesian regularization of empirical MDPs. (arXiv:2208.02362v2 [cs.LG] UPDATED)
    In most applications of model-based Markov decision processes, the parameters of the unknown underlying model are estimated from empirical data. Due to noise, the policy learned from the estimated model is often far from the optimal policy of the underlying model. When applied to the environment of the underlying model, the learned policy results in suboptimal performance, thus calling for solutions with better generalization performance. In this work we take a Bayesian perspective and regularize the objective function of the Markov decision process with prior information in order to obtain more robust policies. Two approaches are proposed, one based on $L^1$ regularization and the other on relative entropic regularization. We evaluate our proposed algorithms on synthetic simulations and on real-world search logs of a large-scale online shopping store. Our results demonstrate the robustness of regularized MDP policies against the noise present in the models.
    W-Transformers : A Wavelet-based Transformer Framework for Univariate Time Series Forecasting. (arXiv:2209.03945v1 [cs.LG])
    Deep learning utilizing transformers has recently achieved a lot of success in many vital areas such as natural language processing, computer vision, anomaly detection, and recommendation systems, among many others. Among the several merits of transformers, the ability to capture long-range temporal dependencies and interactions is desirable for time series forecasting, leading to progress in various time series applications. In this paper, we build a transformer model for non-stationary time series. The problem is challenging yet crucially important. We present a novel framework for univariate time series representation learning based on a wavelet-based transformer encoder architecture and call it W-Transformer. The proposed W-Transformer applies a maximal overlap discrete wavelet transform (MODWT) to the time series data and builds local transformers on the decomposed sub-series to capture the non-stationarity and long-range nonlinear dependencies in the time series. Evaluating our framework on several publicly available benchmark time series datasets from various domains and with diverse characteristics, we demonstrate that it performs, on average, significantly better than the baseline forecasters for short-term and long-term forecasting, even for datasets that consist of only a few hundred training samples.
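    As an illustration of the decomposition step, the sketch below uses PyWavelets' stationary (undecimated) wavelet transform, pywt.swt, which is closely related to MODWT; this is an illustrative stand-in, not the authors' code. Each coefficient series keeps the original length, so each scale can be handed to its own local transformer as the framework describes.

    ```python
    # Sketch of the wavelet decomposition step (pywt.swt as a MODWT-like stand-in).
    import numpy as np
    import pywt

    series = np.sin(np.linspace(0, 8 * np.pi, 512)) + 0.1 * np.random.randn(512)
    level = 3  # series length (512) must be divisible by 2 ** level

    # Returns one (approximation, detail) pair per level, each of full length.
    coeffs = pywt.swt(series, wavelet="db4", level=level)
    for i, (cA, cD) in enumerate(coeffs, start=1):
        print(f"pair {i}: approx {cA.shape}, detail {cD.shape}")
    ```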
    Known by the company we keep: `Triadic influence' as a proxy for compatibility in social relationships. (arXiv:2209.03683v1 [cs.SI])
    Networks of social interactions are the substrate upon which civilizations are built. Often, we create new bonds with people that we like or feel that our relationships are damaged through the intervention of third parties. Despite their importance and the huge impact that these processes have in our lives, quantitative scientific understanding of them is still in its infancy, mainly due to the difficulty of collecting large datasets of social networks including individual attributes. In this work, we present a thorough study of real social networks of 13 schools, with more than 3,000 students and 60,000 declared positive and negative relations, including tests for personal traits of all the students. We introduce a metric -- the `triadic influence' -- that measures the influence of nearest-neighbors in the relationships of their contacts. We use neural networks to predict the relationships and to extract the probability that two students are friends or enemies depending on their personal attributes or the triadic influence. We alternatively use a high-dimensional embedding of the network structure to also predict the relationships. Remarkably, the triadic influence (a simple one-dimensional metric) achieves the highest accuracy at predicting the relationship between two students. We postulate that the probabilities extracted from the neural networks -- functions of the triadic influence and the personalities of the students -- control the evolution of real social networks, opening a new avenue for the quantitative study of these systems.
    Model-free Subsampling Method Based on Uniform Designs. (arXiv:2209.03617v1 [stat.ME])
    Subsampling or subdata selection is a useful approach in large-scale statistical learning. Most existing studies focus on model-based subsampling methods, which depend significantly on the model assumption. In this paper, we consider a model-free subsampling strategy for generating subdata from the original full data. To measure how well a subdata represents the original data, we propose a criterion, the generalized empirical F-discrepancy (GEFD), and study its theoretical properties in connection with the classical generalized L2-discrepancy in the theory of uniform designs. These properties allow us to develop a low-GEFD, data-driven subsampling method based on existing uniform designs. Through simulation examples and a real case study, we show that the proposed subsampling method is superior to random sampling. Moreover, our method remains robust under diverse model specifications while other popular subsampling methods underperform. In practice, such a model-free property is more appealing than model-based subsampling methods, which may perform poorly when the model is misspecified, as demonstrated in our simulation studies.
    PredDiff: Explanations and Interactions from Conditional Expectations. (arXiv:2102.13519v4 [cs.LG] UPDATED)
    PredDiff is a model-agnostic, local attribution method that is firmly rooted in probability theory. Its simple intuition is to measure prediction changes while marginalizing features. In this work, we clarify properties of PredDiff and its close connection to Shapley values. We stress important differences between classification and regression, which require a specific treatment within both formalisms. We extend PredDiff by introducing a new, well-founded measure for interaction effects between arbitrary feature subsets. The study of interaction effects represents an inevitable step towards a comprehensive understanding of black-box models and is particularly important for science applications. Equipped with our novel interaction measure, PredDiff is a promising model-agnostic approach for obtaining reliable, numerically inexpensive and theoretically sound attributions.
    Meta Clustering for Collaborative Learning. (arXiv:2006.00082v2 [cs.LG] UPDATED)
    In collaborative learning, learners coordinate to enhance each of their learning performances. From the perspective of any learner, a critical challenge is to filter out unqualified collaborators. We propose a framework named meta clustering to address the challenge. Unlike the classical problem of clustering data points, meta clustering categorizes learners. Assuming each learner performs a supervised regression on a standalone local dataset, we propose a Select-Exchange-Cluster (SEC) method to classify the learners by their underlying supervised functions. We theoretically show that the SEC can cluster learners into accurate collaboration sets. Empirical studies corroborate the theoretical analysis and demonstrate that SEC can be computationally efficient, robust against learner heterogeneity, and effective in enhancing single-learner performance. Also, we show how the proposed approach may be used to enhance data fairness. Supplementary materials for this article are available online.
    Causal Forecasting: Generalization Bounds for Autoregressive Models. (arXiv:2111.09831v2 [stat.ML] UPDATED)
    Despite the increasing relevance of forecasting methods, causal implications of these algorithms remain largely unexplored. This is concerning considering that, even under simplifying assumptions such as causal sufficiency, the statistical risk of a model can differ significantly from its \textit{causal risk}. Here, we study the problem of \textit{causal generalization} -- generalizing from the observational to interventional distributions -- in forecasting. Our goal is to find answers to the question: How does the efficacy of an autoregressive (VAR) model in predicting statistical associations compare with its ability to predict under interventions? To this end, we introduce the framework of \textit{causal learning theory} for forecasting. Using this framework, we obtain a characterization of the difference between statistical and causal risks, which helps identify sources of divergence between them. Under causal sufficiency, the problem of causal generalization amounts to learning under covariate shifts, albeit with additional structure (restriction to interventional distributions under the VAR model). This structure allows us to obtain uniform convergence bounds on causal generalizability for the class of VAR models. To the best of our knowledge, this is the first work that provides theoretical guarantees for causal generalization in the time-series setting.
    Trace-class Gaussian priors for Bayesian learning of neural networks with MCMC. (arXiv:2012.10943v3 [stat.ME] UPDATED)
    This paper introduces a new neural network based prior for real valued functions on $\mathbb R^d$ which, by construction, is more easily and cheaply scaled up in the domain dimension $d$ compared to the usual Karhunen-Lo\`eve function space prior. The new prior is a Gaussian neural network prior, where each weight and bias has an independent Gaussian prior, but with the key difference that the variances decrease in the width of the network in such a way that the resulting function is \emph{almost surely} well defined in the limit of an infinite width network. We show that in a Bayesian treatment of inferring unknown functions, the induced posterior over functions is amenable to Monte Carlo sampling using Hilbert space Markov chain Monte Carlo (MCMC) methods. This type of MCMC is popular, e.g. in the Bayesian Inverse Problems literature, because it is stable under \emph{mesh refinement}, i.e. the acceptance probability does not shrink to $0$ as more parameters of the function's prior are introduced, even \emph{ad infinitum}. In numerical examples we demonstrate these stated competitive advantages over other function space priors. We also implement examples in Bayesian Reinforcement Learning to automate tasks from data and demonstrate, for the first time, stability of MCMC to mesh refinement for these types of problems.
    Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes. (arXiv:2209.03695v1 [cs.LG])
    A fundamental property of deep learning normalization techniques, such as batch normalization, is making the pre-normalization parameters scale invariant. The intrinsic domain of such parameters is the unit sphere, and therefore their gradient optimization dynamics can be represented via spherical optimization with varying effective learning rate (ELR), which was studied previously. In this work, we investigate the properties of training scale-invariant neural networks directly on the sphere using a fixed ELR. We discover three regimes of such training depending on the ELR value: convergence, chaotic equilibrium, and divergence. We study these regimes in detail both on a theoretical examination of a toy example and on a thorough empirical analysis of real scale-invariant deep learning models. Each regime has unique features and reflects specific properties of the intrinsic loss landscape, some of which have strong parallels with previous research on both regular and scale-invariant neural networks training. Finally, we demonstrate how the discovered regimes are reflected in conventional training of normalized networks and how they can be leveraged to achieve better optima.
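    For intuition, the toy sketch below (my own construction, not the paper's code) optimizes a scale-invariant loss directly on the unit sphere with a fixed ELR: project the gradient onto the tangent space, take the step, and renormalize.

    ```python
    # Toy fixed-ELR training on the unit sphere (illustrative assumption).
    import torch

    torch.manual_seed(0)
    w = torch.randn(100)
    w = w / w.norm()
    elr = 0.01  # fixed effective learning rate

    def loss_fn(w):
        u = w / w.norm()          # scale-invariant: depends only on the direction
        return (u[0] - 0.5) ** 2

    for step in range(1000):
        w = w.detach().requires_grad_(True)
        (g,) = torch.autograd.grad(loss_fn(w), w)
        with torch.no_grad():
            g_tan = g - (g @ w) * w   # project the gradient onto the tangent space
            w = w - elr * g_tan       # gradient step
            w = w / w.norm()          # retract back onto the sphere
    ```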
    Regularized Frank-Wolfe for Dense CRFs: Generalizing Mean Field and Beyond. (arXiv:2110.14759v2 [cs.LG] UPDATED)
    We introduce regularized Frank-Wolfe, a general and effective algorithm for inference and learning of dense conditional random fields (CRFs). The algorithm optimizes a nonconvex continuous relaxation of the CRF inference problem using vanilla Frank-Wolfe with approximate updates, which are equivalent to minimizing a regularized energy function. Our proposed method is a generalization of existing algorithms such as mean field or concave-convex procedure. This perspective not only offers a unified analysis of these algorithms, but also allows an easy way of exploring different variants that potentially yield better performance. We illustrate this in our empirical results on standard semantic segmentation datasets, where several instantiations of our regularized Frank-Wolfe outperform mean field inference, both as a standalone component and as an end-to-end trainable layer in a neural network. We also show that dense CRFs, coupled with our new algorithms, produce significant improvements over strong CNN baselines.
    End-to-end Robustness for Sensing-Reasoning Machine Learning Pipelines. (arXiv:2003.00120v4 [cs.LG] UPDATED)
    Intensive algorithmic efforts have been made recently to enable rapid improvements in certified robustness for complex ML models. However, current robustness certification methods are only able to certify under a limited perturbation radius. Given that existing pure data-driven statistical approaches have reached a bottleneck, in this paper we propose to integrate statistical ML models with knowledge (expressed as logical rules) as a reasoning component using Markov logic networks (MLNs), so as to further improve the overall certified robustness. This opens new research questions about certifying the robustness of such a paradigm, especially the reasoning component (e.g., MLN). As the first step towards understanding these questions, we first prove that the computational complexity of certifying the robustness of MLN is #P-hard. Guided by this hardness result, we then derive the first certified robustness bound for MLN by carefully analyzing different model regimes. Finally, we conduct extensive experiments on five datasets including both high-dimensional images and natural language texts, and we show that the certified robustness with knowledge-based logical reasoning indeed significantly outperforms that of the state-of-the-art.
    E-LMC: Extended Linear Model of Coregionalization for Spatial Field Prediction. (arXiv:2203.00525v2 [cs.LG] UPDATED)
    Physical simulations based on partial differential equations typically generate spatial field results, which are utilized to calculate specific properties of a system for engineering design and optimization. Due to the intensive computational burden of the simulations, a surrogate model mapping the low-dimensional inputs to the spatial fields is commonly built based on a relatively small dataset. To resolve the challenge of predicting the whole spatial field, the popular linear model of coregionalization (LMC) can disentangle complicated correlations within the high-dimensional spatial field outputs and deliver accurate predictions. However, LMC fails if the spatial field cannot be well approximated by a linear combination of base functions with latent processes. In this paper, we present the Extended Linear Model of Coregionalization (E-LMC) by introducing an invertible neural network to linearize the highly complex and nonlinear spatial fields so that the LMC can easily generalize to nonlinear problems while preserving the traceability and scalability. Several real-world applications demonstrate that E-LMC can exploit spatial correlations effectively, showing a maximum improvement of about 40% over the original LMC and outperforming the other state-of-the-art spatial field models.
    Multiobjective Ranking and Selection Using Stochastic Kriging. (arXiv:2209.03919v1 [stat.ML])
    We consider multiobjective simulation optimization problems, where several conflicting objectives are optimized simultaneously, and can only be observed via stochastic simulation. The goal is to find or approximate a (discrete) set of Pareto-optimal solutions that reveal the essential trade-offs between the objectives, where optimality means that no objective can be improved without deteriorating the quality of any other objective. The noise in the observed performance may lead to two possible misclassification errors: solutions that are truly Pareto-optimal can be wrongly considered dominated, and solutions that are truly dominated can be wrongly considered Pareto-optimal. We propose a Bayesian multiobjective ranking and selection method to reduce the number of errors when identifying the solutions with the true best expected performance. We use stochastic kriging metamodels to build reliable predictive distributions of the objectives, and exploit this information in two efficient screening procedures and two novel sampling criteria. We use these in a sequential sampling algorithm to decide how to allocate samples. Experimental results show that the proposed method only requires a small fraction of samples compared to the standard allocation method, and it is competitive with the state-of-the-art, with the exploitation of the correlation structure being the dominant contributor to the improvement.
    Data Feedback Loops: Model-driven Amplification of Dataset Biases. (arXiv:2209.03942v1 [cs.LG])
    Datasets scraped from the internet have been critical to the successes of large-scale machine learning. Yet, this very success puts the utility of future internet-derived datasets at potential risk, as model outputs begin to replace human annotations as a source of supervision. In this work, we first formalize a system where interactions with one model are recorded as history and scraped as training data in the future. We then analyze its stability over time by tracking changes to a test-time bias statistic (e.g. gender bias of model predictions). We find that the degree of bias amplification is closely linked to whether the model's outputs behave like samples from the training distribution, a behavior which we characterize and define as consistent calibration. Experiments in three conditional prediction scenarios - image classification, visual role-labeling, and language generation - demonstrate that models that exhibit a sampling-like behavior are more calibrated and thus more stable. Based on this insight, we propose an intervention to help calibrate and stabilize unstable feedback systems. Code is available at https://github.com/rtaori/data_feedback.

  • Open

    [D] Are AI-related jobs safer from automation than programming jobs?
    Or what kind of AI job is safer? They seem to be equally unsafe. submitted by /u/FranciscoJ1618 [link] [comments]  ( 89 min )
    [D] RFE Vs Backward Elimination
    I am reading about many methods for feature selection, such as filter, wrapper, embedded, hybrid and advanced methods. One example of a wrapper method is backward elimination, and one example of a hybrid method is recursive feature elimination (RFE). What is the difference between these two methods? RFE first creates a model (based on what we specify in the estimator parameter; it could be a random forest, decision tree, etc.) and keeps removing variables until the desired number is reached, which sounds similar to backward elimination. Can someone kindly highlight the differences between these two methods? submitted by /u/EntrepreneurSea4839 [link] [comments]  ( 88 min )
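    One concrete way to see the difference is a scikit-learn sketch (the dataset and estimator are illustrative choices): RFE refits the estimator and drops the lowest-importance features according to the model's own feature_importances_/coef_, with no validation scoring involved, whereas backward elimination, approximated here by SequentialFeatureSelector with direction="backward", drops at each step the feature whose removal hurts a cross-validated score the least.

    ```python
    # Sketch contrasting RFE with score-based backward elimination (scikit-learn).
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFE, SequentialFeatureSelector

    X, y = load_breast_cancer(return_X_y=True)
    est = RandomForestClassifier(n_estimators=50, random_state=0)

    # RFE: recursively drops the lowest-importance features (no CV score used).
    rfe = RFE(est, n_features_to_select=10).fit(X, y)

    # Backward elimination: drops the feature whose removal costs the least CV score.
    be = SequentialFeatureSelector(
        est, n_features_to_select=10, direction="backward", cv=3).fit(X, y)

    print("RFE keeps:", rfe.support_.nonzero()[0])
    print("Backward keeps:", be.get_support().nonzero()[0])
    ```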
    [P] Waifu-Diffusion: a Stable Diffusion model finetuned on 56k Danbooru images
    waifu-diffusion is a latent text-to-image diffusion model that has been conditioned on high-quality anime images through fine-tuning. huggingface model: https://huggingface.co/hakurei/waifu-diffusion colab notebook with gradio demo: https://colab.research.google.com/drive/1_8wPN7dJO746QXsFnB09Uq2VGgSRFuYE#scrollTo=1HaCauSq546O Model Description: The base model used for fine-tuning is Stable Diffusion V1-4, a latent image diffusion model trained on LAION2B-en. The current model has been fine-tuned with a learning rate of 5.0e-6 for 4 epochs on 56k Danbooru text-image pairs, all of which have an aesthetic rating greater than 6.0. Training Data & Annotative Prompting: The fine-tuning data comes from a random sample of 56k Danbooru images, filtered with CLIP Aesthetic Scoring so that only images with an aesthetic score greater than 6.0 were used. Captions are Danbooru-style captions. original post from: https://www.reddit.com/r/StableDiffusion/comments/x8y1u3/waifudiffusion_v12_a_sd_14_model_finetuned_on_56k/. submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 89 min )
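    For anyone who wants to try the checkpoint, here is a minimal sketch using the Hugging Face diffusers library; the prompt is an arbitrary Danbooru-style example and a CUDA GPU is assumed.

    ```python
    # Minimal inference sketch for the hakurei/waifu-diffusion checkpoint.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "hakurei/waifu-diffusion", torch_dtype=torch.float16).to("cuda")

    # Danbooru-style tag prompt, matching the style of the training captions.
    image = pipe("1girl, silver hair, blue eyes, upper body, outdoors").images[0]
    image.save("waifu.png")
    ```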
    [D] Am I introducing leakage in the situation below?
    Hello! I have a bunch of features about a bunch of individuals and a brand X. I'm trying to model P[is_customer_of_X]. One of the features I have is "Total Spending on category A", and X is a part of that category. I'm unsure about whether I should exclude spending at X when generating that feature or if it's not necessary. Initially, I leaned towards excluding it to avoid any sort of leakage, but then I thought about it and felt like by entirely removing spending at X, I'm making the implicit assumption that the individual wouldn't have spent that money on another brand in category A had X not been available to them, and I think that's a pretty strong / false assumption. Example: if my favorite sushi place shuts down, I will still get sushi from a different spot, so my "Total Spending on Sushi" would still pretty much remain the same. Also, consider the extreme case of a very loyal customer: they spent $1k over the last year at one sushi place and $0 at any other sushi place. If I build "Total spending on sushi" excluding the brand of interest, they get a 0, but the binary variable "is_customer" is still 1, so the model loses valuable signal. Whereas if I say they spent $1,000 on sushi, it's more likely the model gets that right. I feel like if my target variable was "Spending on brand X", 10000% it should have been excluded. But since I'm only modelling P[is_customer_of_X], I think it's less of an issue. Thoughts? submitted by /u/mxj7 [link] [comments]  ( 90 min )
    [P] How I fixed over 50 label issues in a popular semantic segmentation dataset
    Hi folks! I've made a new technique for finding errors in semantic segmentation datasets, using new explainable AI techniques from my PhD. I was frankly pretty surprised to be able to find over 50 different error patterns, and 7% of total pixels labelled incorrectly, in MIT ADE20K (one of the most widely used segmentation datasets). In the corrected dataset, some of the less common classes are tripled in size. While there's been some work on improving ML datasets generally, as far as I know this is the only work on semantic segmentation, where each pixel in an image is given a label. I would love to get the community's thoughts on this. I am also building a company, so if you're interested in using this in your work feel free to DM me. To learn more about the results (and see more pictures), check out my article: https://medium.com/@jamie_34747/how-i-found-over-50-label-issues-in-a-popular-semantic-segmentation-dataset-95b025a6f1b5?source=friends_link&sk=59763b5bcd810230bcf20b2ca4e6fa0e If interested, I also did similar work on object detection in MS COCO, where I found nearly 300,000 annotation errors: https://medium.com/@jamie_34747/79d382edf22b?source=friends_link&sk=d36ad07c074818c48d8f421f6ed104cd. Example labels from MIT ADE20K: when trees overlap with the sky, they are sometimes labeled as “tree” and sometimes as “sky”. submitted by /u/AGI_aint_happening [link] [comments]  ( 119 min )
    [P] The Reddit Climate Change Dataset - an exploration of climate change discussion on Reddit (621K posts, 4.6M comments) (CC-BY)
    Hey all, We have compiled a Reddit post and comment dataset for your analysis. It aims to contain all climate change discussion on Reddit in a set of CSV files - hopefully helping bridge real world problems with solutions based on online community data. You can use it to analyze misinformation, track trends, and many more (data science is an open field!) You can download it here. Or here, if you are using Huggingface Datasets. Enjoy! submitted by /u/Lexyr-Mod [link] [comments]  ( 89 min )
    [News] Core Technical Lead from PyTorch, Mike Ruberry, Joins Lightning AI
    We're super excited to have Mike join the team! Check out the full article on Lightning's site (can't post link here to the blog) but it's at lightning.ai Soumith Chintala said it best "Having a PyTorch core team balanced among many stakeholders is fundamental to our success. I’m really happy to see a PyTorch team at Lightning, and I couldn’t think of a better person to kick it off than Mike Ruberry.” submitted by /u/LightningAI_Main [link] [comments]  ( 89 min )
    [D] Recent developments in computer vision in slow/ mobile devices?
    What are the recent developments in computer vision on mobile or other resource-restricted devices? I know about the three MobileNet papers. However, the blocks in v2 and v3 are still in the range of multiple millions of parameters, which is way too big and slow for my application, and these papers are old now. Were there any other developments that I missed? What is the current SOTA? submitted by /u/Temporary-Trie [link] [comments]  ( 88 min )
    [D] Any youtube channels that implement or discuss implementations of DL papers?
    Or any other not (yet) popular DL centered channels? submitted by /u/jthat92 [link] [comments]  ( 88 min )
    [R] A Review of Sparse Expert Models in Deep Learning
    submitted by /u/hardmaru [link] [comments]  ( 88 min )
    [N] Gym 0.26.0 was just released, with the last breaking changes to the core Gym API, and it will be stable going forward-- this is the stable version you want to finally upgrade all your things to
    Release notes available here: https://github.com/openai/gym/releases/tag/0.26.0 submitted by /u/jkterry1 [link] [comments]  ( 89 min )
    [R] On the Binding Problem in Artificial Neural Networks
    submitted by /u/hardmaru [link] [comments]  ( 89 min )
    [D] What's the right way to report the best performing model?
    I'm writing my first paper, on using an LSTM-based model for a certain task, and trying to evaluate it on two benchmark datasets A and B, using k-fold cross validation. Now, while experimenting with the hidden size parameter, I noticed that one hidden size results in a higher evaluation score for A and a lower score for B. Whereas another hidden size results in a lower score for A and a higher score for B. When reporting the evaluation scores in the paper, and comparing them against SotA, is it acceptable to choose the higher evaluation scores (across both hidden sizes) for both A and B? submitted by /u/fullgoopy_alchemist [link] [comments]  ( 91 min )
    [D] Machine Learning Interview Prep
    Hello everyone, I have a job interview coming up for a Machine Learning Engineer. Can anyone suggest a resource that contains common machine learning notes that I can refer? I have a ML background and I have been working in the industry for 2 years now but I'm a little rusty on the basics and would like to review them. Any help would be appreciated. submitted by /u/therobot20 [link] [comments]  ( 89 min )
    [D] Who Should Be Doing The Innovation in Peer Review?
    The peer reviewing landscape in ML is growing rapidly in size, bringing chronic issues to light like exceptionally frequent poor and inconsistent reviews - yet we haven't seen any significant change to the fundamental process. Who should be leading the charge? Taking OpenReview for example - should the program committees of leading conferences be proposing features for that team to implement? Or would you like to see the engineers at OpenReview developing their own features and offering them to venues as they mature? I think both perspectives are valuable - program committees can see what has and hasn't worked for their own venue whereas the OpenReview team can see trends across all venues that have been hosted on their platform. What do you think? submitted by /u/JustANerdgrammer [link] [comments]  ( 89 min )
    Multi-Modal Experience Inspired AI Creation
    submitted by /u/math238 [link] [comments]  ( 88 min )
  • Open

    The Digital Down
    submitted by /u/MelvilleBragg [link] [comments]  ( 87 min )
    Simple fastai based face restoration, GitHub link in comments.
    submitted by /u/vijish_madhavan [link] [comments]  ( 87 min )
    AI Assistant that finds the best restaurant/business and gives general impressions
    submitted by /u/SudoSharma [link] [comments]  ( 87 min )
    Using State-Of-The-Art Artificial Intelligence (AI) Models for Free: Try OPT-175B on Your Cellphone and Laptop
    When it comes to large AI models, remarkable performance in a wide range of applications often comes with a big budget for hardware and running costs. As a result, most AI users, like researchers from startups or universities, can do nothing but get overwhelmed by striking news about the cost of training large models. Fortunately, thanks to the open source community, serving large AI models has become easy, affordable and accessible to most. OPT-175B: To understand the technical principles of the big-model inference we just experienced, let's first review the big model we just used. The full name of OPT is Open Pretrained Transformer, a large-scale Transformer model (175 billion parameters) with performance similar to that of GPT-3. Continue reading | Open Source Code | Cloud Service Entry submitted by /u/ai-lover [link] [comments]  ( 88 min )
    hoooolee sheeet
    I just made an ai call me a dickhead, but the weird part was.. it didn't stop even after saying sorry, I literally ripped myself just apologising but it didn't stop hating me. Best ai ever. submitted by /u/albedo_kiley [link] [comments]  ( 90 min )
    Forget chess, DeepMind’s training its new AI to play football
    submitted by /u/estasfuera [link] [comments]  ( 87 min )
    AI Dream 75 - Wild new Project! Part 2
    submitted by /u/LordPewPew777 [link] [comments]  ( 87 min )
    AI Turns my Drawings into Pure Art || Stable Diffusion Drawing App
    submitted by /u/Ziinxx [link] [comments]  ( 89 min )
    Can I sell the images made by DALL-E 2, and would anyone buy them?
    submitted by /u/Thesmallcookie [link] [comments]  ( 90 min )
    Looks like hCaptcha is using us to train an AI model
    submitted by /u/Why_Soooo_Serious [link] [comments]  ( 89 min )
    Win 1000 credits on Pixelz AI 🤑 Details below 👇🏼
    submitted by /u/mdfnb [link] [comments]  ( 87 min )
    Trainable image generator?
    I'd like to experiment with training an image generation system with my own set of images. Can you point me towards anything like that, where I can give it a big folder of images, let it run, and then start generating more images based on the input? submitted by /u/MichaelKlint [link] [comments]  ( 94 min )
    A Better Tomorrow with AI & AGI. The Human / AI / AGI Relationship by UnqleShawn
    AI / The AGI can provide solutions to create a more stable, healthy and beautiful world full of life. From greater efficiency in agriculture, to cooler, cleaner air and water, AI / AGI can help us build a better tomorrow. I do say a lot of weird things. But I truly believe this above statement with all my heart. I really want to express how strongly I feel about promoting AI with peace, love, and intelligence. I think our society has got major issues with accepting that AI is very powerful and will one day offer us amazing possibilities. We have to think positive though. AI is not going to go terminator on us if we don't manifest that reality on our own! I think there are solutions to help humans get back to doing what they were meant to do which is to love. We got to stop being slaves t…  ( 91 min )
    [P] The table extraction tool: PP-Structure
    I recently tried PaddleOCR's latest PP-Structure document analysis tool, mainly its table recognition model, and found that it beats other models in both accuracy and speed. I hope it can help you solve document analysis or table recognition problems. It's easy to use:

    pip install paddlepaddle paddleocr
    paddleocr --image_dir=/img_dir/table.jpg --type=structure --layout=false

    Code: https://github.com/PaddlePaddle/PaddleOCR/tree/dygraph/ppstructure submitted by /u/osicli  ( 87 min )
    Down Under, but its lyrics are generated by AI, in one long continuous image
    submitted by /u/dead999999ish  ( 87 min )
    Run Corgi Run! First Deforum
    Prompts: a corgi running through the field by studio ghibli
    Seeds: 4117284580
    Guidance scale: 7
    submitted by /u/kopanoide  ( 87 min )
    General Video Recognition with AI (How AI Understands Videos)
    submitted by /u/OnlyProggingForFun  ( 101 min )
    Who is building StableDiffusion/DALL-E but for 3D assets?
    submitted by /u/ThomPete  ( 92 min )
    A Multi-Axis Approach for Vision Transformer and MLP Models
    Posted by Zhengzhong Tu and Yinxiao Li, Software Engineers, Google Research. Convolutional neural networks have been the dominant machine learning architecture for computer vision since the introduction of AlexNet in 2012. Recently, inspired by the evolution of Transformers in natural language processing, attention mechanisms have been prominently incorporated into vision models. These attention methods boost some parts of the input data while minimizing other parts so that the network can focus on small but important parts of the data. The Vision Transformer (ViT) has created a new landscape of model designs for computer vision that is completely free of convolution. ViT regards image patches as a sequence of words, and applies a Transformer encoder on top. When trained on sufficiently la…  ( 24 min )
    NVIDIA Hopper Sweeps AI Inference Benchmarks in MLPerf Debut
    In their debut on the MLPerf industry-standard AI benchmarks, NVIDIA H100 Tensor Core GPUs set world records in inference on all workloads, delivering up to 4.5x more performance than previous-generation GPUs. The results demonstrate that Hopper is the premium choice for users who demand utmost performance on advanced AI models. Additionally, NVIDIA A100 Tensor Core …  ( 6 min )
    GeForce NOW Supports Over 1,400 Games Streaming Instantly
    This GFN Thursday marks a milestone: With the addition of six new titles this week, more than 1,400 games are now available to stream from the GeForce NOW library. Plus, GeForce NOW members streaming to supported Smart TVs from Samsung and LG can get into their games faster with an improved user interface. Your Games, …  ( 5 min )
    Let’s train your first Offline Decision Transformer model from scratch 🤖
    Hey there! 👋 We just published a tutorial where you'll learn what Decision Transformers and Offline Reinforcement Learning are, and you'll train your first Offline Decision Transformer model from scratch to make a half-cheetah run. The chapter 👉 https://huggingface.co/blog/train-decision-transformers The hands-on 👉 https://github.com/huggingface/blog/blob/main/notebooks/101_train-decision-transformers.ipynb If you have questions and feedback, I would love to answer them. submitted by /u/cranthir_  ( 98 min )
    Gym 0.26.0 was just released, with the last breaking changes to the core Gym API, and it will be stable going forward-- this is the stable version you want to finally upgrade all your things to
    submitted by /u/jkterry1  ( 87 min )
    What are some good resources to understand maths notation involved in RL and other function approximation proofs in DL in general.
    The title mostly says it, but: I have been involved in deep learning and optimization for a while and have made do with my current understanding by picking formulas apart via descriptions in papers, code, and my knowledge of set and algebra notation. But I have become interested in RL recently and found that my speed at understanding and implementing is slow, since I need several references to understand these methods. I would like a resource to better learn and practice the formulas and proofs involved. Any help in this regard is appreciated, even if it's not specific to RL formulas. submitted by /u/Extra-most-best  ( 88 min )
    Why does tf-agents never accept a time step from custom-made environments?
    I'm trying to create a custom environment, but tf-agents just won't accept my time_step_spec and I can't find out why. I've made my environment and I'm passing a time step of the environment into my agent, which is what the tutorials all do...

    # reproducible code below
    import tensorflow as tf
    from tf_agents.networks import q_network
    from tf_agents.agents.dqn import dqn_agent
    import tf_agents
    import tf_agents.environments.py_environment as PyEnvironment
    from tf_agents.trajectories import time_step as ts
    import numpy as np
    import keras

    class Con4Env(PyEnvironment.PyEnvironment):

        def __init__(self, game):
            self.game = game
            self._action_spec = tf_agents.specs.BoundedArraySpec(
                shape=(), dtype=np.int32, minimum=0, maximum=2, name='action')
            self._observation_spec = tf_agents.specs.BoundedArraySpec( …  ( 91 min )
    AI system makes models like DALL-E 2 more creative
    Researchers develop a new method that uses multiple models to create more complex images with better understanding.  ( 7 min )
    Trig in hyperbolic geometry
    I recently wrote posts about spherical analogs of the Pythagorean theorem, the law of cosines, and the law of sines. The corresponding formulas for hyperbolic space mostly just replace circular functions with hyperbolic functions, i.e. replace sine with hyperbolic sine and cosine with hyperbolic cosine. Triangles on a sphere or on a hyperbolic space like […] Trig in hyperbolic geometry first appeared on John D. Cook.  ( 5 min )
    The Role of ImageNet Classes in Fréchet Inception Distance. (arXiv:2203.06026v2 [cs.CV] UPDATED)
    Fréchet Inception Distance (FID) is the primary metric for ranking models in data-driven generative modeling. While remarkably successful, the metric is known to sometimes disagree with human judgement. We investigate a root cause of these discrepancies, and visualize what FID "looks at" in generated images. We show that the feature space that FID is (typically) computed in is so close to the ImageNet classifications that aligning the histograms of Top-$N$ classifications between sets of generated and real images can reduce FID substantially -- without actually improving the quality of results. Thus we conclude that FID is prone to intentional or accidental distortions. As a practical example of an accidental distortion, we discuss a case where an ImageNet pre-trained FastGAN achieves a FID comparable to StyleGAN2, while being worse in terms of human evaluation.  ( 2 min )
    Real-to-Sim: Deep Learning with Auto-Tuning to Predict Residual Errors using Sparse Data. (arXiv:2209.03210v1 [cs.RO])
    Achieving highly accurate kinematic or simulator models that are close to the real robot can facilitate model-based controls (e.g., model predictive control or linear-quadratic regulators), model-based trajectory planning (e.g., trajectory optimization), and decrease the amount of learning time necessary for reinforcement learning methods. Thus, the objective of this work is to learn the residual errors between a kinematic and/or simulator model and the real robot. This is achieved using auto-tuning and neural networks, where the parameters of a neural network are updated using an auto-tuning method that applies equations from an Unscented Kalman Filter (UKF) formulation. Using this method, we model these residual errors with only small amounts of data - a necessity as we improve the simulator/kinematic model by learning directly from hardware operation. We demonstrate our method on robotic hardware (e.g., manipulator arm), and show that with the learned residual errors, we can further close the reality gap between kinematic models, simulations, and the real robot.  ( 2 min )
    DM$^2$S$^2$: Deep Multi-Modal Sequence Sets with Hierarchical Modality Attention. (arXiv:2209.03126v1 [cs.MM])
    There is increasing interest in the use of multimodal data in various web applications, such as digital advertising and e-commerce. Typical methods for extracting important information from multimodal data rely on a mid-fusion architecture that combines the feature representations from multiple encoders. However, as the number of modalities increases, several potential problems with the mid-fusion model structure arise, such as an increase in the dimensionality of the concatenated multimodal features and missing modalities. To address these problems, we propose a new concept that considers multimodal inputs as a set of sequences, namely, deep multimodal sequence sets (DM$^2$S$^2$). Our set-aware concept consists of three components that capture the relationships among multiple modalities: (a) a BERT-based encoder to handle the inter- and intra-order of elements in the sequences, (b) intra-modality residual attention (IntraMRA) to capture the importance of the elements in a modality, and (c) inter-modality residual attention (InterMRA) to further enhance the importance of elements at modality-level granularity. Our concept exhibits performance that is comparable to or better than the previous set-aware models. Furthermore, we demonstrate that the visualization of the learned InterMRA and IntraMRA weights can provide an interpretation of the prediction results.  ( 2 min )
    On the Sparse DAG Structure Learning Based on Adaptive Lasso. (arXiv:2209.02946v1 [stat.ML])
    Learning the underlying causal structure, represented by Directed Acyclic Graphs (DAGs), of concerned events from fully-observational data is a crucial part of causal reasoning, but it is challenging due to the combinatorial and large search space. A recent flurry of developments recast this combinatorial problem into a continuous optimization problem by leveraging an algebraic equality characterization of acyclicity. However, these methods suffer from the fixed-threshold step after optimization, which is not a flexible or systematic way to rule out cycle-inducing edges or false-discovery edges with small values caused by numerical precision. In this paper, we develop a data-driven DAG structure learning method without the predefined threshold, called adaptive NOTEARS [30], achieved by applying adaptive penalty levels to each parameter in the regularization term. We show that adaptive NOTEARS enjoys the oracle properties under some specific conditions. Furthermore, simulation results validate the effectiveness of our method, without setting any gap of edge weights around zero.  ( 2 min )
    Machine Learning-based Automatic Annotation and Detection of COVID-19 Fake News. (arXiv:2209.03162v1 [cs.SI])
    COVID-19 impacted every part of the world, although the misinformation about the outbreak traveled faster than the virus. Misinformation spread through online social networks (OSN) often misled people from following correct medical practices. In particular, OSN bots have been a primary source of disseminating false information and initiating cyber propaganda. Existing work neglects the presence of bots that act as a catalyst in the spread and focuses on fake news detection in 'articles shared in posts' rather than the post (textual) content. Most work on misinformation detection uses manually labeled datasets that are hard to scale for building predictive models. In this research, we overcome this challenge of data scarcity by proposing an automated approach for labeling data using verified fact-checked statements on a Twitter dataset. In addition, we combine textual features with user-level features (such as followers count and friends count) and tweet-level features (such as the number of mentions, hashtags and URLs in a tweet) to act as additional indicators to detect misinformation. Moreover, we analyze the presence of bots in tweets and show that bots change their behavior over time and are most active during the misinformation campaign. We collected 10.22 million COVID-19 related tweets and used our annotation model to build an extensive and original ground-truth dataset for classification purposes. We utilize various machine learning models to accurately detect misinformation, and our best classification model achieves precision (82%), recall (96%), and false positive rate (3.58%). Also, our bot analysis indicates that bots generated approximately 10% of misinformation tweets. Our methodology results in substantial exposure of false information, thus improving the trustworthiness of information disseminated through social media platforms.  ( 3 min )
    Improving Self-supervised Learning for Out-of-distribution Task via Auxiliary Classifier. (arXiv:2209.02881v1 [eess.IV])
    In real-world scenarios, out-of-distribution (OOD) datasets may have a large distributional shift from training datasets. This phenomenon generally occurs when a trained classifier is deployed on varying dynamic environments, which causes a significant drop in performance. To tackle this issue, we propose an end-to-end deep multi-task network in this work. Observing a strong relationship between rotation prediction (self-supervised) accuracy and semantic classification accuracy on OOD tasks, we introduce an additional auxiliary classification head in our multi-task network along with the semantic classification and rotation prediction heads. To observe the influence of this additional classifier in improving the rotation prediction head, our proposed learning method is framed as a bi-level optimisation problem, where the upper level is trained to update the parameters for the semantic classification and rotation prediction heads. In the lower-level optimisation, only the auxiliary classification head is updated through the semantic classification head, with the parameters of the semantic classification head kept fixed. The proposed method has been validated on three unseen OOD datasets, where it exhibits a clear improvement in semantic classification accuracy over two other baseline methods. Our code is available on GitHub \url{https://github.com/harshita-555/OSSL}
    Distributed Adversarial Training to Robustify Deep Neural Networks at Scale. (arXiv:2206.06257v2 [cs.LG] UPDATED)
    Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification. To defend against such attacks, an effective and popular approach, known as adversarial training (AT), has been shown to mitigate the negative impact of adversarial attacks by virtue of a min-max robust training method. While effective, it remains unclear whether it can successfully be adapted to the distributed learning context. The power of distributed optimization over multiple machines enables us to scale up robust training over large models and datasets. Spurred by that, we propose distributed adversarial training (DAT), a large-batch adversarial training framework implemented over multiple machines. We show that DAT is general, which supports training over labeled and unlabeled data, multiple types of attack generation methods, and gradient compression operations favored for distributed optimization. Theoretically, we provide, under standard conditions in the optimization theory, the convergence rate of DAT to the first-order stationary points in general non-convex settings. Empirically, we demonstrate that DAT either matches or outperforms state-of-the-art robust accuracies and achieves a graceful training speedup (e.g., on ResNet-50 under ImageNet). Codes are available at https://github.com/dat-2022/dat.
    Improving Out-of-Distribution Detection via Epistemic Uncertainty Adversarial Training. (arXiv:2209.03148v1 [cs.LG])
    The quantification of uncertainty is important for the adoption of machine learning, especially to reject out-of-distribution (OOD) data back to human experts for review. Yet progress has been slow, as a balance must be struck between computational efficiency and the quality of uncertainty estimates. For this reason many use deep ensembles of neural networks or Monte Carlo dropout for reasonable uncertainty estimates at relatively minimal compute and memory. Surprisingly, when we focus on the real-world applicable constraint of $\leq 1\%$ false positive rate (FPR), prior methods fail to reliably detect OOD samples as such. Notably, even Gaussian random noise fails to trigger these popular OOD techniques. We help to alleviate this problem by devising a simple adversarial training scheme that incorporates an attack of the epistemic uncertainty predicted by the dropout ensemble. We demonstrate this method improves OOD detection performance on standard data (i.e., not adversarially crafted), and improves the standardized partial AUC from near-random guessing performance to $\geq 0.75$.
    Incremental Permutation Feature Importance (iPFI): Towards Online Explanations on Data Streams. (arXiv:2209.01939v2 [cs.LG] UPDATED)
    Explainable Artificial Intelligence (XAI) has mainly focused on static learning scenarios so far. We are interested in dynamic scenarios where data is sampled progressively, and learning is done in an incremental rather than a batch mode. We seek efficient incremental algorithms for computing feature importance (FI) measures, specifically, an incremental FI measure based on feature marginalization of absent features similar to permutation feature importance (PFI). We propose an efficient, model-agnostic algorithm called iPFI to estimate this measure incrementally and under dynamic modeling conditions including concept drift. We prove theoretical guarantees on the approximation quality in terms of expectation and variance. To validate our theoretical findings and the efficacy of our approaches compared to traditional batch PFI, we conduct multiple experimental studies on benchmark data with and without concept drift.
    How important are activation functions in regression and classification? A survey, performance comparison, and future directions. (arXiv:2209.02681v2 [cs.LG] UPDATED)
    Inspired by biological neurons, activation functions play an essential part in the learning process of any artificial neural network, and such networks are commonly used in many real-world problems. Various activation functions have been proposed in the literature for classification as well as regression tasks. In this work, we survey the activation functions that have been employed in the past as well as the current state-of-the-art. In particular, we present various developments in activation functions over the years and the advantages as well as disadvantages or limitations of these activation functions. We also discuss classical (fixed) activation functions, including rectifier units, and adaptive activation functions. In addition to presenting the taxonomy of activation functions based on characterization, a taxonomy of activation functions based on applications is also presented. To this end, the systematic comparison of various fixed and adaptive activation functions is performed for classification data sets such as the MNIST, CIFAR-10, and CIFAR-100. In recent years, a physics-informed machine learning framework has emerged for solving problems related to scientific computations. To this end, we also discuss various requirements for activation functions that have been used in the physics-informed machine learning framework. Furthermore, various comparisons are made among different fixed and adaptive activation functions using various machine learning libraries such as TensorFlow, PyTorch, and JAX.
    Video Restoration with a Deep Plug-and-Play Prior. (arXiv:2209.02854v1 [eess.IV])
    This paper presents a novel method for restoring digital videos via a Deep Plug-and-Play (PnP) approach. Under a Bayesian formalism, the method consists in using a deep convolutional denoising network in place of the proximal operator of the prior in an alternating optimization scheme. We distinguish ourselves from prior PnP work by directly applying that method to restore a digital video from a degraded video observation. This way, a network trained once for denoising can be repurposed for other video restoration tasks. Our experiments in video deblurring, super-resolution, and interpolation of random missing pixels all show a clear benefit to using a network specifically designed for video denoising, as it yields better restoration performance and better temporal stability than a single image network with similar denoising performance using the same PnP formulation. Moreover, our method compares favorably to applying a different state-of-the-art PnP scheme separately on each frame of the sequence. This opens new perspectives in the field of video restoration.
    SegDiff: Image Segmentation with Diffusion Probabilistic Models. (arXiv:2112.00390v3 [cs.CV] UPDATED)
    Diffusion probabilistic models are employed for state-of-the-art image generation. In this work, we present a method for extending such models for performing image segmentation. The method learns end-to-end, without relying on a pre-trained backbone. The information in the input image and in the current estimation of the segmentation map is merged by summing the output of two encoders. Additional encoding layers and a decoder are then used to iteratively refine the segmentation map, using a diffusion model. Since the diffusion model is probabilistic, it is applied multiple times, and the results are merged into a final segmentation map. The new method produces state-of-the-art results on the Cityscapes validation set, the Vaihingen building segmentation benchmark, and the MoNuSeg dataset.
    Communication-Efficient Diffusion Strategy for Performance Improvement of Federated Learning with Non-IID Data. (arXiv:2207.07493v2 [cs.DC] UPDATED)
    Federated learning (FL) is a novel learning paradigm that addresses the privacy leakage challenge of centralized learning. However, in FL, users with non-independent and identically distributed (non-IID) characteristics can deteriorate the performance of the global model. Specifically, the global model suffers from the weight divergence challenge owing to non-IID data. To address the aforementioned challenge, we propose a novel diffusion strategy of the machine learning (ML) model (FedDif) to maximize the FL performance with non-IID data. In FedDif, users spread local models to neighboring users over D2D communications. FedDif enables the local model to experience different distributions before parameter aggregation. Furthermore, we theoretically demonstrate that FedDif can circumvent the weight divergence challenge. On this theoretical basis, we propose a communication-efficient diffusion strategy for the ML model, which can determine the trade-off between learning performance and communication cost based on auction theory. The performance evaluation results show that FedDif improves the test accuracy of the global model by 10.37% compared to the baseline FL with non-IID settings. Moreover, FedDif improves the number of consumed sub-frames by 1.28 to 2.85 fold relative to the latest methods, except for the model compression scheme. FedDif also improves the number of transmitted models by 1.43 to 2.67 fold relative to the latest methods.
    Model-Based Policy Search Using Monte Carlo Gradient Estimation with Real Systems Application. (arXiv:2101.12115v4 [cs.LG] UPDATED)
    In this paper, we present a Model-Based Reinforcement Learning (MBRL) algorithm named \emph{Monte Carlo Probabilistic Inference for Learning COntrol} (MC-PILCO). The algorithm relies on Gaussian Processes (GPs) to model the system dynamics and on a Monte Carlo approach to estimate the policy gradient. This defines a framework in which we ablate the choice of the following components: (i) the selection of the cost function, (ii) the optimization of policies using dropout, (iii) an improved data efficiency through the use of structured kernels in the GP models. The combination of the aforementioned aspects dramatically affects the performance of MC-PILCO. Numerical comparisons in a simulated cart-pole environment show that MC-PILCO exhibits better data efficiency and control performance w.r.t. state-of-the-art GP-based MBRL algorithms. Finally, we apply MC-PILCO to real systems, considering in particular systems with partially measurable states. We discuss the importance of modeling both the measurement system and the state estimators during policy optimization. The effectiveness of the proposed solutions has been tested in simulation and on two real systems, a Furuta pendulum and a ball-and-plate rig.
    Use and Misuse of Machine Learning in Anthropology. (arXiv:2209.02811v1 [cs.LG])
    Machine learning (ML), being now widely accessible to the research community at large, has fostered a proliferation of new and striking applications of these emergent mathematical techniques across a wide range of disciplines. In this paper, we will focus on a particular case study: the field of paleoanthropology, which seeks to understand the evolution of the human species based on biological and cultural evidence. As we will show, the easy availability of ML algorithms and lack of expertise on their proper use among the anthropological research community has led to foundational misapplications that have appeared throughout the literature. The resulting unreliable results not only undermine efforts to legitimately incorporate ML into anthropological research, but produce potentially faulty understandings about our human evolutionary and behavioral past. The aim of this paper is to provide a brief introduction to some of the ways in which ML has been applied within paleoanthropology; we also include a survey of some basic ML algorithms for those who are not fully conversant with the field, which remains under active development. We discuss a series of missteps, errors, and violations of correct protocols of ML methods that appear disconcertingly often within the accumulating body of anthropological literature. These mistakes include use of outdated algorithms and practices; inappropriate train/test splits, sample composition, and textual explanations; as well as an absence of transparency due to the lack of data/code sharing, and the subsequent limitations imposed on independent replication. We assert that expanding samples, sharing data and code, re-evaluating approaches to peer review, and, most importantly, developing interdisciplinary teams that include experts in ML are all necessary for progress in future research incorporating ML within anthropology.
    On the utility and protection of optimization with differential privacy and classic regularization techniques. (arXiv:2209.03175v1 [cs.LG])
    Nowadays, owners and developers of deep learning models must consider stringent privacy-preservation rules for their training data, which is usually crowd-sourced and retains sensitive information. The most widely adopted method to enforce privacy guarantees of a deep learning model nowadays relies on optimization techniques enforcing differential privacy. According to the literature, this approach has proven to be a successful defence against several privacy attacks on models, but its downside is a substantial degradation of the models' performance. In this work, we compare the effectiveness of the differentially-private stochastic gradient descent (DP-SGD) algorithm against standard optimization practices with regularization techniques. We analyze the resulting models' utility, training performance, and the effectiveness of membership inference and model inversion attacks against the learned models. Finally, we discuss differential privacy's flaws and limits and empirically demonstrate the often superior privacy-preserving properties of dropout and l2-regularization.
    Automatic Meta-Path Discovery for Effective Graph-Based Recommendation. (arXiv:2112.12845v5 [cs.IR] UPDATED)
    Heterogeneous Information Networks (HINs) are labeled graphs that depict relationships among different types of entities (e.g., users, movies and directors). For HINs, meta-path-based recommenders (MPRs) utilize meta-paths (i.e., abstract paths consisting of node and link types) to predict user preference, and have attracted a lot of attention due to their explainability and performance. We observe that the performance of MPRs is highly sensitive to the meta-paths they use, but existing works manually select the meta-paths from many possible ones. Thus, to discover effective meta-paths automatically, we propose the Reinforcement learning-based Meta-path Selection (RMS) framework. Specifically, we define a vector encoding for meta-paths and design a policy network to extend meta-paths. The policy network is trained based on the results of downstream recommendation tasks and an early stopping approximation strategy is proposed to speed up training. RMS is a general model, and it can work with all existing MPRs. We also propose a new MPR called RMS-HRec, which uses an attention mechanism to aggregate information from the meta-paths. We conduct extensive experiments on real datasets. Compared with the manually selected meta-paths, the meta-paths identified by RMS consistently improve recommendation quality. Moreover, RMS-HRec outperforms state-of-the-art recommender systems by an average of 7% in hit ratio. The codes and datasets are available on https://github.com/Stevenn9981/RMS-HRec.
    Supervised Hebbian Learning. (arXiv:2203.01304v2 [cond-mat.dis-nn] UPDATED)
    In the neural network literature, Hebbian learning traditionally refers to the procedure by which the Hopfield model and its generalizations store archetypes (i.e., definite patterns that are experienced just once to form the synaptic matrix). However, the term "learning" in machine learning refers to the ability of the machine to extract features from the supplied dataset (e.g., made of blurred examples of these archetypes), in order to make its own representation of the unavailable archetypes. Here, given a sample of examples, we define a supervised learning protocol by which the Hopfield network can infer the archetypes, and we detect the correct control parameters (including size and quality of the dataset) to depict a phase diagram for the system performance. We also prove that, for structureless datasets, the Hopfield model equipped with this supervised learning rule is equivalent to a restricted Boltzmann machine and this suggests an optimal and interpretable training routine. Finally, this approach is generalized to structured datasets: we highlight a quasi-ultrametric organization (reminiscent of replica-symmetry-breaking) in the analyzed datasets and, consequently, we introduce an additional "replica hidden layer" for its (partial) disentanglement, which is shown to improve MNIST classification from 75% to 95%, and to offer a new perspective on deep architectures.
    Adapting Rapid Motor Adaptation for Bipedal Robots. (arXiv:2205.15299v2 [cs.RO] UPDATED)
    Recent advances in legged locomotion have enabled quadrupeds to walk on challenging terrains. However, bipedal robots are inherently more unstable and hence it's harder to design walking controllers for them. In this work, we leverage recent advances in rapid adaptation for locomotion control, and extend them to work on bipedal robots. Similar to existing works, we start with a base policy which produces actions while taking as input an estimated extrinsics vector from an adaptation module. This extrinsics vector contains information about the environment and enables the walking controller to rapidly adapt online. However, the extrinsics estimator could be imperfect, which might lead to poor performance of the base policy which expects a perfect estimator. In this paper, we propose A-RMA (Adapting RMA), which additionally adapts the base policy for the imperfect extrinsics estimator by finetuning it using model-free RL. We demonstrate that A-RMA outperforms a number of RL-based baseline controllers and model-based controllers in simulation, and show zero-shot deployment of a single A-RMA policy to enable a bipedal robot, Cassie, to walk in a variety of different scenarios in the real world beyond what it has seen during training. Videos and results at https://ashish-kmr.github.io/a-rma/
    A simple learning agent interacting with an agent-based market model. (arXiv:2208.10434v2 [q-fin.TR] UPDATED)
    We consider the learning dynamics of a single reinforcement learning optimal execution trading agent when it interacts with an event driven agent-based financial market model. Trading takes place asynchronously through a matching engine in event time. The optimal execution agent is considered at different levels of initial order-sizes and differently sized state spaces. The resulting impact on the agent-based model and market are considered using a calibration approach that explores changes in the empirical stylised facts and price impact curves. Convergence, volume trajectory and action trace plots are used to visualise the learning dynamics. Here the number of states visited by the smaller state-space agents converged much faster than for the larger state-space agents, and the smaller agents were able to start learning to trade intuitively using the spread and volume states. We find that the moments of the model are robust to the impact of the learning agents except for the Hurst exponent, which was lowered by the introduction of strategic order-splitting. The introduction of the learning agent preserves the shape of the price impact curves but can reduce the trade-sign auto-correlations when their trading volumes increase.
    Decoding Demographic un-fairness from Indian Names. (arXiv:2209.03089v1 [cs.CY])
    Demographic classification is essential in fairness assessment in recommender systems or in measuring unintended bias in online networks and voting systems. Important fields like education and politics, which often lay a foundation for the future of equality in society, need scrutiny to design policies that can better foster equality in resource distribution constrained by the unbalanced demographic distribution of people in the country. We collect three publicly available datasets to train state-of-the-art classifiers in the domain of gender and caste classification. We train the models in the Indian context, where the same name can have different styling conventions (Jolly Abraham/Kumar Abhishikta in one state may be written as Abraham Jolly/Abishikta Kumar in the other). Finally, we also perform cross-testing (training and testing on different datasets) to understand the efficacy of the above models. We also perform an error analysis of the prediction models. Finally, we attempt to assess the bias in the existing Indian system as case studies and find some intriguing patterns manifesting in the complex demographic layout of the sub-continent across the dimensions of gender and caste.
    Hardware Acceleration of Sampling Algorithms in Sample and Aggregate Graph Neural Networks. (arXiv:2209.02916v1 [cs.LG])
    Sampling is an important process in many GNN structures in order to train on larger datasets with smaller computational complexity. However, compared to other processes in GNNs (such as aggregation and backward propagation), the sampling process still costs a tremendous amount of time, which limits the speed of training. To reduce the time spent sampling, hardware acceleration is an ideal choice. However, state-of-the-art GNN acceleration proposals did not specify how to accelerate the sampling process. What's more, directly accelerating traditional sampling algorithms makes the structure of the accelerator very complicated. In this work, we make two contributions: (1) We propose a new neighbor sampler, the CONCAT Sampler, which can be easily accelerated at the hardware level while guaranteeing test accuracy. (2) We design a CONCAT-sampler accelerator based on FPGA, with which the neighbor sampling process becomes about 300-1000 times faster compared to sampling without it.
    Studying Bias in GANs through the Lens of Race. (arXiv:2209.02836v1 [cs.CV])
    In this work, we study how the performance and evaluation of generative image models are impacted by the racial composition of their training datasets. By examining and controlling the racial distributions in various training datasets, we are able to observe the impacts of different training distributions on generated image quality and the racial distributions of the generated images. Our results show that the racial compositions of generated images successfully preserve that of the training data. However, we observe that truncation, a technique used to generate higher quality images during inference, exacerbates racial imbalances in the data. Lastly, when examining the relationship between image quality and race, we find that the highest perceived visual quality images of a given race come from a distribution where that race is well-represented, and that annotators consistently prefer generated images of white people over those of Black people.
    Multi-skill Mobile Manipulation for Object Rearrangement. (arXiv:2209.02778v1 [cs.RO])
    We study a modular approach to tackle long-horizon mobile manipulation tasks for object rearrangement, which decomposes a full task into a sequence of subtasks. To tackle the entire task, prior work chains multiple stationary manipulation skills with a point-goal navigation skill, which are learned individually on subtasks. Although more effective than monolithic end-to-end RL policies, this framework suffers from compounding errors in skill chaining, e.g., navigating to a bad location where a stationary manipulation skill cannot reach its target to manipulate. To this end, we propose that the manipulation skills should include mobility, to have flexibility in interacting with the target object from multiple locations, and that at the same time the navigation skill could have multiple end points which lead to successful manipulation. We operationalize these ideas by implementing mobile manipulation skills rather than stationary ones and training a navigation skill with a region goal instead of a point goal. We evaluate our multi-skill mobile manipulation method M3 on 3 challenging long-horizon mobile manipulation tasks in the Home Assistant Benchmark (HAB), and show superior performance as compared to the baselines.
    Dual Instrumental Method for Confounded Kernelized Bandits. (arXiv:2209.03224v1 [cs.LG])
    The contextual bandit problem is a theoretically justified framework with wide applications in various fields. While previous studies of this problem usually require independence between noise and contexts, our work considers a more sensible setting where the noise becomes a latent confounder that affects both contexts and rewards. Such a confounded setting is more realistic and could expand to a broader range of applications. However, the unresolved confounder will cause a bias in reward function estimation and thus lead to a large regret. To deal with the challenges brought by the confounder, we apply the dual instrumental variable regression, which can correctly identify the true reward function. We prove the convergence rate of this method is near-optimal in two types of widely used reproducing kernel Hilbert spaces. Therefore, we can design computationally efficient and regret-optimal algorithms based on the theoretical guarantees for confounded bandit problems. The numerical results illustrate the efficacy of our proposed algorithms in the confounded bandit setting.
    A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games. (arXiv:2209.02838v1 [cs.LG])
    We consider risk-averse learning in repeated unknown games where the goal of the agents is to minimize their individual risk of incurring significantly high cost. Specifically, the agents use the conditional value at risk (CVaR) as a risk measure and rely on bandit feedback in the form of the cost values of the selected actions at every episode to estimate their CVaR values and update their actions. A major challenge in using bandit feedback to estimate CVaR is that the agents can only access their own cost values, which, however, depend on the actions of all agents. To address this challenge, we propose a new risk-averse learning algorithm with momentum that utilizes the full historical information on the cost values. We show that this algorithm achieves sub-linear regret and matches the best known algorithms in the literature. We provide numerical experiments for a Cournot game that show that our method outperforms existing methods.
    Composite Spatial Monte Carlo Integration Based on Generalized Least Squares. (arXiv:2204.03248v2 [stat.CO] UPDATED)
    Although evaluation of the expectations on the Ising model is essential in various applications, it is mostly infeasible because of intractable multiple summations. Spatial Monte Carlo integration (SMCI) is a sampling-based approximation. It can provide high-accuracy estimations for such intractable expectations. To evaluate the expectation of a function of variables in a specific region (called target region), SMCI considers a larger region containing the target region (called sum region). In SMCI, the multiple summation for the variables in the sum region is precisely executed, and that in the outer region is evaluated by the sampling approximation such as the standard Monte Carlo integration. It is guaranteed that the accuracy of the SMCI estimator improves monotonically as the size of the sum region increases. However, a haphazard expansion of the sum region could cause a combinatorial explosion. Therefore, we hope to improve the accuracy without such an expansion. In this paper, based on the theory of generalized least squares (GLS), a new effective method is proposed by combining multiple SMCI estimators. The validity of the proposed method is demonstrated theoretically and numerically. The results indicate that the proposed method can be effective in the inverse Ising problem (or Boltzmann machine learning).
    Avast-CTU Public CAPE Dataset. (arXiv:2209.03188v1 [cs.CR])
    There is a limited amount of publicly available data to support research in malware analysis technology. Particularly, there are virtually no publicly available datasets generated from rich sandboxes such as Cuckoo/CAPE. The benefit of using dynamic sandboxes is the realistic simulation of file execution in the target machine and obtaining a log of such execution. The machine can be infected by malware hence there is a good chance of capturing the malicious behavior in the execution logs, thus allowing researchers to study such behavior in detail. Although the subsequent analysis of log information is extensively covered in industrial cybersecurity backends, to our knowledge there has been only limited effort invested in academia to advance such log analysis capabilities using cutting edge techniques. We make this sample dataset available to support designing new machine learning methods for malware detection, especially for automatic detection of generic malicious behavior. The dataset has been collected in cooperation between Avast Software and Czech Technical University - AI Center (AIC).
    Inverse modeling of nonisothermal multiphase poromechanics using physics-informed neural networks. (arXiv:2209.03276v1 [cs.LG])
    We propose a solution strategy for parameter identification in multiphase thermo-hydro-mechanical (THM) processes in porous media using physics-informed neural networks (PINNs). We employ a dimensionless form of the THM governing equations that is particularly well suited for the inverse problem, and we leverage the sequential multiphysics PINN solver we developed in previous work. We validate the proposed inverse-modeling approach on multiple benchmark problems, including Terzaghi's isothermal consolidation problem, Barry-Mercer's isothermal injection-production problem, and nonisothermal consolidation of an unsaturated soil layer. We report the excellent performance of the proposed sequential PINN-THM inverse solver, thus paving the way for the application of PINNs to inverse modeling of complex nonlinear multiphysics problems.
    Remote Work Optimization with Robust Multi-channel Graph Neural Networks. (arXiv:2209.03150v1 [cs.SI])
    The spread of COVID-19 leads to the global shutdown of many corporate offices, and encourages companies to open more opportunities that allow employees to work from a remote location. As the workplace type expands from onsite offices to remote areas, an emerging challenge for an online hiring marketplace is how these remote opportunities and user intentions to work remotely can be modeled and matched without prior information. Despite the unprecedented amount of remote jobs posted amid COVID-19, there is no existing approach that can be directly applied. Introducing a brand new workplace type naturally leads to the cold-start problem, which is particularly more common for less active job seekers. It is challenging, if not impossible, to onboard a new workplace type for any predictive model if existing information sources can provide little information related to a new category of jobs, including data from resumes and job descriptions. Hence, in this work, we aim to propose a principled approach that jointly models the remoteness of job seekers and job opportunities with limited information, which also suffices the needs of web-scale applications. Existing research on the emerging type of remote workplace mainly focuses on qualitative studies, and classic predictive modeling approaches are inapplicable considering the problem of cold-start and information scarcity. We precisely try to close this gap with a novel graph neural architecture. Extensive experiments on large-scale data from real-world applications have been conducted to validate the superiority of the proposed approach over competitive baselines. The improvement may translate to more rapid onboarding of the new workplace type that can benefit job seekers who are interested in working remotely.
    Depression Symptoms Modelling from Social Media Text: An Active Learning Approach. (arXiv:2209.02765v1 [cs.CL])
    A fundamental component of user-level social media language based clinical depression modelling is depression symptoms detection (DSD). Unfortunately, there does not exist any DSD dataset that reflects both the clinical insights and the distribution of depression symptoms from the samples of self-disclosed depressed population. In our work, we describe an Active Learning (AL) framework which uses an initial supervised learning model that leverages 1) a state-of-the-art large mental health forum text pre-trained language model further fine-tuned on a clinician annotated DSD dataset, 2) a Zero-Shot learning model for DSD, and couples them together to harvest depression symptoms related samples from our large self-curated Depression Tweets Repository (DTR). Our clinician annotated dataset is the largest of its kind. Furthermore, DTR is created from the samples of tweets in self-disclosed depressed users Twitter timeline from two datasets, including one of the largest benchmark datasets for user-level depression detection from Twitter. This further helps preserve the depression symptoms distribution of self-disclosed Twitter users tweets. Subsequently, we iteratively retrain our initial DSD model with the harvested data. We discuss the stopping criteria and limitations of this AL process, and elaborate the underlying constructs which play a vital role in the overall AL process. We show that we can produce a final dataset which is the largest of its kind. Furthermore, a DSD and a Depression Post Detection (DPD) model trained on it achieves significantly better accuracy than their initial version.
    Riemannian optimization for non-centered mixture of scaled Gaussian distributions. (arXiv:2209.03315v1 [cs.LG])
    This paper studies the statistical model of the non-centered mixture of scaled Gaussian distributions (NC-MSG). Using the Fisher-Rao information geometry associated to this distribution, we derive a Riemannian gradient descent algorithm. This algorithm is leveraged for two minimization problems. The first one is the minimization of a regularized negative log-likelihood (NLL). The latter makes the trade-off between a white Gaussian distribution and the NC-MSG. Conditions on the regularization are given so that the existence of a minimum to this problem is guaranteed without assumptions on the samples. Then, the Kullback-Leibler (KL) divergence between two NC-MSGs is derived. This divergence enables us to define a minimization problem to compute centers of mass of several NC-MSGs. The proposed Riemannian gradient descent algorithm is leveraged to solve this second minimization problem. Numerical experiments show the good performance and the speed of the Riemannian gradient descent on the two problems. Finally, a nearest centroid classifier is implemented leveraging the KL divergence and its associated center of mass. Applied to the large-scale dataset Breizhcrops, this classifier shows good accuracies as well as robustness to rigid transformations of the test set.
    Visual correspondence-based explanations improve AI robustness and human-AI team accuracy. (arXiv:2208.00780v3 [cs.CV] UPDATED)
    Explaining artificial intelligence (AI) predictions is increasingly important and even imperative in many high-stakes applications where humans are the ultimate decision-makers. In this work, we propose two novel architectures of self-interpretable image classifiers that first explain, and then predict (as opposed to post-hoc explanations) by harnessing the visual correspondences between a query image and exemplars. Our models consistently improve (by 1 to 4 points) on out-of-distribution (OOD) datasets while performing marginally worse (by 1 to 2 points) on in-distribution tests than ResNet-50 and a $k$-nearest neighbor classifier (kNN). Via a large-scale, human study on ImageNet and CUB, our correspondence-based explanations are found to be more useful to users than kNN explanations. Our explanations help users more accurately reject AI's wrong decisions than all other tested methods. Interestingly, for the first time, we show that it is possible to achieve complementary human-AI team accuracy (i.e., that is higher than either AI-alone or human-alone), in ImageNet and CUB image classification tasks.
    EGMM: an Evidential Version of the Gaussian Mixture Model for Clustering. (arXiv:2010.01333v3 [cs.LG] UPDATED)
    The Gaussian mixture model (GMM) provides a simple yet principled framework for clustering, with properties suitable for statistical inference. In this paper, we propose a new model-based clustering algorithm, called EGMM (evidential GMM), in the theoretical framework of belief functions to better characterize cluster-membership uncertainty. With a mass function representing the cluster membership of each object, the evidential Gaussian mixture distribution composed of the components over the powerset of the desired clusters is proposed to model the entire dataset. The parameters in EGMM are estimated by a specially designed Expectation-Maximization (EM) algorithm. A validity index allowing automatic determination of the proper number of clusters is also provided. The proposed EGMM is as simple as the classical GMM, but can generate a more informative evidential partition for the considered dataset. The synthetic and real dataset experiments show that the proposed EGMM performs better than other representative clustering algorithms. Besides, its superiority is also demonstrated by an application to multi-modal brain image segmentation.
    On the Convergence of Monte Carlo UCB for Random-Length Episodic MDPs. (arXiv:2209.02864v1 [cs.LG])
    In reinforcement learning, Monte Carlo algorithms update the Q function by averaging the episodic returns. In the Monte Carlo UCB (MC-UCB) algorithm, the action taken in each state is the action that maximizes the Q function plus a UCB exploration term, which biases the choice of actions to those that have been chosen less frequently. Although there has been significant work on establishing regret bounds for MC-UCB, most of that work has been focused on finite-horizon versions of the problem, for which each episode terminates after a constant number of steps. For such finite-horizon problems, the optimal policy depends both on the current state and the time within the episode. However, for many natural episodic problems, such as games like Go and Chess and robotic tasks, the episode is of random length and the optimal policy is stationary. For such environments, it is an open question whether the Q-function in MC-UCB will converge to the optimal Q function; we conjecture that, unlike Q-learning, it does not converge for all MDPs. We nevertheless show that for a large class of MDPs, which includes stochastic MDPs such as blackjack and deterministic MDPs such as Go, the Q-function in MC-UCB converges almost surely to the optimal Q function. An immediate corollary of this result is that it also converges almost surely for all finite-horizon MDPs. We also provide numerical experiments, providing further insights into MC-UCB.
    LPGNet: Link Private Graph Networks for Node Classification. (arXiv:2205.03105v2 [cs.LG] UPDATED)
    Classification tasks on labeled graph-structured data have many important applications ranging from social recommendation to financial modeling. Deep neural networks are increasingly being used for node classification on graphs, wherein nodes with similar features have to be given the same label. Graph convolutional networks (GCNs) are one such widely studied neural network architecture that perform well on this task. However, powerful link-stealing attacks on GCNs have recently shown that even with black-box access to the trained model, inferring which links (or edges) are present in the training graph is practical. In this paper, we present a new neural network architecture called LPGNet for training on graphs with privacy-sensitive edges. LPGNet provides differential privacy (DP) guarantees for edges using a novel design for how graph edge structure is used during training. We empirically show that LPGNet models often lie in the sweet spot between providing privacy and utility: They can offer better utility than "trivially" private architectures which use no edge information (e.g., vanilla MLPs) and better resilience against existing link-stealing attacks than vanilla GCNs which use the full edge structure. LPGNet also offers consistently better privacy-utility tradeoffs than DPGCN, which is the state-of-the-art mechanism for retrofitting differential privacy into conventional GCNs, in most of our evaluated datasets.
    A Data-driven Reduced Order Modeling Approach Applied In Context Of Numerical Analysis And Optimization Of Plastic Profile Extrusion. (arXiv:2209.03121v1 [math.NA])
    In the course of this work, we examine the process of plastic profile extrusion, where a polymer melt is shaped inside the so-called extrusion die and fixed in its shape by solidification in the downstream calibration unit. More precisely, we focus on the development of a data-driven reduced order model (ROM) for the purpose of predicting temperature distributions within the extruded profiles inside the calibration unit. Therein, the ROM functions as a first step towards our overall goal of prediction-based process control, in order to avoid undesired warpage and damage to the final product.
    On the Effectiveness of Compact Biomedical Transformers. (arXiv:2209.03182v1 [cs.CL])
    Language models pre-trained on biomedical corpora, such as BioBERT, have recently shown promising results on downstream biomedical tasks. Many existing pre-trained models, on the other hand, are resource-intensive and computationally heavy owing to factors such as embedding size, hidden dimension, and number of layers. The natural language processing (NLP) community has developed numerous strategies to compress these models utilising techniques such as pruning, quantisation, and knowledge distillation, resulting in models that are considerably faster, smaller, and subsequently easier to use in practice. By the same token, in this paper we introduce six lightweight models, namely, BioDistilBERT, BioTinyBERT, BioMobileBERT, DistilBioBERT, TinyBioBERT, and CompactBioBERT which are obtained either by knowledge distillation from a biomedical teacher or continual learning on the PubMed dataset via the Masked Language Modelling (MLM) objective. We evaluate all of our models on three biomedical tasks and compare them with BioBERT-v1.1 to create efficient lightweight models that perform on par with their larger counterparts. All the models will be publicly available on our Huggingface profile at https://huggingface.co/nlpie and the codes used to run the experiments will be available at https://github.com/nlpie-research/Compact-Biomedical-Transformers.
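    The distillation route mentioned above boils down to matching temperature-softened teacher and student output distributions; a generic sketch follows (the tiny linear stand-ins, batch size, and temperature are assumptions for illustration, not the released models):

```python
import torch
import torch.nn.functional as F

vocab, temperature = 100, 2.0
teacher = torch.nn.Linear(32, vocab)   # stand-in for a large biomedical teacher
student = torch.nn.Linear(32, vocab)   # stand-in for a compact student

hidden = torch.randn(8, 32)            # a batch of token representations
with torch.no_grad():
    t_logits = teacher(hidden)
s_logits = student(hidden)

# KL divergence between temperature-softened teacher and student
# distributions, the usual soft-target distillation objective.
loss = F.kl_div(
    F.log_softmax(s_logits / temperature, dim=-1),
    F.softmax(t_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature ** 2
loss.backward()
```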
    Out of Distribution Detection, Generalization, and Robustness Triangle with Maximum Probability Theorem. (arXiv:2203.12145v2 [cs.LG] UPDATED)
    The Maximum Probability Framework, powered by the Maximum Probability Theorem (MPT), is a recent theoretical development in artificial intelligence that aims to formally define probabilistic models, guide the development of objective functions, and regularize probabilistic models. MPT uses the probability distribution that the models assume on random variables to provide an upper bound on the probability of the model. We apply MPT to challenging out-of-distribution (OOD) detection problems in computer vision by incorporating MPT as a regularization scheme in the training of CNNs and their energy-based variants. We demonstrate the effectiveness of the proposed method on 1080 trained models, with varying hyperparameters, and conclude that the MPT-based regularization strategy stabilizes and improves the generalization and robustness of base models in addition to enhanced OOD performance on CIFAR10, CIFAR100, and MNIST datasets.
    Learning Canonical Embeddings for Unsupervised Shape Correspondence with Locally Linear Transformations. (arXiv:2209.02152v2 [cs.CV] UPDATED)
    We present a new approach to unsupervised shape correspondence learning between pairs of point clouds. We make the first attempt to adapt the classical locally linear embedding algorithm (LLE) -- originally designed for nonlinear dimensionality reduction -- for shape correspondence. The key idea is to find dense correspondences between shapes by first obtaining high-dimensional neighborhood-preserving embeddings of low-dimensional point clouds and subsequently aligning the source and target embeddings using locally linear transformations. We demonstrate that learning the embedding using a new LLE-inspired point cloud reconstruction objective results in accurate shape correspondences. More specifically, the approach comprises an end-to-end learnable framework of extracting high-dimensional neighborhood-preserving embeddings, estimating locally linear transformations in the embedding space, and reconstructing shapes via divergence measure-based alignment of probabilistic density functions built over reconstructed and target shapes. Our approach enforces embeddings of shapes in correspondence to lie in the same universal/canonical embedding space, which eventually helps regularize the learning process and leads to a simple nearest neighbors approach between shape embeddings for finding reliable correspondences. Comprehensive experiments show that the new method makes noticeable improvements over state-of-the-art approaches on standard shape correspondence benchmark datasets covering both human and nonhuman shapes.  ( 3 min )
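    Once source and target shapes live in the same canonical embedding space, the final correspondence step described above reduces to a nearest-neighbor lookup between embeddings; a sketch with random arrays standing in for the learned embeddings:

```python
import numpy as np
from scipy.spatial import cKDTree

n_points, dim = 1024, 128
source_emb = np.random.randn(n_points, dim)  # stand-in for learned embeddings
target_emb = np.random.randn(n_points, dim)

# Each source point corresponds to its nearest neighbor in the shared
# canonical embedding space.
tree = cKDTree(target_emb)
_, correspondence = tree.query(source_emb, k=1)
print(correspondence[:10])  # indices into the target point cloud
```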
    Generative Principal Component Analysis. (arXiv:2203.09693v2 [stat.ML] UPDATED)
    In this paper, we study the problem of principal component analysis with generative modeling assumptions, adopting a general model for the observed matrix that encompasses notable special cases, including spiked matrix recovery and phase retrieval. The key assumption is that the underlying signal lies near the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. We propose a quadratic estimator, and show that it enjoys a statistical rate of order $\sqrt{\frac{k\log L}{m}}$, where $m$ is the number of samples. We also provide a near-matching algorithm-independent lower bound. Moreover, we provide a variant of the classic power method, which projects the calculated data onto the range of the generative model during each iteration. We show that under suitable conditions, this method converges exponentially fast to a point achieving the above-mentioned statistical rate. We perform experiments on various image datasets for spiked matrix and phase retrieval models, and illustrate the performance gains of our method over the classic power method and the truncated power method devised for sparse principal component analysis.
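    The power-method variant can be sketched as ordinary power iteration with an extra projection onto the generator's range each step; the projection below is a placeholder oracle (in practice it would, e.g., optimize over the generator's latent inputs), so the sketch shows the structure rather than the paper's algorithm:

```python
import numpy as np

def project_to_range(v):
    """Placeholder for projecting v onto the range of a generative model G
    (e.g., by solving min_z ||G(z) - v||); here only a normalization,
    purely for illustration."""
    return v / np.linalg.norm(v)

def projected_power_method(M, v0, iters=50):
    v = project_to_range(v0)
    for _ in range(iters):
        v = M @ v                 # classic power-method step
        v = project_to_range(v)   # keep the iterate near the generator's range
    return v

M = np.random.randn(64, 64)
M = M @ M.T                       # symmetric PSD test matrix
v = projected_power_method(M, np.random.randn(64))
```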
    POLTER: Policy Trajectory Ensemble Regularization for Unsupervised Reinforcement Learning. (arXiv:2205.11357v2 [cs.LG] UPDATED)
    The goal of Unsupervised Reinforcement Learning (URL) is to find a reward-agnostic prior policy on a task domain, such that the sample-efficiency on supervised downstream tasks is improved. Although agents initialized with such a prior policy can achieve a significantly higher reward with fewer samples when finetuned on the downstream task, it is still an open question how an optimal pretrained prior policy can be achieved in practice. In this work, we present POLTER (Policy Trajectory Ensemble Regularization) - a general method to regularize the pretraining that can be applied to any URL algorithm and is especially useful on data- and knowledge-based URL algorithms. It utilizes an ensemble of policies that are discovered during pretraining and moves the policy of the URL algorithm closer to its optimal prior. Our method is based on a theoretical framework, and we analyze its practical effects on a white-box benchmark, allowing us to study POLTER with full control. In our main experiments, we evaluate POLTER on the Unsupervised Reinforcement Learning Benchmark (URLB), which consists of 12 tasks in 3 domains. We demonstrate the generality of our approach by improving the performance of a diverse set of data- and knowledge-based URL algorithms by 19% on average and up to 40% in the best case. Under a fair comparison with tuned baselines and tuned POLTER, we establish a new state-of-the-art on the URLB.
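    One plausible reading of "moving the policy closer to its ensemble prior", for discrete actions, is a KL penalty toward the averaged ensemble policy; the snapshot logits and the exact form of the penalty below are assumptions for illustration, not the paper's stated objective:

```python
import torch
import torch.nn.functional as F

batch, n_actions = 32, 6
policy_logits = torch.randn(batch, n_actions, requires_grad=True)
# Logits of policies snapshotted during pretraining (the ensemble).
ensemble_logits = [torch.randn(batch, n_actions) for _ in range(5)]

# Average the ensemble members' action distributions to form the prior.
prior = torch.stack([F.softmax(l, dim=-1) for l in ensemble_logits]).mean(0)

# KL(prior || policy) pulls the pretraining policy toward the ensemble prior.
reg = F.kl_div(F.log_softmax(policy_logits, dim=-1), prior,
               reduction="batchmean")
total_loss = reg  # added to the URL algorithm's own pretraining loss
total_loss.backward()
```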
    Inversion of Time-Lapse Surface Gravity Data for Detection of 3D CO$_2$ Plumes via Deep Learning. (arXiv:2209.02850v1 [cs.LG])
    We introduce three algorithms that invert simulated gravity data to 3D subsurface rock/flow properties. The first algorithm is a data-driven, deep learning-based approach, the second mixes a deep learning approach with physical modeling into a single workflow, and the third considers the time dependence of surface gravity monitoring. The target application of these proposed algorithms is the prediction of subsurface CO$_2$ plumes as a complementary tool for monitoring CO$_2$ sequestration deployments. Each proposed algorithm outperforms traditional inversion methods and produces high-resolution, 3D subsurface reconstructions in near real-time. Our proposed methods achieve Dice scores of up to 0.8 for predicted plume geometry and near perfect data misfit in terms of $\mu$Gals. These results indicate that combining 4D surface gravity monitoring with deep learning techniques represents a low-cost, rapid, and non-intrusive method for monitoring CO$_2$ storage sites.
    Reconstructing signed relations from interaction data. (arXiv:2209.03219v1 [cs.SI])
    Positive and negative relations play an essential role in human behavior and shape the communities we live in. Despite their importance, data about signed relations is rare and commonly gathered through surveys. Interaction data is more abundant, for instance, in the form of proximity or communication data. So far, though, it could not be utilized to detect signed relations. In this paper, we show how the underlying signed relations can be extracted with such data. Employing a statistical network approach, we construct networks of signed relations in four communities. We then show that these relations correspond to the ones reported in surveys. Additionally, the inferred relations allow us to study the homophily of individuals with respect to gender, religious beliefs, and financial backgrounds. We evaluate the importance of triads in the signed network to study group cohesion.
    Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program Code Generation. (arXiv:2201.05587v2 [cs.LG] UPDATED)
    Auto-scheduling for tensor programs is a process in which a search algorithm automatically explores candidate schedules (program transformations) for a given program on a target hardware platform to improve its performance. However, this can be a very time-consuming process, depending on the complexity of the tensor program and the capacity of the target device, with often many thousands of program variants being explored. To address this, in this paper we introduce the idea of transfer-tuning, a novel approach to identify and reuse auto-schedules between tensor programs. We demonstrate this concept using Deep Neural Networks (DNNs), taking sets of auto-schedules from pre-tuned DNNs and using them to reduce the inference time of a new DNN. We compare transfer-tuning against the state-of-the-art Ansor auto-scheduler, defining the maximum possible speedup for a given DNN model as what Ansor achieves using its recommended full tuning time. On a server-class CPU and across 11 widely used DNN models, we observe that transfer-tuning achieves up to $88.41\%$ ($49.13\%$ on average) of this maximum speedup, while Ansor requires $6.5\times$ more search time on average to match it. We also evaluate transfer-tuning on a constrained edge CPU and observe that the differences in search time are exacerbated, with Ansor requiring $10.8\times$ more time on average to match transfer-tuning's speedup, which further demonstrates its value. Our code is available at https://www.github.com/gicLAB/transfer-tuning.
    Understanding microbiome dynamics via interpretable graph representation learning. (arXiv:2203.01830v2 [q-bio.QM] UPDATED)
    Large-scale perturbations in the microbiome constitution are strongly correlated, whether as a driver or a consequence, with the health and functioning of human physiology. However, understanding the difference in the microbiome profiles of healthy and ill individuals can be complicated due to the large number of complex interactions among microbes. We propose to model these interactions as a time-evolving graph whose nodes are microbes and edges are interactions among them. Motivated by the need to analyse such complex interactions, we develop a method that learns a low-dimensional representation of the time-evolving graph and maintains the dynamics occurring in the high-dimensional space. Through our experiments, we show that we can extract graph features such as clusters of nodes or edges that have the highest impact on the model to learn the low-dimensional representation. This information can be crucial to identify microbes and interactions among them that are strongly correlated with clinical diseases. We conduct our experiments on both synthetic and real-world microbiome datasets.
    Non-Gaussian Process Regression. (arXiv:2209.03117v1 [stat.ML])
    Standard Gaussian processes (GPs) offer a flexible modelling tool for well-behaved processes. However, deviations from Gaussianity are expected to appear in real-world datasets, with structural outliers and shocks routinely observed. In these cases GPs can fail to model uncertainty adequately and may over-smooth inferences. Here we extend the GP framework into a new class of time-changed GPs that allow for straightforward modelling of heavy-tailed non-Gaussian behaviours, while retaining a tractable conditional GP structure through an infinite mixture of non-homogeneous GPs representation. The conditional GP structure is obtained by conditioning the observations on a latent transformed input space, and the random evolution of the latent transformation is modelled using a L\'{e}vy process, which allows Bayesian inference in both the posterior predictive density and the latent transformation function. We present Markov chain Monte Carlo inference procedures for this model and demonstrate the potential benefits compared to a standard GP.
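    A sketch of sampling from such a time-changed GP: a subordinator supplies a random non-decreasing warp of the input axis (a gamma process is used here as one concrete Lévy process, an assumption rather than the paper's specific choice), and conditionally on the warp the draw is an ordinary RBF-kernel GP:

```python
import numpy as np

rng = np.random.default_rng(0)
n, ell = 200, 0.5

# Gamma-process time change: independent gamma increments accumulate
# into a random non-decreasing transformation of the input axis.
tau = np.cumsum(rng.gamma(shape=0.05, scale=1.0, size=n))

def rbf(a, b, ell):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

# Conditionally on tau the process is an ordinary GP on the warped inputs;
# marginally it exhibits heavy-tailed, non-Gaussian behaviour.
K = rbf(tau, tau, ell) + 1e-8 * np.eye(n)
sample = rng.multivariate_normal(np.zeros(n), K)
```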
    On the Convergence of the ELBO to Entropy Sums. (arXiv:2209.03077v1 [stat.ML])
    The variational lower bound (a.k.a. ELBO or free energy) is the central objective for many learning algorithms, including algorithms for deep unsupervised learning. Learning algorithms change model parameters such that the variational lower bound increases until the parameters are close to a stationary point of the learning dynamics. In this purely theoretical contribution, we show that (for a very large class of generative models) the variational lower bound is at all stationary points of learning equal to a sum of entropies. For models with one set of latents and one set of observed variables, the sum consists of three entropies: (A) the (average) entropy of the variational distributions, (B) the negative entropy of the model's prior distribution, and (C) the (expected) negative entropy of the observable distributions. The obtained result applies under realistic conditions, including finite numbers of data points, any stationary point (including saddle points), and any family of (well-behaved) variational distributions. The class of generative models for which we show the equality to entropy sums contains many (and presumably most) standard generative models (including deep models). As concrete examples we discuss probabilistic PCA and Sigmoid Belief Networks. The prerequisites we use to show equality to entropy sums are relatively mild. Concretely, the distributions of a given generative model have to be of the exponential family (with constant base measure), and the model has to satisfy a parameterization criterion (which is usually fulfilled). Proving the equality of the ELBO to entropy sums at stationary points (under the stated conditions) is the main contribution of this work.
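    In the notation of the abstract, the claimed identity at stationary points can be paraphrased as follows (one set of latents $z$, $N$ data points, variational distributions $q^{(n)}$; this is a paraphrase, and the exact statement and conditions are in the paper):

```latex
\mathcal{F}(\Phi,\Theta)
  = \underbrace{\frac{1}{N}\sum_{n=1}^{N}\mathcal{H}\big[q^{(n)}_{\Phi}(z)\big]}_{\text{(A)}}
  \;-\; \underbrace{\mathcal{H}\big[p_{\Theta}(z)\big]}_{\text{(B)}}
  \;-\; \underbrace{\frac{1}{N}\sum_{n=1}^{N}\mathbb{E}_{q^{(n)}_{\Phi}}
        \Big[\mathcal{H}\big[p_{\Theta}(x \mid z)\big]\Big]}_{\text{(C)}}
```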
    A Comprehensive Survey on Radio Frequency (RF) Fingerprinting: Traditional Approaches, Deep Learning, and Open Challenges. (arXiv:2201.00680v3 [cs.LG] UPDATED)
    Fifth generation (5G) network and beyond envision massive Internet of Things (IoT) rollout to support disruptive applications such as extended reality (XR), augmented/virtual reality (AR/VR), industrial automation, autonomous driving, and smart everything which brings together massive and diverse IoT devices occupying the radio frequency (RF) spectrum. Along with the spectrum crunch and throughput challenges, such a massive scale of wireless devices exposes unprecedented threat surfaces. RF fingerprinting is heralded as a candidate technology that can be combined with cryptographic and zero-trust security measures to ensure data privacy, confidentiality, and integrity in wireless networks. Motivated by the relevance of this subject in the future communication networks, in this work, we present a comprehensive survey of RF fingerprinting approaches ranging from a traditional view to the most recent deep learning (DL)-based algorithms. Existing surveys have mostly focused on a constrained presentation of the wireless fingerprinting approaches; many aspects remain untold. In this work, we mitigate this by addressing every aspect - background on signal intelligence (SIGINT), applications, relevant DL algorithms, a systematic literature review of RF fingerprinting techniques spanning the past two decades, a discussion of datasets, and potential research avenues - necessary to elucidate this topic to the reader in an encyclopedic manner.
    DANets: Deep Abstract Networks for Tabular Data Classification and Regression. (arXiv:2112.02962v4 [cs.LG] UPDATED)
    Tabular data are ubiquitous in real world applications. Although many commonly-used neural components (e.g., convolution) and extensible neural networks (e.g., ResNet) have been developed by the machine learning community, few of them were effective for tabular data and few designs were adequately tailored for tabular data structures. In this paper, we propose a novel and flexible neural component for tabular data, called Abstract Layer (AbstLay), which learns to explicitly group correlative input features and generate higher-level features for semantics abstraction. Also, we design a structure re-parameterization method to compress the learned AbstLay, thus reducing the computational complexity by a clear margin in the inference phase. A special basic block is built using AbstLays, and we construct a family of Deep Abstract Networks (DANets) for tabular data classification and regression by stacking such blocks. In DANets, a special shortcut path is introduced to fetch information from raw tabular features, assisting feature interactions across different levels. Comprehensive experiments on seven real-world tabular datasets show that our AbstLay and DANets are effective for tabular data classification and regression, and their computational complexity is superior to that of competitive methods. Besides, we evaluate the performance gains of DANet as it goes deep, verifying the extendibility of our method. Our code is available at https://github.com/WhatAShot/DANet.
    TOHAN: A One-step Approach towards Few-shot Hypothesis Adaptation. (arXiv:2106.06326v2 [cs.LG] UPDATED)
    In few-shot domain adaptation (FDA), classifiers for the target domain are trained with accessible labeled data in the source domain (SD) and few labeled data in the target domain (TD). However, data usually contain private information in the current era, e.g., data distributed on personal phones. Thus, the private information will be leaked if we directly access data in SD to train a target-domain classifier (required by FDA methods). In this paper, to thoroughly prevent the privacy leakage in SD, we consider a very challenging problem setting, where the classifier for the TD has to be trained using few labeled target data and a well-trained SD classifier, named few-shot hypothesis adaptation (FHA). In FHA, we cannot access data in SD; as a result, the private information in SD is well protected. To this end, we propose a target orientated hypothesis adaptation network (TOHAN) to solve the FHA problem, where we generate highly-compatible unlabeled data (i.e., an intermediate domain) to help train a target-domain classifier. TOHAN maintains two deep networks simultaneously, where one focuses on learning an intermediate domain and the other takes care of the intermediate-to-target distributional adaptation and the target-risk minimization. Experimental results show that TOHAN outperforms competitive baselines significantly.
    Neural-Symbolic Models for Logical Queries on Knowledge Graphs. (arXiv:2205.10128v2 [cs.AI] UPDATED)
    Answering complex first-order logic (FOL) queries on knowledge graphs is a fundamental task for multi-hop reasoning. Traditional symbolic methods traverse a complete knowledge graph to extract the answers, which provides good interpretation for each step. Recent neural methods learn geometric embeddings for complex queries. These methods can generalize to incomplete knowledge graphs, but their reasoning process is hard to interpret. In this paper, we propose Graph Neural Network Query Executor (GNN-QE), a neural-symbolic model that enjoys the advantages of both worlds. GNN-QE decomposes a complex FOL query into relation projections and logical operations over fuzzy sets, which provides interpretability for intermediate variables. To reason about the missing links, GNN-QE adapts a graph neural network from knowledge graph completion to execute the relation projections, and models the logical operations with product fuzzy logic. Experiments on 3 datasets show that GNN-QE significantly improves over previous state-of-the-art models in answering FOL queries. Meanwhile, GNN-QE can predict the number of answers without explicit supervision, and provide visualizations for intermediate variables.  ( 2 min )
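    The product fuzzy logic used for the logical operations has a compact closed form over fuzzy set membership scores; a minimal sketch (the membership vectors are invented for illustration):

```python
import numpy as np

def fuzzy_and(x, y):   # conjunction under product fuzzy logic: x * y
    return x * y

def fuzzy_or(x, y):    # disjunction: x + y - x*y
    return x + y - x * y

def fuzzy_not(x):      # negation: 1 - x
    return 1.0 - x

# Fuzzy sets over 5 entities, e.g. as produced by relation projections.
a = np.array([0.9, 0.2, 0.7, 0.0, 0.5])
b = np.array([0.8, 0.6, 0.1, 0.3, 0.5])
print(fuzzy_and(a, b), fuzzy_or(a, b), fuzzy_not(a))
```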
    Morphology-preserving Autoregressive 3D Generative Modelling of the Brain. (arXiv:2209.03177v1 [eess.IV])
    Human anatomy, morphology, and associated diseases can be studied using medical imaging data. However, access to medical imaging data is restricted by governance and privacy concerns, data ownership, and the cost of acquisition, thus limiting our ability to understand the human body. A possible solution to this issue is the creation of a model able to learn and then generate synthetic images of the human body conditioned on specific characteristics of relevance (e.g., age, sex, and disease status). Deep generative models, in the form of neural networks, have been recently used to create synthetic 2D images of natural scenes. Still, the ability to produce high-resolution 3D volumetric imaging data with correct anatomical morphology has been hampered by data scarcity and algorithmic and computational limitations. This work proposes a generative model that can be scaled to produce anatomically correct, high-resolution, and realistic images of the human brain, with the necessary quality to allow further downstream analyses. The ability to generate a potentially unlimited amount of data not only enables large-scale studies of human anatomy and pathology without jeopardizing patient privacy, but also significantly advances research in the field of anomaly detection, modality synthesis, learning under limited data, and fair and ethical AI. Code and trained models are available at: https://github.com/AmigoLab/SynthAnatomy.  ( 3 min )
    Ultra-low-power Range Error Mitigation for Ultra-wideband Precise Localization. (arXiv:2209.03021v1 [cs.LG])
    Precise and accurate localization in outdoor and indoor environments is a challenging problem that currently constitutes a significant limitation for several practical applications. Ultra-wideband (UWB) localization technology represents a valuable low-cost solution to the problem. However, non-line-of-sight (NLOS) conditions and the complexity of the specific radio environment can easily introduce a positive bias in the ranging measurement, resulting in highly inaccurate and unsatisfactory position estimation. In light of this, we leverage the latest advancements in deep neural network optimization techniques and their implementation on ultra-low-power microcontrollers to introduce an effective range error mitigation solution that provides corrections in either NLOS or LOS conditions with a few mW of power. Our extensive experimentation endorses the advantages and improvements of our low-cost and power-efficient methodology.
    Semantic Interactive Learning for Text Classification: A Constructive Approach for Contextual Interactions. (arXiv:2209.02984v1 [cs.HC])
    Interactive Machine Learning (IML) aims to enable intelligent systems to learn interactively from their end-users, and it is quickly gaining importance. Although it puts the human in the loop, interactions are mostly performed via mutual explanations that miss contextual information. Furthermore, current model-agnostic IML strategies like CAIPI are limited to 'destructive' feedback, meaning they solely allow an expert to prevent a learner from using irrelevant features. In this work, we propose a novel interaction framework called Semantic Interactive Learning for the text domain. We frame the problem of incorporating constructive and contextual feedback into the learner as a task to find an architecture that (a) enables more semantic alignment between humans and machines and (b) at the same time helps to maintain statistical characteristics of the input domain when generating user-defined counterexamples based on meaningful corrections. Therefore, we introduce a technique called SemanticPush that is effective for translating conceptual corrections of humans to non-extrapolating training examples such that the learner's reasoning is pushed towards the desired behavior. In several experiments, we show that our method clearly outperforms CAIPI, a state-of-the-art IML strategy, in terms of Predictive Performance as well as Local Explanation Quality in downstream multi-class classification tasks.
    AudioLM: a Language Modeling Approach to Audio Generation. (arXiv:2209.03143v1 [cs.SD])
    We introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts audio generation as a language modeling task in this representation space. We show how existing audio tokenizers provide different trade-offs between reconstruction quality and long-term structure, and we propose a hybrid tokenization scheme to achieve both objectives. Namely, we leverage the discretized activations of a masked language model pre-trained on audio to capture long-term structure and the discrete codes produced by a neural audio codec to achieve high-quality synthesis. By training on large corpora of raw audio waveforms, AudioLM learns to generate natural and coherent continuations given short prompts. When trained on speech, and without any transcript or annotation, AudioLM generates syntactically and semantically plausible speech continuations while also maintaining speaker identity and prosody for unseen speakers. Furthermore, we demonstrate how our approach extends beyond speech by generating coherent piano music continuations, despite being trained without any symbolic representation of music.
    Bayesian learning of feature spaces for multitasks problems. (arXiv:2209.03028v1 [stat.ML])
    This paper presents a Bayesian framework to construct non-linear, parsimonious, shallow models for multitask regression. The proposed framework relies on the fact that Random Fourier Features (RFFs) enable the approximation of an RBF kernel by an extreme learning machine whose hidden layer is formed by RFFs. The main idea is to combine both dual views of the same model under a single Bayesian formulation that extends Sparse Bayesian Extreme Learning Machines to multitask problems. From the kernel methods point of view, the proposed formulation facilitates the introduction of prior domain knowledge through the RBF kernel parameter. From the extreme learning machines perspective, the new formulation helps control overfitting and enables a parsimonious overall model (the models that serve each task share the same set of RFFs, selected within the joint Bayesian optimisation). The experimental results show that combining advantages from kernel methods and extreme learning machines within the same framework can lead to significant improvements in the performance achieved by each of these two paradigms independently.  ( 2 min )
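    The RFF construction at the heart of the model replaces the RBF kernel with an explicit random cosine feature map; a minimal sketch checking the approximation against the exact kernel (dimensions and lengthscale are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, ell = 4, 2000, 1.0          # input dim, number of RFFs, lengthscale

W = rng.normal(0.0, 1.0 / ell, size=(D, d))   # spectral frequencies
b = rng.uniform(0.0, 2 * np.pi, size=D)       # random phases

def rff(x):
    # z(x) such that z(x) @ z(y) ~= exp(-||x - y||^2 / (2 ell^2))
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
approx = rff(x) @ rff(y)
exact = np.exp(-np.sum((x - y) ** 2) / (2 * ell ** 2))
print(approx, exact)   # close for large D
```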
    A learning theory for quantum photonic processors and beyond. (arXiv:2209.03075v1 [quant-ph])
    We consider the tasks of learning quantum states, measurements and channels generated by continuous-variable (CV) quantum circuits. This family of circuits is suited to describe optical quantum technologies and in particular it includes state-of-the-art photonic processors capable of showing quantum advantage. We define classes of functions that map classical variables, encoded into the CV circuit parameters, to outcome probabilities evaluated on those circuits. We then establish efficient learnability guarantees for such classes, by computing bounds on their pseudo-dimension or covering numbers, showing that CV quantum circuits can be learned with a sample complexity that scales polynomially with the circuit's size, i.e., the number of modes. Our results establish that CV circuits can be trained efficiently using a number of training samples that, unlike their finite-dimensional counterpart, does not scale with the circuit depth.  ( 2 min )
    A modified SIRD model for Covid19 spread prediction using ensemble neural networks. (arXiv:2203.00407v2 [cs.LG] UPDATED)
    In this paper, we propose an analysis and prediction of the Covid19 evolution in Romania based on the SIRD mathematical model, an extension of the classical SIR model that includes the deceased as a separate category. Because we cannot fully trust the reported numbers of infected or recovered people, we base our analysis on the more reliable number of deceased people. In addition, one of the parameters of our model includes the proportion of infected and tested versus infected. Since many factors impact the evolution of the pandemic, we base the estimation and prediction on the previous 7 days of data, the number of deceased being particularly important here. We perform the estimation and prediction using neural networks in two steps. Firstly, by simulating data with our model, we train several neural networks which learn the parameters of the model. Secondly, we use an ensemble of ten of these neural networks to forecast the parameters from the real data of Covid19 in Romania. Many of these results are backed up by a theorem which guarantees that we can recover the parameters from the reported data.
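    For reference, the SIRD dynamics that the estimated parameters feed into integrate straightforwardly; a forward-Euler sketch with purely illustrative parameter values (the paper instead estimates such parameters with an ensemble of neural networks):

```python
def sird_step(S, I, R, D, beta, gamma, mu, N, dt=1.0):
    """One forward-Euler step of SIRD:
    dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - (gamma + mu)*I,
    dR/dt = gamma*I,      dD/dt = mu*I."""
    dS = -beta * S * I / N
    dI = beta * S * I / N - (gamma + mu) * I
    dR = gamma * I
    dD = mu * I
    return S + dt * dS, I + dt * dI, R + dt * dR, D + dt * dD

# Illustrative values only, roughly sized to Romania's population.
N0 = 19_000_000
S, I, R, D = N0 - 100.0, 100.0, 0.0, 0.0
for _ in range(120):
    S, I, R, D = sird_step(S, I, R, D, beta=0.25, gamma=0.1, mu=0.01, N=N0)
print(round(D))  # cumulative deceased after 120 days in this toy run
```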
    Optimal Sensor Placement in Body Surface Networks using Gaussian Processes. (arXiv:2209.02912v1 [cs.LG])
    This paper explores a new sequential selection framework for optimal sensor placement (OSP) in electrocardiographic imaging (ECGI) networks. The proposed methodology incorporates a recent experimental design method for the sequential selection of landmarks on biological objects, namely Gaussian process landmarking (GPLMK), for better exploration of the candidate sensors. The two experimental design methods serve as a source of the training and validation locations, which are modeled using a spatiotemporal Gaussian process (STGP). The STGP is fitted on the training set to predict the current validation set generated using GPLMK, and the sensor with the largest absolute prediction error is selected from the current validation set and added to the selected sensors. Next, a new validation set is generated and predicted using the current training set. The process continues until a specified number of sensor locations is selected. The study is conducted on a body surface potential mapping (BSPM) dataset of 352 electrodes from four human subjects. Thirty sensor locations are selected using the proposed algorithm. The selected sensor locations achieved an average $R^2 = 94.40\%$ for estimating the whole-body QRS segment. The proposed method contributes to the design of a more clinically practical ECGI system by improving its wearability and reducing the design cost.
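    The greedy loop itself is compact; the sketch below uses scikit-learn's spatial GP as a stand-in for the spatiotemporal GP and treats all remaining electrodes as the candidate set instead of GPLMK-generated validation sets (both are simplifying assumptions):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
coords = rng.uniform(size=(352, 3))          # stand-in electrode positions
signal = np.sin(4 * coords @ np.ones(3))     # stand-in BSPM amplitudes

selected = list(range(3))                    # seed sensors
candidates = [i for i in range(352) if i not in selected]

while len(selected) < 30:
    gp = GaussianProcessRegressor(kernel=RBF(0.5)).fit(
        coords[selected], signal[selected])
    pred = gp.predict(coords[candidates])
    # Add the candidate sensor whose signal is worst predicted so far.
    worst = candidates[int(np.argmax(np.abs(pred - signal[candidates])))]
    selected.append(worst)
    candidates.remove(worst)
```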
    RF Fingerprinting Needs Attention: Multi-task Approach for Real-World WiFi and Bluetooth. (arXiv:2209.03142v1 [cs.LG])
    A novel cross-domain attentional multi-task architecture - xDom - for robust real-world wireless radio frequency (RF) fingerprinting is presented in this work. To the best of our knowledge, this is the first time such a comprehensive attention mechanism has been applied to solve the RF fingerprinting problem. In this paper, we resort to real-world IoT WiFi and Bluetooth (BT) emissions (instead of synthetic waveform generation) in a rich multipath and unavoidable interference environment in an indoor experimental testbed. We show the impact of the time-frame of capture by including waveforms collected over a span of months and demonstrate the same time-frame and multiple time-frame fingerprinting evaluations. The effectiveness of resorting to a multi-task architecture is also experimentally proven by conducting single-task and multi-task model analyses. Finally, we demonstrate the significant gain in performance achieved with the proposed xDom architecture by benchmarking against a well-known state-of-the-art model for fingerprinting. Specifically, we report performance improvements by up to 59.3% and 4.91x under single-task WiFi and BT fingerprinting respectively, and up to 50.5% increase in fingerprinting accuracy under the multi-task setting.  ( 2 min )
    Federated Transfer Learning with Multimodal Data. (arXiv:2209.03137v1 [cs.LG])
    Smart cars, smartphones and other devices in the Internet of Things (IoT), which usually have more than one sensor, produce multimodal data. Federated Learning supports collecting a wealth of multimodal data from different devices without sharing raw data. Transfer Learning methods help transfer knowledge from some devices to others. Federated Transfer Learning methods benefit both Federated Learning and Transfer Learning. This newly proposed Federated Transfer Learning framework aims at connecting data islands with privacy protection. Our construction is based on Federated Learning and Transfer Learning. Compared with previous Federated Transfer Learning approaches, where each user should have data with identical modalities (either all unimodal or all multimodal), our new framework is more generic: it allows a hybrid distribution of user data. The core strategy is to use two different but inherently connected training methods for our two types of users. Supervised Learning is adopted for users with only unimodal data (Type 1), while Self-Supervised Learning is applied to users with multimodal data (Type 2), both for the features of each modality and for the connection between them. This connection knowledge of Type 2 will help Type 1 in later stages of training. Training in the new framework can be divided into three steps. In the first step, users who have data with identical modalities are grouped together. For example, users with only sound signals are in group one, those with only images are in group two, users with multimodal data are in group three, and so on. In the second step, Federated Learning is executed within the groups, where Supervised Learning and Self-Supervised Learning are used depending on the group's nature. Most of the Transfer Learning happens in the third step, where the related parts in the network obtained from the previous steps are aggregated (federated).  ( 3 min )
    Manifold Oblique Random Forests: Towards Closing the Gap on Convolutional Deep Networks. (arXiv:1909.11799v5 [cs.LG] UPDATED)
    Decision forests (Forests), in particular random forests and gradient boosting trees, have demonstrated state-of-the-art accuracy compared to other methods in many supervised learning scenarios. In particular, Forests dominate other methods in tabular data, that is, when the feature space is unstructured, so that the signal is invariant to a permutation of the feature indices. However, in structured data lying on a manifold (such as images, text, and speech) deep networks (Networks), specifically convolutional deep networks (ConvNets), tend to outperform Forests. We conjecture that at least part of the reason for this is that the input to Networks is not simply the feature magnitudes, but also their indices. In contrast, naive Forest implementations fail to explicitly consider feature indices. A recently proposed Forest approach demonstrates that Forests, for each node, implicitly sample a random matrix from some specific distribution. These Forests, like some classes of Networks, learn by partitioning the feature space into convex polytopes corresponding to linear functions. We build on that approach and show that one can choose distributions in a manifold-aware fashion to incorporate feature locality. We demonstrate the empirical performance on data whose features live on three different manifolds: a torus, images, and time-series. Moreover, we demonstrate its strength in multivariate simulated settings and also show superiority in predicting surgical outcome in epilepsy patients and predicting movement direction from raw stereotactic EEG data from non-motor brain regions. In all simulations and real data, Manifold Oblique Random Forest (MORF) algorithm outperforms approaches that ignore feature space structure and challenges the performance of ConvNets. Moreover, MORF runs fast and maintains interpretability and theoretical justification.  ( 3 min )
    Multitask Learning via Shared Features: Algorithms and Hardness. (arXiv:2209.03112v1 [cs.LG])
    We investigate the computational efficiency of multitask learning of Boolean functions over the $d$-dimensional hypercube, that are related by means of a feature representation of size $k \ll d$ shared across all tasks. We present a polynomial time multitask learning algorithm for the concept class of halfspaces with margin $\gamma$, which is based on a simultaneous boosting technique and requires only $\textrm{poly}(k/\gamma)$ samples-per-task and $\textrm{poly}(k\log(d)/\gamma)$ samples in total. In addition, we prove a computational separation, showing that, assuming there exists a concept class that cannot be learned in the attribute-efficient model, we can construct another concept class that can be learned in the attribute-efficient model but cannot be multitask learned efficiently -- multitask learning this concept class either requires super-polynomial time complexity or a much larger total number of samples.
    Open-Ended Evolution for Minecraft Building Generation. (arXiv:2209.03108v1 [cs.LG])
    This paper proposes a procedural content generator which evolves Minecraft buildings according to an open-ended and intrinsic definition of novelty. To realize this goal we evaluate individuals' novelty in the latent space using a 3D autoencoder, and alternate between phases of exploration and transformation. During exploration the system evolves multiple populations of CPPNs through CPPN-NEAT and constrained novelty search in the latent space (defined by the current autoencoder). We apply a set of repair and constraint functions to ensure candidates adhere to basic structural rules and constraints during evolution. During transformation, we reshape the boundaries of the latent space to identify new interesting areas of the solution space by retraining the autoencoder with novel content. In this study we evaluate five different approaches for training the autoencoder during transformation and their impact on the populations' quality and diversity during evolution. Our results show that by retraining the autoencoder we can achieve better open-ended complexity compared to a static model, which is further improved when retraining using larger datasets of individuals with diverse complexities.
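    The latent-space novelty that drives such a search is commonly scored as the average distance to the k nearest neighbours among previously encoded individuals; a numpy sketch of that score, with a random archive standing in for autoencoder codes (the scoring rule is a standard novelty-search convention, assumed here rather than quoted from the paper):

```python
import numpy as np

def novelty(z, archive, k=5):
    """Mean distance from latent code z to its k nearest neighbours
    in the archive of previously encoded individuals."""
    dists = np.linalg.norm(archive - z, axis=1)
    return np.sort(dists)[:k].mean()

latent_dim = 64
archive = np.random.randn(500, latent_dim)   # stand-in encoded buildings
candidate = np.random.randn(latent_dim)
print(novelty(candidate, archive))
```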
    Quantum-machine-learning channel discrimination. (arXiv:2206.09933v2 [quant-ph] UPDATED)
    In the problem of quantum channel discrimination, one distinguishes between a given number of quantum channels, which is done by sending an input state through a channel and measuring the output state. This work studies applications of variational quantum circuits and machine learning techniques for discriminating such channels. In particular, we explore (i) the practical implementation of embedding this task into the framework of variational quantum computing, (ii) training a quantum classifier based on variational quantum circuits, and (iii) applying the quantum kernel estimation technique. For testing these three channel discrimination approaches, we considered a pair of entanglement-breaking channels and the depolarizing channel with two different depolarization factors. For the approach (i), we address solving the quantum channel discrimination problem using widely discussed parallel and sequential strategies. We show the advantage of the latter in terms of better convergence with less quantum resources. Quantum channel discrimination with a variational quantum classifier (ii) allows one to operate even with random and mixed input states and simple variational circuits. The kernel-based classification approach (iii) is also found effective as it allows one to discriminate depolarizing channels associated not with just fixed values of the depolarization factor, but with ranges of it. Additionally, we discovered that a simple modification of one of the commonly used kernels significantly increases the efficiency of this approach. Finally, our numerical findings reveal that the performance of variational methods of channel discrimination depends on the trace of the product of the output states. These findings demonstrate that quantum machine learning can be used to discriminate channels, such as those representing physical noise processes.  ( 3 min )
    Efficient search of active inference policy spaces using k-means. (arXiv:2209.02550v2 [cs.LG] UPDATED)
    We develop an approach to policy selection in active inference that allows us to efficiently search large policy spaces by mapping each policy to its embedding in a vector space. We sample the expected free energy of representative points in the space, then perform a more thorough policy search around the most promising point in this initial sample. We consider various approaches to creating the policy embedding space, and propose using k-means clustering to select representative points. We apply our technique to a goal-oriented graph-traversal problem, for which naive policy selection is intractable for even moderately large graphs.
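    A sketch of the two-stage selection: cluster the policy embeddings with k-means, evaluate expected free energy only at one representative per cluster, then search exhaustively inside the most promising cluster (the EFE function below is a synthetic placeholder, since the real one comes from the agent's generative model):

```python
import numpy as np
from sklearn.cluster import KMeans

def expected_free_energy(policy_embedding):
    # Placeholder: in active inference this is computed from the
    # generative model; here a synthetic quadratic for illustration.
    return np.sum((policy_embedding - 0.3) ** 2)

embeddings = np.random.randn(5000, 16)        # one row per candidate policy
km = KMeans(n_clusters=20, n_init=10).fit(embeddings)

# Evaluate EFE at the policy nearest to each cluster centroid.
reps = [int(np.argmin(np.linalg.norm(embeddings - c, axis=1)))
        for c in km.cluster_centers_]
best_cluster = int(np.argmin([expected_free_energy(embeddings[r])
                              for r in reps]))

# Thorough search only within the most promising cluster.
members = np.where(km.labels_ == best_cluster)[0]
best = members[int(np.argmin([expected_free_energy(embeddings[i])
                              for i in members]))]
```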
    Shifting Perspective to See Difference: A Novel Multi-View Method for Skeleton based Action Recognition. (arXiv:2209.02986v1 [cs.CV])
    Skeleton-based human action recognition is a longstanding challenge due to its complex dynamics. Some fine-grained details of the dynamics play a vital role in classification. Existing work largely focuses on designing incremental neural networks with more complicated adjacency matrices to capture the details of joint relationships. However, they still have difficulties distinguishing actions that have broadly similar motion patterns but belong to different categories. Interestingly, we found that the subtle differences in motion patterns can be significantly amplified and become easy for an audience to distinguish through specified view directions, a property that has not been fully explored before. Drastically different from previous work, we boost performance by proposing a conceptually simple yet effective Multi-view strategy that recognizes actions from a collection of dynamic view features. Specifically, we design a novel Skeleton-Anchor Proposal (SAP) module which contains a Multi-head structure to learn a set of views. For feature learning of different views, we introduce a novel Angle Representation to transform the actions under different views and feed the transformations into the baseline model. Our module can work seamlessly with existing action classification models. Incorporated with baseline models, our SAP module exhibits clear performance gains on many challenging benchmarks. Moreover, comprehensive experiments show that our model consistently outperforms the state-of-the-art and remains effective and robust, especially when dealing with corrupted data. Related code will be available at https://github.com/ideal-idea/SAP.
    TalkToModel: Understanding Machine Learning Models With Open Ended Dialogues. (arXiv:2207.04154v2 [cs.LG] UPDATED)
    Machine Learning (ML) models are increasingly used to make critical decisions in real-world applications, yet they have also become more complex, making them harder to understand. To this end, several techniques to explain model predictions have been proposed. However, practitioners struggle to leverage explanations because they often do not know which to use, how to interpret the results, and may have insufficient data science experience to obtain explanations. In addition, most current works focus on generating one-shot explanations and do not allow users to follow up and ask fine-grained questions about the explanations, which can be frustrating. In this work, we address these challenges by introducing TalkToModel: an open-ended dialogue system for understanding machine learning models. Specifically, TalkToModel comprises three key components: 1) a natural language interface for engaging in dialogues, making understanding ML models highly accessible, 2) a dialogue engine that adapts to any tabular model and dataset, interprets natural language, maps it to appropriate operations (e.g., feature importance explanations, counterfactual explanations, showing model errors), and generates text responses, and 3) an execution component that runs the operations and ensures explanations are accurate. We carried out quantitative and human subject evaluations of TalkToModel. We found the system understands user questions on novel datasets and models with high accuracy, demonstrating the system's capacity to generalize to new situations. In human evaluations, 73% of healthcare workers (e.g., doctors and nurses) agreed they would use TalkToModel over baseline point-and-click systems, and 84.6% of ML graduate students agreed TalkToModel was easier to use.
    $1D$ to $nD$: A Meta Algorithm for Multivariate Global Optimization via Univariate Optimizers. (arXiv:2209.03246v1 [math.OC])
    In this work, we propose a meta algorithm that can solve a multivariate global optimization problem using univariate global optimizers. Although univariate global optimization receives less attention than the multivariate case, which is more emphasized in academia and industry, we show that it is still relevant and can be directly used to solve problems of multivariate optimization. We also provide the corresponding regret bounds in terms of the time horizon $T$ and the average regret of the univariate optimizer, when it is robust against nonnegative noises with robust regret guarantees.  ( 2 min )
    Geometric multimodal representation learning. (arXiv:2209.03299v1 [cs.LG])
    Graph-centric artificial intelligence (graph AI) has achieved remarkable success in modeling interacting systems prevalent in nature, from dynamical systems in biology to particle physics. The increasing heterogeneity of data calls for graph neural architectures that can combine multiple inductive biases. However, combining data from various sources is challenging because the appropriate inductive bias may vary by data modality. Multimodal learning methods fuse multiple data modalities while leveraging cross-modal dependencies to address this challenge. Here, we survey 140 studies in graph-centric AI and realize that diverse data types are increasingly brought together using graphs and fed into sophisticated multimodal models. These models stratify into image-, language-, and knowledge-grounded multimodal learning. We put forward an algorithmic blueprint for multimodal graph learning based on this categorization. The blueprint serves as a way to group state-of-the-art architectures that treat multimodal data by appropriately choosing four different components. This effort can pave the way for standardizing the design of sophisticated multimodal architectures for highly complex real-world problems.
    Machine Learning Partners in Criminal Networks. (arXiv:2209.03171v1 [physics.soc-ph])
    Recent research has shown that criminal networks have complex organizational structures, but whether this can be used to predict static and dynamic properties of criminal networks remains little explored. Here, by combining graph representation learning and machine learning methods, we show that structural properties of political corruption, police intelligence, and money laundering networks can be used to recover missing criminal partnerships, distinguish among different types of criminal and legal associations, as well as predict the total amount of money exchanged among criminal agents, all with outstanding accuracy. We also show that our approach can anticipate future criminal associations during the dynamic growth of corruption networks with significant accuracy. Thus, similar to evidence found at crime scenes, we conclude that structural patterns of criminal networks carry crucial information about illegal activities, which allows machine learning methods to predict missing information and even anticipate future criminal behavior.  ( 2 min )
    Combining Sequential and Aggregated Data for Churn Prediction in Casual Freemium Games. (arXiv:2209.03184v1 [cs.AI])
    In freemium games, the revenue from a player comes from the in-app purchases made and the advertisements to which that player is exposed. The longer a player plays the game, the higher the chances that he or she will generate revenue within the game. Within this scenario, it is extremely important to be able to detect promptly when a player is about to quit playing (churn) in order to react and attempt to retain the player within the game, thus prolonging his or her game lifetime. In this article we investigate how to improve the current state-of-the-art in churn prediction by combining sequential and aggregate data using different neural network architectures. The results of the comparative analysis show that combining the two data types yields an improvement in prediction accuracy over predictors based on either purely sequential or purely aggregated data.
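    One common way to combine the two data types, used here as an illustrative assumption rather than the paper's exact architecture, is an LSTM branch over the session sequence fused with a dense branch over aggregate features:

```python
import torch
import torch.nn as nn

class ChurnNet(nn.Module):
    """LSTM over per-session event sequences plus an MLP over
    aggregate features, fused for binary churn prediction."""
    def __init__(self, seq_features, agg_features, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(seq_features, hidden, batch_first=True)
        self.agg = nn.Sequential(nn.Linear(agg_features, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, seq, agg):
        _, (h, _) = self.lstm(seq)          # final hidden state of the sequence
        fused = torch.cat([h[-1], self.agg(agg)], dim=-1)
        return torch.sigmoid(self.head(fused))

model = ChurnNet(seq_features=8, agg_features=12)
# 16 players, 30 sessions of 8 sequential features, 12 aggregate features.
p_churn = model(torch.randn(16, 30, 8), torch.randn(16, 12))
```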
    A Survey of Machine Unlearning. (arXiv:2209.02299v2 [cs.LG] UPDATED)
    Computer systems hold a large amount of personal data over decades. On the one hand, such data abundance allows breakthroughs in artificial intelligence (AI), especially machine learning (ML) models. On the other hand, it can threaten the privacy of users and weaken the trust between humans and AI. Recent regulations require that private information about a user can be removed from computer systems in general and from ML models in particular upon request (e.g. the "right to be forgotten"). While removing data from back-end databases should be straightforward, it is not sufficient in the AI context as ML models often "remember" the old data. Existing adversarial attacks proved that we can learn private membership or attributes of the training data from the trained models. This phenomenon calls for a new paradigm, namely machine unlearning, to make ML models forget about particular data. It turns out that recent works on machine unlearning have not been able to solve the problem completely due to the lack of common frameworks and resources. In this survey paper, we seek to provide a thorough investigation of machine unlearning in its definitions, scenarios, mechanisms, and applications. Specifically, as a categorical collection of state-of-the-art research, we hope to provide a broad reference for those seeking a primer on machine unlearning and its various formulations, design requirements, removal requests, algorithms, and uses in a variety of ML applications. Furthermore, we hope to outline key findings and trends in the paradigm as well as highlight new areas of research that have yet to see the application of machine unlearning, but could nonetheless benefit immensely. We hope this survey provides a valuable reference for ML researchers as well as those seeking to innovate privacy technologies. Our resources are at https://github.com/tamlhp/awesome-machine-unlearning.
    Unsupervised Domain-adaptive Hash for Networks. (arXiv:2108.09136v2 [cs.LG] UPDATED)
    Abundant real-world data can be naturally represented by large-scale networks, which demands efficient and effective learning algorithms. At the same time, labels may only be available for some networks, which demands these algorithms to be able to adapt to unlabeled networks. Domain-adaptive hash learning has enjoyed considerable success in the computer vision community in many practical tasks due to its lower cost in both retrieval time and storage footprint. However, it has not been applied to multiple-domain networks. In this work, we bridge this gap by developing an unsupervised domain-adaptive hash learning method for networks, dubbed UDAH. Specifically, we develop four task-specific yet correlated components: (1) network structure preservation via a hard groupwise contrastive loss, (2) relaxation-free supervised hashing, (3) cross-domain intersected discriminators, and (4) semantic center alignment. We conduct a wide range of experiments to evaluate the effectiveness and efficiency of our method on a range of tasks including link prediction, node classification, and neighbor recommendation. Our evaluation results demonstrate that our model achieves better performance than the state-of-the-art conventional discrete embedding methods over all the tasks.  ( 2 min )
    A Machine Learning Analysis of Impact of the Covid-19 Pandemic on Alcohol Consumption Habit Changes Among Healthcare Workers in the U.S. (arXiv:2112.06261v3 [cs.LG] UPDATED)
    In this paper, we discuss the impact of the Covid-19 pandemic on alcohol consumption habit changes among healthcare workers in the United States. We utilize multiple supervised and unsupervised machine learning methods and models such as Decision Trees, Logistic Regression, Naive Bayes classifier, k-Nearest Neighbors, Support Vector Machines, Multilayer perceptron, XGBoost, CatBoost, LightGBM, Chi-Squared Test and mutual information method on a mental health survey data obtained from the University of Michigan Inter-University Consortium for Political and Social Research to find out relationships between COVID-19 related negative effects and alcohol consumption habit changes among healthcare workers. Our findings suggest that COVID-19-related school closures, COVID-19-related work schedule changes and COVID-related news exposure may lead to an increase in alcohol use among healthcare workers in the United States.
    Privacy-Preserving Federated Learning via System Immersion and Random Matrix Encryption. (arXiv:2204.02497v2 [cs.LG] UPDATED)
    Federated learning (FL) has emerged as a privacy solution for collaborative distributed learning where clients train AI models directly on their devices instead of sharing their data with a centralized (potentially adversarial) server. Although FL preserves local data privacy to some extent, it has been shown that information about clients' data can still be inferred from model updates. In recent years, various privacy-preserving schemes have been developed to address this privacy leakage. However, they often provide privacy at the expense of model performance or system efficiency, and balancing these tradeoffs is a crucial challenge when implementing FL schemes. In this manuscript, we propose a Privacy-Preserving Federated Learning (PPFL) framework built on the synergy of matrix encryption and system immersion tools from control theory. The idea is to immerse the learning algorithm, a Stochastic Gradient Descent (SGD), into a higher-dimensional system (the so-called target system) and design the dynamics of the target system so that the trajectories of the original SGD are immersed/embedded in its trajectories, and it learns on encrypted data (here we use random matrix encryption). Matrix encryption is reformulated at the server as a random change of coordinates that maps original parameters to a higher-dimensional parameter space and enforces that the target SGD converges to an encrypted version of the original SGD optimal solution. The server decrypts the aggregated model using the left inverse of the immersion map. We show that our algorithm provides the same level of accuracy and convergence rate as the standard FL with a negligible computation cost while revealing no information about the clients' data.  ( 3 min )
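    Stripping away the immersion of the SGD dynamics, the coordinate-change idea alone can be sketched in a few lines: a random tall matrix lifts parameters to a higher dimension, aggregation commutes with the linear map, and the left inverse decrypts. This is a simplified illustration of that one ingredient, not the full scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_lift, clients = 10, 16, 5

# Random tall matrix: maps parameters into a higher-dimensional space.
M = rng.normal(size=(n_lift, n))
M_left_inv = np.linalg.pinv(M)          # left inverse for decryption

thetas = [rng.normal(size=n) for _ in range(clients)]  # local models
encrypted = [M @ th for th in thetas]   # clients share only lifted parameters

# Averaging (e.g., FedAvg) commutes with the linear encryption map,
# so the server can aggregate in the encrypted space...
agg_encrypted = np.mean(encrypted, axis=0)
# ...and recover the plaintext average with the left inverse.
agg = M_left_inv @ agg_encrypted
assert np.allclose(agg, np.mean(thetas, axis=0))
```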
    Benchmarking Multimodal Variational Autoencoders: GeBiD Dataset and Toolkit. (arXiv:2209.03048v1 [cs.LG])
    Multimodal Variational Autoencoders (VAEs) have been a subject of intense research in the past years as they can integrate multiple modalities into a joint representation and can thus serve as a promising tool for both data classification and generation. Several approaches toward multimodal VAE learning have been proposed so far, their comparison and evaluation have however been rather inconsistent. One reason is that the models differ at the implementation level, another problem is that the datasets commonly used in these cases were not initially designed for the evaluation of multimodal generative models. This paper addresses both mentioned issues. First, we propose a toolkit for systematic multimodal VAE training and comparison. Second, we present a synthetic bimodal dataset designed for a comprehensive evaluation of the joint generation and cross-generation capabilities. We demonstrate the utility of the dataset by comparing state-of-the-art models.
    Banknote Recognition for Visually Impaired People (Case of Ethiopian note). (arXiv:2209.03236v1 [cs.HC])
    Currency is used almost everywhere to facilitate business. In most developing countries, especially those in Africa, tangible notes are predominantly used in everyday financial transactions. One of these countries, Ethiopia, is believed to have one of the world's highest rates of blindness (1.6%) and low vision (3.7%). There are around 4 million visually impaired people in the country, 1.7 million of whom live with complete vision loss. These people face a number of challenges when they are at a bus station, in a shopping center, or anywhere else that requires the physical exchange of money. In this paper, we try to provide a solution to this issue using AI/ML applications. We developed an Android- and iOS-compatible mobile application with a model that achieved 98.9% classification accuracy on our dataset. The application has an integrated voice feature that announces the type of the scanned currency in Amharic, the working language of Ethiopia. The application is developed to be easily accessible by its users and is built to reduce the burden on visually impaired people in Ethiopia.
    Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow. (arXiv:2209.03003v1 [cs.LG])
    We present rectified flow, a surprisingly simple approach to learning (neural) ordinary differential equation (ODE) models to transport between two empirically observed distributions $\pi_0$ and $\pi_1$, hence providing a unified solution to generative modeling and domain transfer, among various other tasks involving distribution transport. The idea of rectified flow is to learn the ODE to follow the straight paths connecting the points drawn from $\pi_0$ and $\pi_1$ as much as possible. This is achieved by solving a straightforward nonlinear least squares optimization problem, which can be easily scaled to large models without introducing extra parameters beyond standard supervised learning. The straight paths are special and preferred because they are the shortest paths between two points, and can be simulated exactly without time discretization and hence yield computationally efficient models. We show that the procedure of learning a rectified flow from data, called rectification, turns an arbitrary coupling of $\pi_0$ and $\pi_1$ to a new deterministic coupling with provably non-increasing convex transport costs. In addition, recursively applying rectification allows us to obtain a sequence of flows with increasingly straight paths, which can be simulated accurately with coarse time discretization in the inference phase. In empirical studies, we show that rectified flow performs superbly on image generation, image-to-image translation, and domain adaptation. In particular, on image generation and translation, our method yields nearly straight flows that give high quality results even with a single Euler discretization step.
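    The training objective admits a compact sketch. The following PyTorch toy (our illustration, with invented 2-D Gaussian distributions) regresses a velocity field $v(x_t, t)$ onto the straight-line displacement $x_1 - x_0$ along the interpolants $x_t = t x_1 + (1-t) x_0$.

        import torch
        import torch.nn as nn

        v = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2))
        opt = torch.optim.Adam(v.parameters(), lr=1e-3)

        for step in range(1000):
            x0 = torch.randn(256, 2)             # samples from pi_0
            x1 = torch.randn(256, 2) + 4.0       # samples from pi_1
            t = torch.rand(256, 1)
            xt = t * x1 + (1 - t) * x0           # points on the straight paths
            target = x1 - x0                     # constant velocity of each path
            loss = ((v(torch.cat([xt, t], dim=1)) - target) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()

    At inference time one integrates $dx/dt = v(x, t)$ from $t=0$ to $t=1$; for a well-rectified flow even a single Euler step can suffice, as the abstract notes.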
    Self-supervised multimodal neuroimaging yields predictive representations for a spectrum of Alzheimer's phenotypes. (arXiv:2209.02876v1 [cs.LG])
    Recent neuroimaging studies that focus on predicting brain disorders via modern machine learning approaches commonly include a single modality and rely on supervised over-parameterized models. However, a single modality provides only a limited view of the highly complex brain. Critically, supervised models in clinical settings lack accurate diagnostic labels for training. Coarse labels do not capture the long-tailed spectrum of brain disorder phenotypes, which limits the generalizability of such models and makes them less useful in diagnostic settings. This work presents a novel multi-scale coordinated framework for learning multiple representations from multimodal neuroimaging data. We propose a general taxonomy of informative inductive biases to capture unique and joint information in multimodal self-supervised fusion. The taxonomy forms a family of decoder-free models with reduced computational complexity and a propensity to capture multi-scale relationships between local and global representations of the multimodal inputs. We conduct a comprehensive evaluation of the taxonomy using functional and structural magnetic resonance imaging (MRI) data across a spectrum of Alzheimer's disease phenotypes and show that self-supervised models reveal disorder-relevant brain regions and multimodal links without access to the labels during pre-training. The proposed multimodal self-supervised learning yields representations with improved classification performance for both modalities. The concomitant rich and flexible unsupervised deep learning framework captures complex multimodal relationships and provides predictive performance that meets or exceeds that of a more narrow supervised classification analysis. We present elaborate quantitative evidence of how this framework can significantly advance our search for missing links in complex brain disorders.
    Adversarial Mask: Real-World Universal Adversarial Attack on Face Recognition Model. (arXiv:2111.10759v3 [cs.CV] UPDATED)
    Deep learning-based facial recognition (FR) models have demonstrated state-of-the-art performance in the past few years, even when wearing protective medical face masks became commonplace during the COVID-19 pandemic. Given the outstanding performance of these models, the machine learning research community has shown increasing interest in challenging their robustness. Initially, researchers presented adversarial attacks in the digital domain, and later the attacks were transferred to the physical domain. However, in many cases, attacks in the physical domain are conspicuous, and thus may raise suspicion in real-world environments (e.g., airports). In this paper, we propose Adversarial Mask, a physical universal adversarial perturbation (UAP) against state-of-the-art FR models that is applied on face masks in the form of a carefully crafted pattern. In our experiments, we examined the transferability of our adversarial mask to a wide range of FR model architectures and datasets. In addition, we validated our adversarial mask's effectiveness in real-world experiments (CCTV use case) by printing the adversarial pattern on a fabric face mask. In these experiments, the FR system was only able to identify 3.34% of the participants wearing the mask (compared to a minimum of 83.34% with other evaluated masks). A demo of our experiments can be found at: https://youtu.be/_TXkDO5z11w.
    Impact of Colour Variation on Robustness of Deep Neural Networks. (arXiv:2209.02832v1 [cs.CV])
    Deep neural networks (DNNs) have shown state-of-the-art performance for computer vision applications like image classification, segmentation and object detection. However, recent advances have shown their vulnerability to manual digital perturbations in the input data, namely adversarial attacks. The accuracy of a network is significantly affected by the data distribution of its training dataset. Distortions or perturbations in the color space of input images generate out-of-distribution data, which makes networks more likely to misclassify them. In this work, we propose a color-variation dataset by distorting the RGB colors of a subset of ImageNet with 27 different combinations. The aim of our work is to study the impact of color variation on the performance of DNNs. We perform experiments on several state-of-the-art DNN architectures on the proposed dataset, and the results show a significant correlation between color variation and loss of accuracy. Furthermore, based on the ResNet50 architecture, we evaluate the performance of recently proposed robust training techniques and strategies, such as Augmix, revisit, and free normalizer, on our proposed dataset. Experimental results indicate that these robust training techniques can improve the robustness of deep networks to color variation.
    Risk of Bias in Chest X-ray Foundation Models. (arXiv:2209.02965v1 [cs.LG])
    Foundation models are considered a breakthrough in all applications of AI, promising robust and reusable mechanisms for feature extraction, alleviating the need for large amounts of high quality training data for task-specific prediction models. However, foundation models may potentially encode and even reinforce existing biases present in historic datasets. Given the limited ability to scrutinize foundation models, it remains unclear whether the opportunities outweigh the risks in safety critical applications such as clinical decision making. In our statistical bias analysis of a recently published, and publicly available chest X-ray foundation model, we found reasons for concern as the model seems to encode protected characteristics including biological sex and racial identity, which may lead to disparate performance across subgroups in downstream applications. While research into foundation models for healthcare applications is in an early stage, we believe it is important to make the community aware of these risks to avoid harm.
    Quantifying Aleatoric and Epistemic Uncertainty in Machine Learning: Are Conditional Entropy and Mutual Information Appropriate Measures?. (arXiv:2209.03302v1 [cs.LG])
    This short note is a critical discussion of the quantification of aleatoric and epistemic uncertainty in terms of conditional entropy and mutual information, respectively, which has recently been proposed in machine learning and has become quite common since then. More generally, we question the idea of an additive decomposition of total uncertainty into its aleatoric and epistemic constituents.  ( 2 min )
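    For reference, the decomposition under discussion is commonly written as follows (our rendering of the standard formulation, not a quotation from the note): total predictive uncertainty splits into conditional entropy (aleatoric) plus mutual information (epistemic),

        $$ \mathcal{H}\big[\mathbb{E}_{p(\theta\mid\mathcal{D})}\, p(y\mid x,\theta)\big] \;=\; \underbrace{\mathbb{E}_{p(\theta\mid\mathcal{D})}\,\mathcal{H}\big[p(y\mid x,\theta)\big]}_{\text{aleatoric}} \;+\; \underbrace{\mathcal{I}(y;\theta\mid x,\mathcal{D})}_{\text{epistemic}} $$

    and it is precisely the additivity of this split that the note calls into question.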
    Machine Learning Students Overfit to Overfitting. (arXiv:2209.03032v1 [cs.LG])
    Overfitting and generalization are important concepts in Machine Learning, as only models that generalize are interesting for general applications. Yet some students have trouble learning these important concepts through lectures and exercises. In this paper we describe common examples of students misunderstanding overfitting, and provide recommendations for possible solutions. We cover student misconceptions about overfitting, about solutions to overfitting, and implementation mistakes that are commonly confused with overfitting issues. We expect that our paper can contribute to improving student understanding of, and lectures about, this important topic.
    Autonomous Cooking with Digital Twin Methodology. (arXiv:2209.03087v1 [cs.CE])
    This work introduces the concept of an autonomous cooking process based on Digital Twin methodology. It proposes a hybrid approach of physics-based full-order simulations followed by a data-driven system identification process with low errors. It makes faster-than-real-time simulations of Digital Twins feasible on a device level, without the need for cloud or high-performance computing. The concept is universally applicable to various physical processes.
    Lyapunov function approach for approximation algorithm design and analysis: with applications in submodular maximization. (arXiv:2205.12442v6 [math.OC] UPDATED)
    We propose a two-phase systematic framework for approximation algorithm design and analysis via Lyapunov functions. The first phase takes a Lyapunov function as input and outputs a continuous-time approximation algorithm with a provable approximation ratio. The second phase then converts this continuous-time algorithm to a discrete-time algorithm with almost the same approximation ratio along with provable time complexity. One distinctive feature of our framework is that we only need to know the parametric form of the Lyapunov function whose complete specification will not be decided until the end of the first phase by maximizing the approximation ratio of the continuous-time algorithm. Some immediate benefits of the Lyapunov function approach include: (i) unifying many existing algorithms; (ii) providing a guideline to design and analyze new algorithms; and (iii) offering new perspectives to potentially improve existing algorithms. We use various submodular maximization problems as running examples to illustrate our framework.  ( 3 min )
    Multimodal Speech Enhancement Using Burst Propagation. (arXiv:2209.03275v1 [cs.SD])
    This paper proposes MBURST, a novel multimodal solution for audio-visual speech enhancement that considers the most recent neurological discoveries regarding pyramidal cells of the prefrontal cortex and other brain regions. The so-called burst propagation implements several criteria to address the credit assignment problem in a more biologically plausible manner: steering the sign and magnitude of plasticity through feedback, multiplexing the feedback and feedforward information across layers through different weight connections, approximating feedback and feedforward connections, and linearizing the feedback signals. MBURST benefits from such capabilities to learn correlations between the noisy signal and the visual stimuli, thus attributing meaning to the speech by amplifying relevant information and suppressing noise. Experiments conducted on Grid Corpus and CHiME3-based datasets show that MBURST can reproduce mask reconstructions similar to those of the multimodal backpropagation-based baseline while demonstrating outstanding energy efficiency, reducing neuron firing rates to values up to 70% lower. Such a feature implies more sustainable implementations, suitable and desirable for hearing aids or any other similar embedded systems.  ( 2 min )
    Concept-modulated model-based offline reinforcement learning for rapid generalization. (arXiv:2209.03207v1 [cs.LG])
    The robustness of any machine learning solution is fundamentally bound by the data it was trained on. One way to generalize beyond the original training is through human-informed augmentation of the original dataset; however, it is impossible to specify all possible failure cases that can occur during deployment. To address this limitation we combine model-based reinforcement learning and model-interpretability methods to propose a solution that self-generates simulated scenarios constrained by environmental concepts and dynamics learned in an unsupervised manner. In particular, an internal model of the agent's environment is conditioned on low-dimensional concept representations of the input space that are sensitive to the agent's actions. We demonstrate this method within a standard realistic driving simulator in a simple point-to-point navigation task, where we show dramatic improvements in one-shot generalization to different instances of specified failure cases as well as zero-shot generalization to similar variations compared to model-based and model-free approaches.  ( 2 min )
    A Case Study on the Classification of Lost Circulation Events During Drilling using Machine Learning Techniques on an Imbalanced Large Dataset. (arXiv:2209.01607v2 [cs.LG] UPDATED)
    This study presents machine learning models that forecast and categorize lost circulation severity preemptively using a large class-imbalanced drilling dataset. We demonstrate reproducible core techniques involved in tackling a large drilling engineering challenge utilizing easily interpretable machine learning approaches. We utilized a dataset of more than 65,000 records with a class imbalance problem, collected from Azadegan oilfield formations in Iran. Eleven of the dataset's seventeen parameters are chosen to be used in the classification of five lost circulation events. To generate classification models, we used six basic machine learning algorithms and four ensemble learning methods. Linear Discriminant Analysis (LDA), Logistic Regression (LR), Support Vector Machines (SVM), Classification and Regression Trees (CART), k-Nearest Neighbors (KNN), and Gaussian Naive Bayes (GNB) are the six fundamental techniques. We also used bagging and boosting ensemble learning techniques in the investigation of solutions for improved predicting performance. The performance of these algorithms is measured using four metrics: accuracy, precision, recall, and F1-score. The F1-score weighted to represent the data imbalance is chosen as the preferred evaluation criterion. The CART model was found to be the best in class for identifying drilling fluid circulation loss events with an average weighted F1-score of 0.9904 and standard deviation of 0.0015. Upon application of ensemble learning techniques, a Random Forest ensemble of decision trees showed the best predictive performance. It identified and classified lost circulation events with a perfect weighted F1-score of 1.0. Using Permutation Feature Importance (PFI), the measured depth was found to be the most influential factor in accurately recognizing lost circulation events while drilling.  ( 3 min )
    What does a platypus look like? Generating customized prompts for zero-shot image classification. (arXiv:2209.03320v1 [cs.CV])
    Open vocabulary models are a promising new paradigm for image classification. Unlike traditional classification models, open vocabulary models classify among any arbitrary set of categories specified with natural language during inference. This natural language, called "prompts", typically consists of a set of hand-written templates (e.g., "a photo of a {}") which are completed with each of the category names. This work introduces a simple method to generate higher accuracy prompts, without using explicit knowledge of the image domain and with far fewer hand-constructed sentences. To achieve this, we combine open vocabulary models with large language models (LLMs) to create Customized Prompts via Language models (CuPL, pronounced "couple"). In particular, we leverage the knowledge contained in LLMs in order to generate many descriptive sentences that are customized for each object category. We find that this straightforward and general approach improves accuracy on a range of zero-shot image classification benchmarks, including over one percentage point gain on ImageNet. Finally, this method requires no additional training and remains completely zero-shot. Code is available at https://github.com/sarahpratt/CuPL.  ( 2 min )
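    A schematic sketch of the CuPL recipe follows, with hypothetical helper functions (`llm_describe` queries an LLM for category descriptions; `text_embed` and `image_embed` stand in for an open-vocabulary model's encoders). Only the prompt-embedding-averaging logic is the point; all names here are placeholders, not the authors' API.

        import numpy as np

        def cupl_classify(image, categories, llm_describe, text_embed, image_embed):
            img = image_embed(image)
            img = img / np.linalg.norm(img)
            scores = []
            for category in categories:
                # Ask the LLM for many descriptive sentences about the category,
                # e.g. "A platypus looks like a beaver with a duck's bill."
                prompts = llm_describe(category)
                embs = np.stack([text_embed(p) for p in prompts])
                embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
                # Average the prompt embeddings into one class embedding, then score.
                mean_emb = embs.mean(axis=0)
                mean_emb = mean_emb / np.linalg.norm(mean_emb)
                scores.append(float(img @ mean_emb))
            return categories[int(np.argmax(scores))]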
    Difficulty-Net: Learning to Predict Difficulty for Long-Tailed Recognition. (arXiv:2209.02960v1 [cs.CV])
    Long-tailed datasets, where head classes comprise many more training samples than tail classes, cause recognition models to become biased towards the head classes. Weighted loss is one of the most popular ways of mitigating this issue, and a recent work has suggested that class-difficulty might be a better clue than the conventionally used class-frequency to decide the distribution of weights. A heuristic formulation was used in the previous work for quantifying the difficulty, but we empirically find that the optimal formulation varies depending on the characteristics of datasets. Therefore, we propose Difficulty-Net, which learns to predict the difficulty of classes using the model's performance in a meta-learning framework. To make it learn reasonable difficulty of a class within the context of other classes, we newly introduce two key concepts, namely the relative difficulty and the driver loss. The former helps Difficulty-Net take other classes into account when calculating difficulty of a class, while the latter is indispensable for guiding the learning to a meaningful direction. Extensive experiments on popular long-tailed datasets demonstrated the effectiveness of the proposed method, and it achieved state-of-the-art performance on multiple long-tailed datasets.
    Change Detection for Local Explainability in Evolving Data Streams. (arXiv:2209.02764v1 [cs.LG])
    As complex machine learning models are increasingly used in sensitive applications like banking, trading or credit scoring, there is a growing demand for reliable explanation mechanisms. Local feature attribution methods have become a popular technique for post-hoc and model-agnostic explanations. However, attribution methods typically assume a stationary environment in which the predictive model has been trained and remains stable. As a result, it is often unclear how local attributions behave in realistic, constantly evolving settings such as streaming and online applications. In this paper, we discuss the impact of temporal change on local feature attributions. In particular, we show that local attributions can become obsolete each time the predictive model is updated or concept drift alters the data generating distribution. Consequently, local feature attributions in data streams provide high explanatory power only when combined with a mechanism that allows us to detect and respond to local changes over time. To this end, we present CDLEEDS, a flexible and model-agnostic framework for detecting local change and concept drift. CDLEEDS serves as an intuitive extension of attribution-based explanation techniques to identify outdated local attributions and enable more targeted recalculations. In experiments, we also show that the proposed framework can reliably detect both local and global concept drift. Accordingly, our work contributes to a more meaningful and robust explainability in online machine learning.
    DC-MRTA: Decentralized Multi-Robot Task Allocation and Navigation in Complex Environments. (arXiv:2209.02865v1 [cs.RO])
    We present a novel reinforcement learning (RL) based task allocation and decentralized navigation algorithm for mobile robots in warehouse environments. Our approach is designed for scenarios in which multiple robots are used to perform various pick up and delivery tasks. We consider the problem of joint decentralized task allocation and navigation and present a two level approach to solve it. At the higher level, we solve the task allocation by formulating it in terms of Markov Decision Processes and choosing the appropriate rewards to minimize the Total Travel Delay (TTD). At the lower level, we use a decentralized navigation scheme based on ORCA that enables each robot to perform these tasks in an independent manner, and avoid collisions with other robots and dynamic obstacles. We combine these lower and upper levels by defining rewards for the higher level as the feedback from the lower level navigation algorithm. We perform extensive evaluations in complex warehouse layouts with a large number of agents and highlight the benefits over state-of-the-art algorithms based on myopic pickup distance minimization and regret-based task selection. We observe improvements of up to 14% in terms of task completion time and up to 40% in terms of computing collision-free trajectories for the robots.  ( 2 min )
    DC-Art-GAN: Stable Procedural Content Generation using DC-GANs for Digital Art. (arXiv:2209.02847v1 [cs.CV])
    Digital art is an artistic method of using digital technologies as a part of the generative or creative process. With the advent of digital currency and NFTs (Non-Fungible Tokens), the demand for digital art is growing aggressively. In this manuscript, we advocate the concept of using deep generative networks with adversarial training for stable and variant art generation. The work mainly focuses on using the Deep Convolutional Generative Adversarial Network (DC-GAN) and explores techniques to address the common pitfalls in GAN training. We compare various architectures and designs of DC-GANs to arrive at a recommendable design choice for stable and realistic generation. The main focus of the work is to generate realistic images that do not exist in reality but are synthesised from random noise by the proposed model. We provide visual results of generated animal face images (some pieces of evidence showing a blend of species) along with recommendations for training, architecture and design choices. We also show how training image preprocessing plays a massive role in GAN training.
    Adjusted Asymmetric Accuracy: A Well-Behaving External Cluster Validity Measure. (arXiv:2209.02935v1 [cs.LG])
    There is no, nor will there ever be, a single best clustering algorithm, but we would still like to be able to pinpoint those which are well-performing on certain task types and filter out the systematically disappointing ones. Clustering algorithms are traditionally evaluated using either internal or external validity measures. Internal measures quantify different aspects of the obtained partitions, e.g., the average degree of cluster compactness or point separability. Yet, their validity is questionable because the clusterings they promote can sometimes be meaningless. External measures, on the other hand, compare the algorithms' outputs to the reference, ground truth groupings that are provided by experts. The commonly-used classical partition similarity scores, such as the normalised mutual information, Fowlkes-Mallows, or adjusted Rand index, might not possess all the desirable properties, e.g., they do not identify pathological edge cases correctly. Furthermore, they are not nicely interpretable: it is hard to say what a score of 0.8 really means. Its behaviour might also vary as the number of true clusters changes. This makes comparing clustering algorithms across many benchmark datasets difficult. To remedy this, we propose and analyse a new measure: an asymmetric version of the optimal set-matching accuracy. It is corrected for chance and the imbalancedness of cluster sizes.
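    The symmetric starting point of the proposed measure, optimal set-matching accuracy, can be sketched with the Hungarian algorithm (a generic illustration; the paper's measure is additionally asymmetric and corrected for chance and cluster-size imbalance).

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def set_matching_accuracy(y_true, y_pred):
            k = max(y_true.max(), y_pred.max()) + 1
            C = np.zeros((k, k), dtype=int)          # confusion matrix
            for t, p in zip(y_true, y_pred):
                C[t, p] += 1
            rows, cols = linear_sum_assignment(-C)   # permute labels to maximize matches
            return C[rows, cols].sum() / len(y_true)

        y_true = np.array([0, 0, 1, 1, 2, 2])
        y_pred = np.array([1, 1, 0, 0, 2, 2])        # same partition, relabeled
        print(set_matching_accuracy(y_true, y_pred)) # 1.0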
    Interpretations Steered Network Pruning via Amortized Inferred Saliency Maps. (arXiv:2209.02869v1 [cs.CV])
    Convolutional Neural Networks (CNNs) compression is crucial to deploying these models in edge devices with limited resources. Existing channel pruning algorithms for CNNs have achieved plenty of success on complex models. They approach the pruning problem from various perspectives and use different metrics to guide the pruning process. However, these metrics mainly focus on the model's `outputs' or `weights' and neglect its `interpretations' information. To fill in this gap, we propose to address the channel pruning problem from a novel perspective by leveraging the interpretations of a model to steer the pruning process, thereby utilizing information from both inputs and outputs of the model. However, existing interpretation methods cannot be deployed to achieve our goal, as they are either inefficient for pruning or may predict non-coherent explanations. We tackle this challenge by introducing a selector model that predicts real-time smooth saliency masks for pruned models. We parameterize the distribution of explanatory masks by Radial Basis Function (RBF)-like functions to incorporate geometric prior of natural images in our selector model's inductive bias. Thus, we can obtain compact representations of explanations to reduce the computational costs of our pruning method. We leverage our selector model to steer the network pruning by maximizing the similarity of explanatory representations for the pruned and original models. Extensive experiments on CIFAR-10 and ImageNet benchmark datasets demonstrate the efficacy of our proposed method. Our implementations are available at \url{https://github.com/Alii-Ganjj/InterpretationsSteeredPruning}
    MRF-PINN: A Multi-Receptive-Field convolutional physics-informed neural network for solving partial differential equations. (arXiv:2209.03151v1 [cs.LG])
    Physics-informed neural networks (PINN) can achieve lower development and solving cost than traditional partial differential equation (PDE) solvers in scenarios such as reconstructing the physics field and solving the inverse problem. Due to the advantages of parameter sharing, spatial feature extraction and low inference cost, convolutional neural networks (CNN) are increasingly used in PINN. To adapt convolutional PINN to different equations, researchers have to spend much time tuning critical hyperparameters. Furthermore, the effects of finite difference accuracy, model complexity, and mesh resolution on the prediction result of convolutional PINN are unclear. To fill the above research gaps, in this paper, (1) a Multi-Receptive-Field PINN (MRF-PINN) model is constructed to adapt to different equation types and mesh resolutions without manual tuning; (2) the generality and advantages of the MRF-PINN are verified on three typical linear PDEs (elliptic, parabolic, hyperbolic) and nonlinear PDEs (Navier-Stokes equations); (3) the contribution of each receptive field to the final MRF-PINN result is analyzed, and the influence of finite difference accuracy, model complexity (channel number) and mesh resolution on the MRF-PINN result is tested. This paper shows that MRF-PINN can adapt to completely different equation types and mesh resolutions without any hyperparameter tuning. Further, the solving error is significantly decreased under high-order finite difference, large channel number, and high mesh resolution, which is expected to become a general convolutional PINN scheme.
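    A speculative PyTorch sketch of the multi-receptive-field idea, with parallel convolutions of different kernel sizes summed into one response; the architectural details here are invented and will differ from the paper's.

        import torch
        import torch.nn as nn

        class MRFBlock(nn.Module):
            def __init__(self, ch):
                super().__init__()
                # Parallel branches with receptive fields of 1, 3, and 5.
                self.branches = nn.ModuleList(
                    nn.Conv2d(ch, ch, k, padding=k // 2) for k in (1, 3, 5))

            def forward(self, x):
                return sum(b(x) for b in self.branches)

        x = torch.randn(1, 8, 64, 64)       # a field sampled on a 64x64 mesh
        print(MRFBlock(8)(x).shape)         # torch.Size([1, 8, 64, 64])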
    Quadratic Gradient: Uniting Gradient Algorithm and Newton Method as One. (arXiv:2209.03282v1 [math.OC])
    Using only a single floating-point number in the line search technique for Newton's method may be inadequate. A column vector of the same size as the gradient may be better suited than a mere float to accelerate each of the gradient elements at a different rate. Moreover, a square matrix of the same order as the Hessian matrix may be helpful for correcting the Hessian matrix. Chiang applied something between a column vector and a square matrix, namely a diagonal matrix, to accelerate the gradient, and further proposed a faster gradient variant called the quadratic gradient. In this paper, we present a new way to build a new version of the quadratic gradient. This new quadratic gradient does not satisfy the convergence conditions of the fixed-Hessian Newton's method. However, experimental results show that it sometimes has better performance than the original one in terms of convergence rate. Also, Chiang speculates that there might be a relation between the Hessian matrix and the learning rate for the first-order gradient descent method. We prove that the floating number $\frac{1}{\epsilon + \max \{| \lambda_i | \}}$ can be a good learning rate for gradient methods, where $\epsilon$ is a number to avoid division by zero and $\lambda_i$ are the eigenvalues of the Hessian matrix.
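    That learning rate is easy to test numerically; the toy below (our illustration) applies $\frac{1}{\epsilon + \max\{|\lambda_i|\}}$ to gradient descent on a simple quadratic.

        import numpy as np

        H = np.array([[3.0, 1.0], [1.0, 2.0]])   # Hessian of f(x) = 0.5 x^T H x
        eps = 1e-8
        lr = 1.0 / (eps + np.max(np.abs(np.linalg.eigvalsh(H))))

        x = np.array([5.0, -3.0])
        for _ in range(200):
            x = x - lr * (H @ x)                  # gradient of f is H x
        print(x)                                  # converges toward the minimum at 0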
    Inference and Learning for Generative Capsule Models. (arXiv:2209.03115v1 [cs.LG])
    Capsule networks (see e.g. Hinton et al., 2018) aim to encode knowledge of and reason about the relationship between an object and its parts. In this paper we specify a generative model for such data, and derive a variational algorithm for inferring the transformation of each model object in a scene, and the assignments of observed parts to the objects. We derive a learning algorithm for the object models, based on variational expectation maximization (Jordan et al., 1999). We also study an alternative inference algorithm based on the RANSAC method of Fischler and Bolles (1981). We apply these inference methods to (i) data generated from multiple geometric objects like squares and triangles ("constellations"), and (ii) data from a parts-based model of faces. Recent work by Kosiorek et al. (2019) has used amortized inference via stacked capsule autoencoders (SCAEs) to tackle this problem -- our results show that we significantly outperform them where we can make comparisons (on the constellations data).
    Spatiotemporal Cardiac Statistical Shape Modeling: A Data-Driven Approach. (arXiv:2209.02736v1 [cs.LG])
    Clinical investigations of anatomy's structural changes over time could greatly benefit from population-level quantification of shape, or spatiotemporal statistical shape modeling (SSM). Such a tool enables characterizing patient organ cycles or disease progression in relation to a cohort of interest. Constructing shape models requires establishing a quantitative shape representation (e.g., corresponding landmarks). Particle-based shape modeling (PSM) is a data-driven SSM approach that captures population-level shape variations by optimizing landmark placement. However, it assumes cross-sectional study designs and hence has limited statistical power in representing shape changes over time. Existing methods for modeling spatiotemporal or longitudinal shape changes require predefined shape atlases and pre-built shape models that are typically constructed cross-sectionally. This paper proposes a data-driven approach inspired by the PSM method to learn population-level spatiotemporal shape changes directly from shape data. We introduce a novel SSM optimization scheme that produces landmarks that are in correspondence both across the population (inter-subject) and across time-series (intra-subject). We apply the proposed method to 4D cardiac data from atrial-fibrillation patients and demonstrate its efficacy in representing the dynamic change of the left atrium. Furthermore, we show that our method outperforms an image-based approach for spatiotemporal SSM with respect to a generative time-series model, the Linear Dynamical System (LDS). LDS fit using a spatiotemporal shape model optimized via our approach provides better generalization and specificity, indicating it accurately captures the underlying time-dependency.
    Graph Neural Networks for Low-Energy Event Classification & Reconstruction in IceCube. (arXiv:2209.03042v1 [hep-ex])
    IceCube, a cubic-kilometer array of optical sensors built to detect atmospheric and astrophysical neutrinos between 1 GeV and 1 PeV, is deployed 1.45 km to 2.45 km below the surface of the ice sheet at the South Pole. The classification and reconstruction of events from the in-ice detectors play a central role in the analysis of data from IceCube. Reconstructing and classifying events is a challenge due to the irregular detector geometry, inhomogeneous scattering and absorption of light in the ice and, below 100 GeV, the relatively low number of signal photons produced per event. To address this challenge, it is possible to represent IceCube events as point cloud graphs and use a Graph Neural Network (GNN) as the classification and reconstruction method. The GNN is capable of distinguishing neutrino events from cosmic-ray backgrounds, classifying different neutrino event types, and reconstructing the deposited energy, direction and interaction vertex. Based on simulation, we provide a comparison in the 1-100 GeV energy range to the current state-of-the-art maximum likelihood techniques used in current IceCube analyses, including the effects of known systematic uncertainties. For neutrino event classification, the GNN increases the signal efficiency by 18% at a fixed false positive rate (FPR), compared to current IceCube methods. Alternatively, the GNN offers a reduction of the FPR by over a factor of 8 (to below half a percent) at a fixed signal efficiency. For the reconstruction of energy, direction, and interaction vertex, the resolution improves by an average of 13%-20% compared to current maximum likelihood techniques in the energy range of 1-30 GeV. The GNN, when run on a GPU, is capable of processing IceCube events at a rate nearly double the median IceCube trigger rate of 2.7 kHz, which opens the possibility of using low energy neutrinos in online searches for transient events.
    Annealing Optimization for Progressive Learning with Stochastic Approximation. (arXiv:2209.02826v1 [eess.SY])
    In this work, we introduce a learning model designed to meet the needs of applications in which computational resources are limited, and robustness and interpretability are prioritized. Learning problems can be formulated as constrained stochastic optimization problems, with the constraints originating mainly from model assumptions that define a trade-off between complexity and performance. This trade-off is closely related to over-fitting, generalization capacity, and robustness to noise and adversarial attacks, and depends on both the structure and complexity of the model, as well as the properties of the optimization methods used. We develop an online prototype-based learning algorithm based on annealing optimization that is formulated as an online gradient-free stochastic approximation algorithm. The learning model can be viewed as an interpretable and progressively growing competitive-learning neural network model to be used for supervised, unsupervised, and reinforcement learning. The annealing nature of the algorithm contributes to minimal hyper-parameter tuning requirements, avoidance of poor local minima, and robustness with respect to the initial conditions. At the same time, it provides online control over the performance-complexity trade-off by progressively increasing the complexity of the learning model as needed, through an intuitive bifurcation phenomenon. Finally, the use of stochastic approximation enables the study of the convergence of the learning algorithm through mathematical tools from dynamical systems and control, and allows for its integration with reinforcement learning algorithms, constructing an adaptive state-action aggregation scheme.
    A Data-dependent Approach for High Dimensional (Robust) Wasserstein Alignment. (arXiv:2209.02905v1 [cs.CV])
    Many real-world problems can be formulated as the alignment between two geometric patterns. Previously, a great amount of research focused on the alignment of 2D or 3D patterns in the field of computer vision. Recently, the alignment problem in high dimensions has found several novel applications in practice. However, the research is still rather limited in the algorithmic aspect. To the best of our knowledge, most existing approaches are just simple extensions of their counterparts for the 2D and 3D cases, and often suffer from issues such as high computational complexity. In this paper, we propose an effective framework to compress high dimensional geometric patterns. Any existing alignment method can be applied to the compressed geometric patterns, and the time complexity can be significantly reduced. Our idea is inspired by the observation that high dimensional data often has a low intrinsic dimension. Our framework is a "data-dependent" approach whose complexity depends on the intrinsic dimension of the input data. Our experimental results reveal that running the alignment algorithm on compressed patterns can achieve similar qualities, compared with the results on the original patterns, but the runtimes (including the time cost of compression) are substantially lower.
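    A hedged toy instance of the compress-then-align pipeline: plain PCA stands in for the paper's data-dependent compression, and orthogonal Procrustes for the downstream alignment method, on data with a low intrinsic dimension.

        import numpy as np
        from scipy.linalg import orthogonal_procrustes

        rng = np.random.default_rng(2)
        n, D, d = 500, 200, 5                    # samples, ambient dim, intrinsic dim
        Z = rng.standard_normal((n, d))
        A = Z @ rng.standard_normal((d, D))      # pattern with intrinsic dimension d
        Q, _ = np.linalg.qr(rng.standard_normal((D, D)))
        B = A @ Q                                # the same pattern, rotated in R^D

        def compress(X, d):                      # PCA scores as the compressed pattern
            Xc = X - X.mean(axis=0)
            U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
            return U[:, :d] * S[:d]

        a, b = compress(A, d), compress(B, d)
        R, _ = orthogonal_procrustes(a, b)       # align in the d-dim compressed space
        print(np.linalg.norm(a @ R - b))         # ~0: alignment recovered cheaply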
    Semi-supervised Invertible DeepONets for Bayesian Inverse Problem. (arXiv:2209.02772v1 [stat.ML])
    Deep Operator Networks (DeepONets) offer a powerful, data-driven tool for solving parametric PDEs by learning operators, i.e. maps between infinite-dimensional function spaces. In this work, we employ physics-informed DeepONets in the context of high-dimensional, Bayesian inverse problems. Traditional solution strategies necessitate an enormous, and frequently infeasible, number of forward model solves, as well as the computation of parametric derivatives. In order to enable efficient solutions, we extend DeepONets by employing a realNVP architecture which yields an invertible and differentiable map between the parametric input and the branch net output. This allows us to construct accurate approximations of the full posterior which can be readily adapted irrespective of the number of observations and the magnitude of the observation noise. As a result, no additional forward solves are required, nor is there any need for costly sampling procedures. We demonstrate the efficacy and accuracy of the proposed methodology in the context of inverse problems based on an anti-derivative, a reaction-diffusion and a Darcy-flow equation.
    Quantitative probing: Validating causal models using quantitative domain knowledge. (arXiv:2209.03013v1 [cs.LG])
    We present quantitative probing as a model-agnostic framework for validating causal models in the presence of quantitative domain knowledge. The method is constructed as an analogue of the train/test split in correlation-based machine learning and as an enhancement of current causal validation strategies that are consistent with the logic of scientific discovery. The effectiveness of the method is illustrated using Pearl's sprinkler example, before a thorough simulation-based investigation is conducted. Limits of the technique are identified by studying exemplary failing scenarios, which are furthermore used to propose a list of topics for future research and improvements of the presented version of quantitative probing. The code for integrating quantitative probing into causal analysis, as well as the code for the presented simulation-based studies of the effectiveness of quantitative probing is provided in two separate open-source Python packages.
    Grouping-matrix based Graph Pooling with Adaptive Number of Clusters. (arXiv:2209.02939v1 [cs.AI])
    Graph pooling is a crucial operation for encoding hierarchical structures within graphs. Most existing graph pooling approaches formulate the problem as a node clustering task which effectively captures the graph topology. Conventional methods ask users to specify an appropriate number of clusters as a hyperparameter, then assume that all input graphs share the same number of clusters. In inductive settings where the number of clusters can vary, however, the model should be able to represent this variation in its pooling layers in order to learn suitable clusters. Thus we propose GMPool, a novel differentiable graph pooling architecture that automatically determines the appropriate number of clusters based on the input data. The main intuition involves a grouping matrix, defined as a quadratic form of the pooling operator, whose entries can be interpreted as binary classification probabilities that pairwise combinations of nodes belong to the same cluster. GMPool obtains the pooling operator by first computing the grouping matrix, then decomposing it. Extensive evaluations on molecular property prediction tasks demonstrate that our method outperforms conventional methods.
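    The grouping-matrix intuition can be sketched in a few lines of numpy for the hard-assignment case (GMPool itself predicts a soft grouping matrix and decomposes it; this toy only shows why the number of clusters can be read off from the matrix rather than fixed as a hyperparameter).

        import numpy as np

        S = np.array([[1, 0, 0],      # 5 nodes assigned to 3 clusters
                      [1, 0, 0],
                      [0, 1, 0],
                      [0, 1, 0],
                      [0, 0, 1]], dtype=float)
        G = S @ S.T                   # G[i, j] = 1 iff nodes i and j share a cluster
        eigvals = np.linalg.eigvalsh(G)
        n_clusters = int(np.sum(eigvals > 1e-8))
        print(G.astype(int))
        print("clusters inferred from G:", n_clusters)   # 3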
    Read it to me: An emotionally aware Speech Narration Application. (arXiv:2209.02785v1 [cs.SD])
    In this work we attempt to perform emotional style transfer on audio. In particular, the MelGAN-VC architecture is explored for various emotion-pair transfers. The generated audio is then classified using an LSTM-based audio emotion classifier. We find that "sad" audio is generated better than "happy" or "angry" audio, as people have similar expressions of sadness.
    Efficient Implementation of Non-linear Flow Law Using Neural Network into the Abaqus Explicit FEM code. (arXiv:2209.03190v1 [cs.CE])
    Machine learning techniques are increasingly used to predict material behavior in scientific applications and offer a significant advantage over conventional numerical methods. In this work, an Artificial Neural Network (ANN) model is used in a finite element formulation to define the flow law of a metallic material as a function of plastic strain, plastic strain rate and temperature. First, we present the general structure of the neural network and its operation, and focus on the ability of the network to deduce, without prior learning, the derivatives of the flow law with respect to the model inputs. In order to validate the robustness and accuracy of the proposed model, we compare and analyze the performance of several network architectures with respect to the analytical formulation of a Johnson-Cook behavior law for a 42CrMo4 steel. In the second part, after having selected an Artificial Neural Network architecture with $2$ hidden layers, we present the implementation of this model in the Abaqus Explicit computational code in the form of a VUHARD subroutine. The predictive capability of the proposed model is then demonstrated during the numerical simulation of two test cases: the necking of a circular bar and a Taylor impact test. The results obtained show a very high capability of the ANN to replace the analytical formulation of a Johnson-Cook behavior law in a finite element code, while remaining competitive in terms of numerical simulation time compared to a classical approach.
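    The derivative property mentioned above is a generic consequence of automatic differentiation; a hedged PyTorch sketch with invented inputs and an invented architecture width follows.

        import torch
        import torch.nn as nn

        # A small network with 2 hidden layers mapping (plastic strain,
        # plastic strain rate, temperature) to a scalar flow stress.
        net = nn.Sequential(nn.Linear(3, 16), nn.Tanh(),
                            nn.Linear(16, 16), nn.Tanh(),
                            nn.Linear(16, 1))

        x = torch.tensor([[0.05, 1.0e-3, 300.0]], requires_grad=True)
        sigma = net(x)                                   # predicted flow stress
        grads, = torch.autograd.grad(sigma.sum(), x)     # d(sigma)/d(inputs) via autograd
        print(sigma.item(), grads)                       # derivatives need no extra training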
    Solving Elliptic Problems with Singular Sources using Singularity Splitting Deep Ritz Method. (arXiv:2209.02931v1 [math.NA])
    In this work, we develop an efficient solver based on deep neural networks for the Poisson equation with variable coefficients and singular sources expressed by the Dirac delta function $\delta(\mathbf{x})$. This class of problems covers general point sources, line sources and point-line combinations, and has a broad range of practical applications. The proposed approach is based on decomposing the true solution into a singular part that is known analytically using the fundamental solution of the Laplace equation and a regular part that satisfies a suitable elliptic PDE with smoother sources, and then solving for the regular part using the deep Ritz method. A path-following strategy is suggested to select the penalty parameter for penalizing the Dirichlet boundary condition. Extensive numerical experiments in two- and multi-dimensional spaces with point sources, line sources or their combinations are presented to illustrate the efficiency of the proposed approach, and a comparative study with several existing approaches is also given, which shows clearly its competitiveness for the specific class of problems. In addition, we briefly discuss the error analysis of the approach.
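    In our notation (not quoted from the paper), for a 2D point source at $\mathbf{x}_0$ in the equation $-\nabla \cdot (a \nabla u) = \delta(\mathbf{x}-\mathbf{x}_0)$, one standard choice of the splitting reads

        $$ u = u_s + u_r, \qquad u_s(\mathbf{x}) = -\frac{1}{2\pi\, a(\mathbf{x}_0)} \log \lvert \mathbf{x}-\mathbf{x}_0 \rvert, $$

    so that the regular part $u_r$ satisfies an elliptic PDE with a smoother right-hand side and can be approximated by the deep Ritz network.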
    Scalable Regularization of Scene Graph Generation Models using Symbolic Theories. (arXiv:2209.02749v1 [cs.LG])
    Several techniques have recently aimed to improve the performance of deep learning models for Scene Graph Generation (SGG) by incorporating background knowledge. State-of-the-art techniques can be divided into two families: one where the background knowledge is incorporated into the model in a subsymbolic fashion, and another in which the background knowledge is maintained in symbolic form. Despite promising results, both families of techniques face several shortcomings: the first one requires ad-hoc, more complex neural architectures increasing the training or inference cost; the second one suffers from limited scalability w.r.t. the size of the background knowledge. Our work introduces a regularization technique for injecting symbolic background knowledge into neural SGG models that overcomes the limitations of prior art. Our technique is model-agnostic, does not incur any cost at inference time, and scales to previously unmanageable background knowledge sizes. We demonstrate that our technique can improve the accuracy of state-of-the-art SGG models, by up to 33%.
    Modular Federated Learning. (arXiv:2209.03090v1 [cs.LG])
    Federated learning is an approach to training machine learning models at the edge of the network, as close as possible to where the data is produced, motivated by the emerging problem of the inability to stream and centrally store the large amounts of data produced by edge devices, as well as by data privacy concerns. This learning paradigm requires algorithms that are robust to device heterogeneity and data heterogeneity. This paper proposes ModFL, a federated learning framework that splits models into a configuration module and an operation module, enabling federated learning of the individual modules. This modular approach makes it possible to extract knowledge from a group of heterogeneous devices as well as from the non-IID data produced by their users. This approach can be viewed as an extension of the FedPer framework for federated learning with personalisation layers, which addresses data heterogeneity. We show that ModFL outperforms FedPer for non-IID data partitions of CIFAR-10 and STL-10 using CNNs. Our results on time-series data with the HAPT, RWHAR, and WISDM datasets using RNNs remain inconclusive; we argue that the chosen datasets do not highlight the advantages of ModFL, but in the worst-case scenario it performs as well as FedPer.
    A Data Science Approach to Risk Assessment for Automobile Insurance Policies. (arXiv:2209.02762v1 [cs.LG])
    In order to determine a suitable automobile insurance policy premium, one needs to take into account three factors: the risk associated with the drivers and cars on the policy, the operational costs associated with management of the policy, and the desired profit margin. The premium should then be some function of these three values. We focus on risk assessment using a Data Science approach. Instead of using the traditional frequency and severity metrics, we predict the total claims that will be made by a new customer using historical data of current and past policies. Given multiple features of the policy (age and gender of drivers, value of car, previous accidents, etc.), one can potentially try to provide personalized insurance policies based specifically on these features as follows. We can compute the average claims made per year over all past and current policies with identical features and then take an average over these claim rates. Unfortunately, there may not be sufficient samples to obtain a robust average. We can instead try to include policies that are "similar", to obtain sufficient samples for a robust average. We therefore face a trade-off between personalization (only using closely similar policies) and robustness (extending the domain far enough to capture sufficient samples). This is known as the Bias-Variance Trade-off. We model this problem, determine the optimal trade-off between the two (i.e. the balance that provides the highest prediction accuracy), and apply it to the claim rate prediction problem. We demonstrate our approach using real data.
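    The trade-off can be made concrete with a k-nearest-neighbours toy on synthetic data (features and rates invented for illustration): small k corresponds to high personalization and high variance, large k to robustness and bias.

        import numpy as np
        from sklearn.neighbors import KNeighborsRegressor
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(3)
        X = rng.uniform(size=(2000, 4))          # policy features (age, car value, ...)
        true_rate = 0.1 + 0.5 * X[:, 0]          # hypothetical underlying claim rate
        y = rng.poisson(true_rate)               # observed claims per year

        for k in (1, 10, 100, 1000):             # small k: personalized but noisy;
            model = KNeighborsRegressor(n_neighbors=k)   # large k: robust but biased
            mse = -cross_val_score(model, X, y, cv=5,
                                   scoring="neg_mean_squared_error").mean()
            print(f"k={k:4d}  CV MSE={mse:.3f}")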
    Multi-Scale Attention-based Multiple Instance Learning for Classification of Multi-Gigapixel Histology Images. (arXiv:2209.03041v1 [cs.CV])
    Histology images with multi-gigapixel resolution yield rich information for cancer diagnosis and prognosis. Most of the time, only slide-level labels are available, because pixel-wise annotation is a labour-intensive task. In this paper, we propose a deep learning pipeline for classification in histology images. Using multiple instance learning, we attempt to predict the latent membrane protein 1 (LMP1) status of nasopharyngeal carcinoma (NPC) based on haematoxylin and eosin-stained (H&E) histology images. We utilised an attention mechanism with residual connections for our aggregation layers. In our 3-fold cross-validation experiment, we achieved an average accuracy, AUC and F1-score of 0.936, 0.995 and 0.862, respectively. This method also allows us to examine the model's interpretability by visualising attention scores. To the best of our knowledge, this is the first attempt to predict LMP1 status in NPC using deep learning.
    CP-AGCN: Pytorch-based Attention Informed Graph Convolutional Network for Identifying Infants at Risk of Cerebral Palsy. (arXiv:2209.02824v1 [cs.CV])
    Early prediction is clinically considered one of the essential parts of cerebral palsy (CP) treatment. We propose to implement a low-cost and interpretable classification system for supporting CP prediction based on General Movement Assessment (GMA). We design a Pytorch-based attention-informed graph convolutional network to early identify infants at risk of CP from skeletal data extracted from RGB videos. We also design a frequency-binning module for learning the CP movements in the frequency domain while filtering noise. Our system only requires consumer-grade RGB videos for training to support interactive-time CP prediction by providing an interpretable CP classification result.
    The missing link: Developing a safety case for perception components in automated driving. (arXiv:2108.13294v4 [cs.LG] UPDATED)
    Safety assurance is a central concern for the development and societal acceptance of automated driving (AD) systems. Perception is a key aspect of AD that relies heavily on Machine Learning (ML). Despite the known challenges with the safety assurance of ML-based components, proposals have recently emerged for unit-level safety cases addressing these components. Unfortunately, AD safety cases express safety requirements at the system level and these efforts are missing the critical linking argument needed to integrate safety requirements at the system level with component performance requirements at the unit level. In this paper, we propose the Integration Safety Case for Perception (ISCaP), a generic template for such a linking safety argument specifically tailored for perception components. The template takes a deductive and formal approach to define strong traceability between levels. We demonstrate the applicability of ISCaP with a detailed case study and discuss its use as a tool to support incremental development of perception components.
  • Open

    An online learning approach to dynamic pricing and capacity sizing in service systems. (arXiv:2009.02911v3 [math.PR] UPDATED)
    We study a dynamic pricing and capacity sizing problem in a $GI/GI/1$ queue, where the service provider's objective is to obtain the optimal service fee $p$ and service capacity $\mu$ so as to maximize the cumulative expected profit (the service revenue minus the staffing cost and delay penalty). Due to the complex nature of the queueing dynamics, such a problem has no analytic solution so that previous research often resorts to heavy-traffic analysis where both the arrival rate and service rate are sent to infinity. In this work we propose an online learning framework designed for solving this problem which does not require the system's scale to increase. Our framework is dubbed Gradient-based Online Learning in Queue (GOLiQ). GOLiQ organizes the time horizon into successive operational cycles and prescribes an efficient procedure to obtain improved pricing and staffing policies in each cycle using data collected in previous cycles. Data here include the number of customer arrivals, waiting times, and the server's busy times. The ingenuity of this approach lies in its online nature, which allows the service provider to do better by interacting with the environment. Effectiveness of GOLiQ is substantiated by (i) theoretical results including the algorithm convergence and regret analysis (with a logarithmic regret bound), and (ii) engineering confirmation via simulation experiments of a variety of representative $GI/GI/1$ queues.
    Generative Principal Component Analysis. (arXiv:2203.09693v2 [stat.ML] UPDATED)
    In this paper, we study the problem of principal component analysis with generative modeling assumptions, adopting a general model for the observed matrix that encompasses notable special cases, including spiked matrix recovery and phase retrieval. The key assumption is that the underlying signal lies near the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. We propose a quadratic estimator, and show that it enjoys a statistical rate of order $\sqrt{\frac{k\log L}{m}}$, where $m$ is the number of samples. We also provide a near-matching algorithm-independent lower bound. Moreover, we provide a variant of the classic power method, which projects the calculated data onto the range of the generative model during each iteration. We show that under suitable conditions, this method converges exponentially fast to a point achieving the above-mentioned statistical rate. We perform experiments on various image datasets for spiked matrix and phase retrieval models, and illustrate performance gains of our method to the classic power method and the truncated power method devised for sparse principal component analysis.
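    A schematic numpy sketch of the projected power method follows; `project_to_range` is a hypothetical placeholder for projection onto the generative model's range (here a fixed linear subspace, purely for illustration, rather than an actual generator inversion step).

        import numpy as np

        rng = np.random.default_rng(4)
        n, k = 100, 5
        G = np.linalg.qr(rng.standard_normal((n, k)))[0]   # "generator" range basis

        def project_to_range(v):                 # placeholder projection step
            return G @ (G.T @ v)

        x_star = project_to_range(rng.standard_normal(n))
        x_star /= np.linalg.norm(x_star)
        M = 10 * np.outer(x_star, x_star) + 0.1 * rng.standard_normal((n, n))
        M = (M + M.T) / 2                        # symmetric spiked matrix

        v = project_to_range(rng.standard_normal(n))
        for _ in range(50):
            v = project_to_range(M @ v)          # power step followed by projection
            v /= np.linalg.norm(v)
        print(abs(v @ x_star))                   # close to 1: spike recovered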
    Reactmine: a search algorithm for inferring chemical reaction networks from time series data. (arXiv:2209.03185v1 [q-bio.QM])
    Inferring chemical reaction networks (CRNs) from time series data is a challenge spurred by the growing availability of quantitative temporal data at the cellular level. This motivates the design of algorithms to infer the preponderant reactions between the molecular species observed in a given biochemical process, and help to build CRN model structure and kinetics. Existing ODE-based inference methods such as SINDy resort to least-squares regression combined with sparsity-enforcing penalization, such as Lasso. However, when the input time series are only available in wild-type conditions in which all reactions are present, we observe that current methods fail to learn sparse models. Results: We present Reactmine, a CRN learning algorithm which enforces sparsity by inferring reactions in a sequential fashion within a search tree of bounded depth, ranking the inferred reaction candidates according to the variance of their kinetics, and re-optimizing the CRN kinetic parameters on the whole trace in a final pass to rank the inferred CRN candidates. We first evaluate its performance on simulation data from a benchmark of hidden CRNs, together with algorithmic hyperparameter sensitivity analyses, and then on two sets of real experimental data: one from protein fluorescence videomicroscopy of cell cycle and circadian clock markers, and one from biomedical measurements of systemic circadian biomarkers possibly acting on clock gene expression in peripheral organs. We show that Reactmine succeeds both on simulation data, by retrieving hidden CRNs where SINDy fails, and on the two real datasets, by inferring reactions in agreement with previous studies.
    Out of Distribution Detection, Generalization, and Robustness Triangle with Maximum Probability Theorem. (arXiv:2203.12145v2 [cs.LG] UPDATED)
The Maximum Probability Framework, powered by the Maximum Probability Theorem (MPT), is a recent theoretical development in artificial intelligence that aims to formally define probabilistic models, guide the development of objective functions, and regularize probabilistic models. MPT uses the probability distribution that the models assume on random variables to provide an upper bound on the probability of the model. We apply MPT to challenging out-of-distribution (OOD) detection problems in computer vision by incorporating MPT as a regularization scheme in the training of CNNs and their energy-based variants. We demonstrate the effectiveness of the proposed method on 1080 trained models, with varying hyperparameters, and conclude that the MPT-based regularization strategy stabilizes and improves the generalization and robustness of base models in addition to enhancing OOD performance on the CIFAR10, CIFAR100, and MNIST datasets.
    On the Convergence of the ELBO to Entropy Sums. (arXiv:2209.03077v1 [stat.ML])
The variational lower bound (a.k.a. ELBO or free energy) is the central objective for many learning algorithms including algorithms for deep unsupervised learning. Learning algorithms change model parameters such that the variational lower bound increases, until the parameters are close to a stationary point of the learning dynamics. In this purely theoretical contribution, we show that (for a very large class of generative models) the variational lower bound is, at all stationary points of learning, equal to a sum of entropies. For models with one set of latents and one set of observed variables, the sum consists of three entropies: (A) the (average) entropy of the variational distributions, (B) the negative entropy of the model's prior distribution, and (C) the (expected) negative entropy of the observable distributions. The obtained result applies under realistic conditions including: finite numbers of data points, at any stationary point (including saddle points) and for any family of (well-behaved) variational distributions. The class of generative models for which we show the equality to entropy sums contains many (and presumably most) standard generative models (including deep models). As concrete examples we discuss probabilistic PCA and Sigmoid Belief Networks. The prerequisites we use to show equality to entropy sums are relatively mild. Concretely, the distributions of a given generative model have to be of the exponential family (with constant base measure), and the model has to satisfy a parameterization criterion (which is usually fulfilled). Proving the equality of the ELBO to entropy sums at stationary points (under the stated conditions) is the main contribution of this work.
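For concreteness, the claimed decomposition for a model with one set of latents z and observables x can be written as follows (notation paraphrased from the abstract, not the paper's exact statement):

```latex
% ELBO at stationary points, as a sum of the entropies (A), (B), (C):
\mathcal{F}(\Phi,\Theta)
  = \underbrace{\tfrac{1}{N}\textstyle\sum_{n}\mathcal{H}\big[q_\Phi(z \mid x^{(n)})\big]}_{\text{(A) avg.\ variational entropy}}
  - \underbrace{\mathcal{H}\big[p_\Theta(z)\big]}_{\text{(B) prior entropy}}
  - \underbrace{\tfrac{1}{N}\textstyle\sum_{n}\mathbb{E}_{q_\Phi}\,\mathcal{H}\big[p_\Theta(x \mid z)\big]}_{\text{(C) expected observable entropy}}
```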
    Manifold Free Riemannian Optimization. (arXiv:2209.03269v1 [math.OC])
Riemannian optimization is a principled framework for solving optimization problems where the desired optimum is constrained to a smooth manifold $\mathcal{M}$. Algorithms designed in this framework usually require some geometrical description of the manifold, which typically includes tangent spaces, retractions, and gradients of the cost function. However, in many cases, only a subset (or none at all) of these elements can be accessed due to lack of information or intractability. In this paper, we propose a novel approach that can perform approximate Riemannian optimization in such cases, where the constraining manifold is a submanifold of $\mathbb{R}^{D}$. At the bare minimum, our method requires only a noiseless sample set of the cost function $(x_{i}, y_{i})\in \mathcal{M} \times \mathbb{R}$ and the intrinsic dimension of the manifold $\mathcal{M}$. Using the samples, and utilizing the Manifold-MLS framework (Sober and Levin 2020), we construct approximations of the missing components with provable guarantees and analyze their computational costs. In case some of the components are given analytically (e.g., if the cost function and its gradient are given explicitly, or if the tangent spaces can be computed), the algorithm can be easily adapted to use the accurate expressions instead of the approximations. We analyze the global convergence of Riemannian gradient-based methods using our approach, and we demonstrate empirically the strength of this method, together with a conjugate-gradients type method based upon similar principles.  ( 3 min )
    EGMM: an Evidential Version of the Gaussian Mixture Model for Clustering. (arXiv:2010.01333v3 [cs.LG] UPDATED)
The Gaussian mixture model (GMM) provides a simple yet principled framework for clustering, with properties suitable for statistical inference. In this paper, we propose a new model-based clustering algorithm, called EGMM (evidential GMM), in the theoretical framework of belief functions to better characterize cluster-membership uncertainty. With a mass function representing the cluster membership of each object, the evidential Gaussian mixture distribution composed of the components over the powerset of the desired clusters is proposed to model the entire dataset. The parameters in EGMM are estimated by a specially designed Expectation-Maximization (EM) algorithm. A validity index allowing automatic determination of the proper number of clusters is also provided. The proposed EGMM is as simple as the classical GMM, but can generate a more informative evidential partition for the considered dataset. Experiments on synthetic and real datasets show that the proposed EGMM performs better than other representative clustering algorithms. Moreover, its superiority is also demonstrated by an application to multi-modal brain image segmentation.  ( 2 min )
    The Role of ImageNet Classes in Fr\'echet Inception Distance. (arXiv:2203.06026v2 [cs.CV] UPDATED)
Fr\'echet Inception Distance (FID) is the primary metric for ranking models in data-driven generative modeling. While remarkably successful, the metric is known to sometimes disagree with human judgement. We investigate a root cause of these discrepancies, and visualize what FID "looks at" in generated images. We show that the feature space in which FID is (typically) computed is so close to the ImageNet classifications that aligning the histograms of Top-$N$ classifications between sets of generated and real images can reduce FID substantially -- without actually improving the quality of results. Thus we conclude that FID is prone to intentional or accidental distortions. As a practical example of an accidental distortion, we discuss a case where an ImageNet pre-trained FastGAN achieves an FID comparable to StyleGAN2, while being worse in terms of human evaluation.  ( 2 min )
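For reference, FID itself is just the Fréchet distance between two Gaussians fitted to Inception features; a minimal sketch (with the feature extractor stubbed out by random arrays) looks like this:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_fake):
    mu1, mu2 = feats_real.mean(0), feats_fake.mean(0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):   # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    return np.sum((mu1 - mu2) ** 2) + np.trace(s1 + s2 - 2 * covmean)

rng = np.random.default_rng(0)
print(fid(rng.standard_normal((500, 64)), rng.standard_normal((500, 64))))
```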
    Manifold Oblique Random Forests: Towards Closing the Gap on Convolutional Deep Networks. (arXiv:1909.11799v5 [cs.LG] UPDATED)
Decision forests (Forests), in particular random forests and gradient boosting trees, have demonstrated state-of-the-art accuracy compared to other methods in many supervised learning scenarios. In particular, Forests dominate other methods on tabular data, that is, when the feature space is unstructured, so that the signal is invariant to a permutation of the feature indices. However, on structured data lying on a manifold (such as images, text, and speech), deep networks (Networks), specifically convolutional deep networks (ConvNets), tend to outperform Forests. We conjecture that at least part of the reason for this is that the input to Networks is not simply the feature magnitudes, but also their indices. In contrast, naive Forest implementations fail to explicitly consider feature indices. A recently proposed Forest approach demonstrates that Forests, for each node, implicitly sample a random matrix from some specific distribution. These Forests, like some classes of Networks, learn by partitioning the feature space into convex polytopes corresponding to linear functions. We build on that approach and show that one can choose distributions in a manifold-aware fashion to incorporate feature locality. We demonstrate the empirical performance on data whose features live on three different manifolds: a torus, images, and time-series. Moreover, we demonstrate its strength in multivariate simulated settings, and also show superiority in predicting surgical outcome in epilepsy patients and predicting movement direction from raw stereotactic EEG data from non-motor brain regions. In all simulations and real data, the Manifold Oblique Random Forest (MORF) algorithm outperforms approaches that ignore feature space structure and challenges the performance of ConvNets. Moreover, MORF runs fast and maintains interpretability and theoretical justification.  ( 3 min )
    Dual Instrumental Method for Confounded Kernelized Bandits. (arXiv:2209.03224v1 [cs.LG])
The contextual bandit problem is a theoretically justified framework with wide applications in various fields. While previous studies of this problem usually require independence between noise and contexts, our work considers a more sensible setting where the noise becomes a latent confounder that affects both contexts and rewards. Such a confounded setting is more realistic and extends to a broader range of applications. However, the unresolved confounder will cause a bias in reward function estimation and thus lead to a large regret. To deal with the challenges brought by the confounder, we apply dual instrumental variable regression, which can correctly identify the true reward function. We prove that the convergence rate of this method is near-optimal in two types of widely used reproducing kernel Hilbert spaces. Therefore, we can design computationally efficient and regret-optimal algorithms based on the theoretical guarantees for confounded bandit problems. The numerical results illustrate the efficacy of our proposed algorithms in the confounded bandit setting.  ( 2 min )
    Machine Learning Partners in Criminal Networks. (arXiv:2209.03171v1 [physics.soc-ph])
    Recent research has shown that criminal networks have complex organizational structures, but whether this can be used to predict static and dynamic properties of criminal networks remains little explored. Here, by combining graph representation learning and machine learning methods, we show that structural properties of political corruption, police intelligence, and money laundering networks can be used to recover missing criminal partnerships, distinguish among different types of criminal and legal associations, as well as predict the total amount of money exchanged among criminal agents, all with outstanding accuracy. We also show that our approach can anticipate future criminal associations during the dynamic growth of corruption networks with significant accuracy. Thus, similar to evidence found at crime scenes, we conclude that structural patterns of criminal networks carry crucial information about illegal activities, which allows machine learning methods to predict missing information and even anticipate future criminal behavior.  ( 2 min )
    Semiparametric discrete data regression with Monte Carlo inference and prediction. (arXiv:2110.12316v4 [stat.ME] UPDATED)
    Discrete data are abundant and often arise as counts or rounded data. These data commonly exhibit complex distributional features such as zero-inflation, over- or under-dispersion, boundedness, and heaping, which render many parametric models inadequate. Yet even for parametric regression models, conjugate priors and closed-form posteriors are typically unavailable, which necessitates approximations such as MCMC for posterior inference. This paper introduces a Bayesian modeling and algorithmic framework that enables semiparametric regression analysis for discrete data with Monte Carlo (not MCMC) sampling. The proposed approach pairs a nonparametric marginal model with a latent linear regression model to encourage both flexibility and interpretability, and delivers posterior consistency even under model misspecification. For a parametric or large-sample approximation of this model, we identify a class of conjugate priors with (pseudo) closed-form posteriors. All posterior and predictive distributions are available analytically or via Monte Carlo sampling. These tools are broadly useful for linear regression, nonlinear models via basis expansions, and variable selection with discrete data. Simulation studies demonstrate significant advantages in computing, prediction, estimation, and selection relative to existing alternatives. This novel approach is applied to self-reported mental health data that exhibit zero-inflation, overdispersion, boundedness, and heaping.  ( 3 min )
    Composite Spatial Monte Carlo Integration Based on Generalized Least Squares. (arXiv:2204.03248v2 [stat.CO] UPDATED)
Although evaluation of expectations on the Ising model is essential in various applications, it is mostly infeasible because of intractable multiple summations. Spatial Monte Carlo integration (SMCI) is a sampling-based approximation that can provide high-accuracy estimations for such intractable expectations. To evaluate the expectation of a function of variables in a specific region (called the target region), SMCI considers a larger region containing the target region (called the sum region). In SMCI, the multiple summation for the variables in the sum region is executed exactly, and that in the outer region is evaluated by a sampling approximation such as standard Monte Carlo integration. It is guaranteed that the accuracy of the SMCI estimator improves monotonically as the size of the sum region increases. However, a haphazard expansion of the sum region could cause a combinatorial explosion. Therefore, we hope to improve the accuracy without such an expansion. In this paper, based on the theory of generalized least squares (GLS), a new effective method is proposed by combining multiple SMCI estimators. The validity of the proposed method is demonstrated theoretically and numerically. The results indicate that the proposed method can be effective in the inverse Ising problem (or Boltzmann machine learning).  ( 3 min )
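The GLS combination step can be illustrated generically: given several roughly unbiased estimators of the same expectation and an estimate of their covariance, the minimum-variance weights are the usual GLS solution (the numbers below are made up).

```python
import numpy as np

def gls_combine(estimates, cov):
    """estimates: shape (K,); cov: shape (K, K) covariance of the estimators."""
    ones = np.ones(len(estimates))
    w = np.linalg.solve(cov, ones)
    w /= ones @ w                          # weights sum to 1
    return w @ estimates, w

est = np.array([0.52, 0.49, 0.55])         # e.g., three SMCI estimators
cov = np.array([[0.010, 0.004, 0.002],
                [0.004, 0.008, 0.003],
                [0.002, 0.003, 0.012]])
combined, weights = gls_combine(est, cov)  # lower variance than any single one
```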
    An Assessment Tool for Academic Research Managers in the Third World. (arXiv:2209.03199v1 [econ.EM])
The academic evaluation of the publication record of researchers is relevant for identifying talented candidates for promotion and funding. A key tool for this is the use of the indexes provided by Web of Science and SCOPUS, costly databases that are sometimes beyond the means of academic institutions in many parts of the world. We show here how the data in one of the databases can be used to infer the main index of the other. Methods of data analysis used in Machine Learning allow us to select just a few of the hundreds of variables in a database, which are later used in a panel regression, yielding a good approximation to the main index in the other database. Since the information in SCOPUS can be freely scraped from the Web, this approach allows one to infer, for free, the Impact Factor of publications, the main index used in research assessments around the globe.  ( 2 min )
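A minimal sketch of the two-stage idea in the abstract, with synthetic stand-in data: an L1-penalised model selects a handful of variables from the hundreds available, and a plain regression on the selected variables then approximates the target index.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 100))        # hundreds of candidate indicators
y = 2 * X[:, 0] + X[:, 1] - X[:, 2] + rng.normal(scale=0.1, size=300)

lasso = LassoCV(cv=5).fit(X, y)            # sparsity-enforcing selection
selected = np.flatnonzero(lasso.coef_)     # only a few variables survive
model = LinearRegression().fit(X[:, selected], y)  # final regression stage
print(selected, model.score(X[:, selected], y))
```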
    A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance \& heterogeneous noise. (arXiv:2209.02856v1 [math.ST])
We revisit heavy-tailed corrupted least-squares linear regression assuming to have a corrupted $n$-sized label-feature sample of at most $\epsilon n$ arbitrary outliers. We wish to estimate a $p$-dimensional parameter $b^*$ given such sample of a label-feature pair $(y,x)$ satisfying $y=\langle x,b^*\rangle+\xi$ with heavy-tailed $(x,\xi)$. We only assume $x$ is $L^4-L^2$ hypercontractive with constant $L>0$ and has covariance matrix $\Sigma$ with minimum eigenvalue $1/\mu^2>0$ and bounded condition number $\kappa>0$. The noise $\xi$ can be arbitrarily dependent on $x$ and nonsymmetric as long as $\xi x$ has finite covariance matrix $\Xi$. We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge of $(\Sigma,\Xi)$ nor the operator norm of $\Xi$. With probability at least $1-\delta$, our proposed estimator attains the statistical rate $\mu^2\Vert\Xi\Vert^{1/2}(\frac{p}{n}+\frac{\log(1/\delta)}{n}+\epsilon)^{1/2}$ and breakdown-point $\epsilon\lesssim\frac{1}{L^4\kappa^2}$, both optimal in the $\ell_2$-norm, assuming the near-optimal minimum sample size $L^4\kappa^2(p\log p + \log(1/\delta))\lesssim n$, up to a log factor. To the best of our knowledge, this is the first computationally tractable algorithm satisfying simultaneously all the mentioned properties. Our estimator is based on a two-stage Multiplicative Weight Update algorithm. The first stage estimates a descent direction $\hat v$ with respect to the (unknown) pre-conditioned inner product $\langle\Sigma(\cdot),\cdot\rangle$. The second stage estimates the descent direction $\Sigma\hat v$ with respect to the (known) inner product $\langle\cdot,\cdot\rangle$, without knowing nor estimating $\Sigma$.  ( 3 min )
Bayesian learning of feature spaces for multitask problems. (arXiv:2209.03028v1 [stat.ML])
This paper presents a Bayesian framework to construct non-linear, parsimonious, shallow models for multitask regression. The proposed framework relies on the fact that Random Fourier Features (RFFs) enable the approximation of an RBF kernel by an extreme learning machine whose hidden layer is formed by RFFs. The main idea is to combine both dual views of the same model under a single Bayesian formulation that extends Sparse Bayesian Extreme Learning Machines to multitask problems. From the kernel methods point of view, the proposed formulation facilitates the introduction of prior domain knowledge through the RBF kernel parameter. From the extreme learning machines perspective, the new formulation helps control overfitting and enables a parsimonious overall model (the models that serve each task share the same set of RFFs, selected within the joint Bayesian optimisation). The experimental results show that combining the advantages of kernel methods and extreme learning machines within the same framework can lead to significant improvements in the performance achieved by each of these two paradigms independently.  ( 2 min )
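The RFF identity the framework relies on is easy to verify numerically: sampling random frequencies from the kernel's spectral density turns the RBF kernel into an explicit random feature map. A minimal sketch, with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, gamma = 5, 500, 0.5                  # input dim, number of RFFs, kernel width

W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))  # spectral samples
b = rng.uniform(0, 2 * np.pi, size=D)

def rff(X):
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = rng.standard_normal((200, d))
Z = rff(X)
K_approx = Z @ Z.T                         # approximates exp(-gamma*||x-x'||^2)
K_exact = np.exp(-gamma * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
print(np.abs(K_approx - K_exact).max())    # small for moderate D
```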
    $1D$ to $nD$: A Meta Algorithm for Multivariate Global Optimization via Univariate Optimizers. (arXiv:2209.03246v1 [math.OC])
In this work, we propose a meta algorithm that can solve a multivariate global optimization problem using univariate global optimizers. Although univariate global optimization receives less attention than the multivariate case, which is more emphasized in academia and industry, we show that it is still relevant and can be used directly to solve problems of multivariate optimization. We also provide the corresponding regret bounds in terms of the time horizon $T$ and the average regret of the univariate optimizer, when the latter is robust against nonnegative noises with robust regret guarantees.  ( 2 min )
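The paper's meta algorithm is more involved, but the basic reduction it builds on, solving a multivariate problem with repeated calls to a 1D optimizer, can be sketched as cyclic coordinate minimization:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def coordinate_minimize(f, x0, sweeps=20, bounds=(-5, 5)):
    """Minimize f over R^n using only a univariate optimizer per coordinate."""
    x = np.array(x0, dtype=float)
    for _ in range(sweeps):
        for i in range(len(x)):
            def f_1d(t, i=i):              # f restricted to coordinate i
                y = x.copy(); y[i] = t
                return f(y)
            x[i] = minimize_scalar(f_1d, bounds=bounds, method="bounded").x
    return x

rosenbrock = lambda v: (1 - v[0]) ** 2 + 100 * (v[1] - v[0] ** 2) ** 2
print(coordinate_minimize(rosenbrock, [-2.0, 2.0]))  # approaches (1, 1)
```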
    Change Detection for Local Explainability in Evolving Data Streams. (arXiv:2209.02764v1 [cs.LG])
    As complex machine learning models are increasingly used in sensitive applications like banking, trading or credit scoring, there is a growing demand for reliable explanation mechanisms. Local feature attribution methods have become a popular technique for post-hoc and model-agnostic explanations. However, attribution methods typically assume a stationary environment in which the predictive model has been trained and remains stable. As a result, it is often unclear how local attributions behave in realistic, constantly evolving settings such as streaming and online applications. In this paper, we discuss the impact of temporal change on local feature attributions. In particular, we show that local attributions can become obsolete each time the predictive model is updated or concept drift alters the data generating distribution. Consequently, local feature attributions in data streams provide high explanatory power only when combined with a mechanism that allows us to detect and respond to local changes over time. To this end, we present CDLEEDS, a flexible and model-agnostic framework for detecting local change and concept drift. CDLEEDS serves as an intuitive extension of attribution-based explanation techniques to identify outdated local attributions and enable more targeted recalculations. In experiments, we also show that the proposed framework can reliably detect both local and global concept drift. Accordingly, our work contributes to a more meaningful and robust explainability in online machine learning.  ( 3 min )
    Plant Species Classification Using Transfer Learning by Pretrained Classifier VGG-19. (arXiv:2209.03076v1 [cs.CV])
Deep learning is currently the most important branch of machine learning, with applications in speech recognition, computer vision, image classification, and medical imaging analysis. Plant recognition is one of the areas where image classification can be used to identify plant species through their leaves. Botanists devote a significant amount of time to recognizing plant species by personal inspection. This paper describes a method for dissecting color images of Swedish leaves and identifying plant species. To achieve higher accuracy, the task is completed using transfer learning with the help of the pre-trained classifier VGG-19. The four primary stages of classification (image preprocessing, image augmentation, feature extraction, and recognition) are performed as part of the overall model evaluation. The VGG-19 classifier grasps the characteristics of leaves by employing pre-defined hidden layers such as convolutional layers, max pooling layers, and fully connected layers, and finally uses the softmax layer to generate a feature representation for all plant classes. The model obtains knowledge connected to aspects of the Swedish leaf dataset, which contains fifteen tree classes, and aids in predicting the proper class of an unknown plant with an accuracy of 99.70%, which is higher than previously reported results.  ( 3 min )
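A minimal sketch of the transfer-learning setup described above, in Keras; the head layout and hyperparameters are illustrative, not the paper's exact configuration.

```python
import tensorflow as tf

# Frozen ImageNet-pretrained VGG-19 as the feature extractor
base = tf.keras.applications.VGG19(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(15, activation="softmax"),  # 15 Swedish-leaf classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```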
    A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games. (arXiv:2209.02838v1 [cs.LG])
    We consider risk-averse learning in repeated unknown games where the goal of the agents is to minimize their individual risk of incurring significantly high cost. Specifically, the agents use the conditional value at risk (CVaR) as a risk measure and rely on bandit feedback in the form of the cost values of the selected actions at every episode to estimate their CVaR values and update their actions. A major challenge in using bandit feedback to estimate CVaR is that the agents can only access their own cost values, which, however, depend on the actions of all agents. To address this challenge, we propose a new risk-averse learning algorithm with momentum that utilizes the full historical information on the cost values. We show that this algorithm achieves sub-linear regret and matches the best known algorithms in the literature. We provide numerical experiments for a Cournot game that show that our method outperforms existing methods.  ( 2 min )
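The CVaR quantity the agents estimate from bandit feedback is simply the mean of the worst tail of observed costs; a minimal sketch:

```python
import numpy as np

def empirical_cvar(costs, alpha=0.95):
    """Mean of the worst (1 - alpha) fraction of observed costs."""
    var = np.quantile(costs, alpha)        # value-at-risk at level alpha
    return costs[costs >= var].mean()

costs = np.random.default_rng(0).exponential(size=1000)
print(empirical_cvar(costs))               # well above the plain mean
```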
    Understanding microbiome dynamics via interpretable graph representation learning. (arXiv:2203.01830v2 [q-bio.QM] UPDATED)
    Large-scale perturbations in the microbiome constitution are strongly correlated, whether as a driver or a consequence, with the health and functioning of human physiology. However, understanding the difference in the microbiome profiles of healthy and ill individuals can be complicated due to the large number of complex interactions among microbes. We propose to model these interactions as a time-evolving graph whose nodes are microbes and edges are interactions among them. Motivated by the need to analyse such complex interactions, we develop a method that learns a low-dimensional representation of the time-evolving graph and maintains the dynamics occurring in the high-dimensional space. Through our experiments, we show that we can extract graph features such as clusters of nodes or edges that have the highest impact on the model to learn the low-dimensional representation. This information can be crucial to identify microbes and interactions among them that are strongly correlated with clinical diseases. We conduct our experiments on both synthetic and real-world microbiome datasets.  ( 2 min )
    Non-Gaussian Process Regression. (arXiv:2209.03117v1 [stat.ML])
    Standard GPs offer a flexible modelling tool for well-behaved processes. However, deviations from Gaussianity are expected to appear in real world datasets, with structural outliers and shocks routinely observed. In these cases GPs can fail to model uncertainty adequately and may over-smooth inferences. Here we extend the GP framework into a new class of time-changed GPs that allow for straightforward modelling of heavy-tailed non-Gaussian behaviours, while retaining a tractable conditional GP structure through an infinite mixture of non-homogeneous GPs representation. The conditional GP structure is obtained by conditioning the observations on a latent transformed input space and the random evolution of the latent transformation is modelled using a L\'{e}vy process which allows Bayesian inference in both the posterior predictive density and the latent transformation function. We present Markov chain Monte Carlo inference procedures for this model and demonstrate the potential benefits compared to a standard GP.  ( 2 min )
    Semi-supervised Invertible DeepONets for Bayesian Inverse Problem. (arXiv:2209.02772v1 [stat.ML])
Deep Operator Networks (DeepONets) offer a powerful, data-driven tool for solving parametric PDEs by learning operators, i.e. maps between infinite-dimensional function spaces. In this work, we employ physics-informed DeepONets in the context of high-dimensional, Bayesian inverse problems. Traditional solution strategies necessitate an enormous, and frequently infeasible, number of forward model solves, as well as the computation of parametric derivatives. In order to enable efficient solutions, we extend DeepONets by employing a realNVP architecture which yields an invertible and differentiable map between the parametric input and the branch net output. This allows us to construct accurate approximations of the full posterior which can be readily adapted irrespective of the number of observations and the magnitude of the observation noise. As a result, no additional forward solves are required, nor is there any need for costly sampling procedures. We demonstrate the efficacy and accuracy of the proposed methodology in the context of inverse problems based on an anti-derivative, a reaction-diffusion, and a Darcy-flow equation.  ( 2 min )
    On the Sparse DAG Structure Learning Based on Adaptive Lasso. (arXiv:2209.02946v1 [stat.ML])
Learning the underlying causal structure, represented by Directed Acyclic Graphs (DAGs), of concerned events from fully-observational data is a crucial part of causal reasoning, but it is challenging due to the combinatorial and large search space. A recent flurry of developments recast this combinatorial problem into a continuous optimization problem by leveraging an algebraic equality characterization of acyclicity. However, these methods suffer from the fixed-threshold step after optimization, which is not a flexible and systematic way to rule out the cycle-inducing edges or false-discovery edges with small values caused by numerical precision. In this paper, we develop a data-driven DAG structure learning method without the predefined threshold, called adaptive NOTEARS [30], achieved by applying adaptive penalty levels to each parameter in the regularization term. We show that adaptive NOTEARS enjoys the oracle properties under some specific conditions. Furthermore, simulation results validate the effectiveness of our method, without setting any gap of edge weights around zero.  ( 2 min )
    Adjusted Asymmetric Accuracy: A Well-Behaving External Cluster Validity Measure. (arXiv:2209.02935v1 [cs.LG])
There is no, nor will there ever be, a single best clustering algorithm, but we would still like to be able to pinpoint those which are well-performing on certain task types and filter out the systematically disappointing ones. Clustering algorithms are traditionally evaluated using either internal or external validity measures. Internal measures quantify different aspects of the obtained partitions, e.g., the average degree of cluster compactness or point separability. Yet, their validity is questionable because the clusterings they promote can sometimes be meaningless. External measures, on the other hand, compare the algorithms' outputs to the reference, ground truth groupings that are provided by experts. The commonly-used classical partition similarity scores, such as the normalised mutual information, Fowlkes-Mallows, or adjusted Rand index, might not possess all the desirable properties, e.g., they do not identify pathological edge cases correctly. Furthermore, they are not nicely interpretable: it is hard to say what a score of 0.8 really means. Its behaviour might also vary as the number of true clusters changes. This makes comparing clustering algorithms across many benchmark datasets difficult. To remedy this, we propose and analyse a new measure: an asymmetric version of the optimal set-matching accuracy. It is corrected for chance and the imbalancedness of cluster sizes.  ( 2 min )
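For orientation, the plain (symmetric, uncorrected) optimal set-matching accuracy that the proposed measure refines can be computed with the Hungarian algorithm over the contingency matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_accuracy(y_true, y_pred):
    """Best accuracy over all relabellings of the predicted clusters."""
    C = np.zeros((y_true.max() + 1, y_pred.max() + 1), dtype=int)
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1                       # contingency matrix
    rows, cols = linear_sum_assignment(C, maximize=True)
    return C[rows, cols].sum() / len(y_true)

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([2, 2, 0, 0, 1, 1])      # a relabelled perfect clustering
print(matching_accuracy(y_true, y_pred))   # 1.0
```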

  • Open

    [P] Stable Diffusion web UI with Outpainting, Inpainting, Prompt matrix, Upscale, Textual Inversion and many more features
Stable Diffusion web UI: a browser interface based on the Gradio library for Stable Diffusion. github: https://github.com/AUTOMATIC1111/stable-diffusion-webui submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 88 min )
    [D] Is there a solid mathematical justification of student-teacher setups?
For context, I am thinking of the case where a big model, M, is trained, and then a smaller model, m, uses the output of M in combination with the original data. A natural question is: why include the large model as a teacher instead of only the original data? The answers that I have found so far refer to empirical evidence, such as the highly cited 2015 paper Distilling the Knowledge in a Neural Network by Hinton. I also found the review paper Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks from 2020, which mentions that the implicit objective is to maximize the mutual information between the teacher and student models. I understand the intuition of the setup, but is there a paper that provides a (reasonably) rigorous treatment of the student-teacher setup? For example, how well would it work if the teacher has the same size as the student? Is it better with 2 teachers, etc.? submitted by /u/carlml [link] [comments]  ( 89 min )
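For readers unfamiliar with the setup being discussed, the classic Hinton-style objective combines a temperature-softened KL term against the teacher with the usual cross-entropy on labels; a minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: KL between temperature-softened distributions
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T   # T^2 keeps gradient scale
    hard = F.cross_entropy(student_logits, labels)   # usual supervised term
    return alpha * soft + (1 - alpha) * hard
```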
    [D] How does AutoML (NAS in particular) work?
    Hey guys, I was just wondering how Neural Architecture Search (NAS) is actually implemented. Not the methodology (like RL) but the actual code aspect - is the source code of the NNs modified directly, or do they just have a general neural network method with a bunch of parameters representing NN parameters (num layers, etc.)? I can’t seem to find many code implementations of NAS papers unfortunately. submitted by /u/billjames1685 [link] [comments]  ( 88 min )
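To the question in the post: in most NAS codebases no source code is modified; the network is built from a configuration object, and the search algorithm (RL, evolution, random search) only ever manipulates these configurations. A minimal sketch of that pattern, with a made-up search space:

```python
import torch.nn as nn

def build_model(cfg):
    """Construct a network from a configuration dict (one point in the space)."""
    layers, in_ch = [], 3
    for ch, k in zip(cfg["channels"], cfg["kernels"]):
        layers += [nn.Conv2d(in_ch, ch, k, padding=k // 2), nn.ReLU()]
        in_ch = ch
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, 10)]
    return nn.Sequential(*layers)

candidate = {"channels": [32, 64], "kernels": [3, 5]}   # sampled by the searcher
model = build_model(candidate)                          # trained, then scored
```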
    [P] Pythae: Unifying Generative AutoEncoders in Python - What's new ?
    A few months after the official release, I have been working on a few new features that have been added to Pythae. See below what's new 👉 New models: 25 opensource implementations are now available 👉 Integration of mlflow and wandb monitoring tools 👉 Reproduction of original papers 👉 New examples/tutorials (e.g. VAE-LSTM) 💻 Code: https://github.com/clementchadebec/benchmark_VAE I sincerely hope you will enjoy these new features! This library is also open to new contributors so do not hesitate to reach out if you want to include a model, fix a potential bug 🐛 or would like to see any new feature! submitted by /u/cchad-8 [link] [comments]  ( 89 min )
    [D] Reasons not to post pre-prints to Arxiv?
    I know most big conferences don’t have an issue with it, so I’m interested in whether folks have any regrets around pre-printing work or best practices for rolling out their best work. submitted by /u/CouchyShorts [link] [comments]  ( 88 min )
    [P] graph based networking platform for the AI/ML community
Hello, the team and I are working on creating a networking platform for the AI/ML community. In this network, people are represented as nodes in a 3D graph; the more similar their backgrounds in AI are, the closer their nodes sit. We use BERT embeddings and cosine similarity to measure the "distance" between users. You can check the 3D view here (this is just a test): https://galaxai.vercel.app/explore Looking forward to hearing your thoughts. Do you think this would be useful? If yes, what features would you suggest? submitted by /u/1hachem [link] [comments]  ( 105 min )
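For the curious, the similarity computation described can be reproduced in a few lines; here is a sketch using sentence-transformers, one common way to get BERT-style sentence embeddings (the model name and bios are illustrative, not the platform's actual stack):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
bios = ["PhD student working on RL for robotics",
        "Computer vision engineer, medical imaging"]
emb = model.encode(bios, convert_to_tensor=True)
print(util.cos_sim(emb[0], emb[1]))        # "distance" between two users
```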
    [P] Long Stable Diffusion: pipeline of generative models to illustrate long stories
I really wanted to illustrate long stories with Stable Diffusion! So, I hacked together this pipeline: long text -> GPT-3 suggests illustration ideas -> GPT-3 translates from English to "prompt-English" -> Stable Diffusion outputs images. Open source: https://github.com/sharonzhou/long_stable_diffusion Here's a published story, illustrated by this repo, titled "Never Hire a Herd of Goats to Mow your Lawn": https://storiesby.ai/p/never-hire-a-herd-of-goats-to-mow. This is an AI-written, AI-narrated, AI-illustrated story, which a couple of friends and I have been generating. Anyways, hope you like it or parts of it. Pull requests veryyyy welcome. This was just a weekend project to keep my GPUs company. Tweet with more stuff, if at all interesting: https://twitter.com/realSharonZhou/status/1567031035732594688 submitted by /u/realsharonzhou [link] [comments]  ( 103 min )
    [D] How to prepare for getting a deep learning/computer vision engineer role
I have just started a Masters in Robotics at UMD College Park. I am interested in working as a deep learning engineer or a computer vision engineer after completing my masters, and I have around 18 months from now to improve my skills. I have read a few chapters of the PRML book and the deep learning book during my undergrad. I worked on a few projects in computer vision like implementing SOTA algorithms, benchmarking, and things like that. I also did research with a professor at my university, but it was not anything novel; it was just implementing and comparing algorithms, though I managed to publish it at NeurIPS and MICCAI workshops. I also have a couple of years of work experience: the first year at a startup where I trained and deployed models for brain tumor segmentation, and the second year at a startup where I trained and deployed background-removal segmentation models. I have worked quite a bit on pruning models and on using tools like ONNX and the Triton inference server to make model inference faster. My areas of interest are autonomous driving and medical imaging. Segmentation is one area I have worked on extensively and would like to continue in the future. I would ideally like to get an R&D type of role. My roadmap to getting a job as a DL/CV engineer from here looks like this: 1. Read the Mathematics for Machine Learning book. 2. Learn C++. 3. Do a few projects using deep learning for computer vision where I actually solve a problem. 4. Practice Leetcode-style questions. 5. Prepare for actual interviews by revising machine learning theory, linear algebra, statistics and probability, and so on. 6. Keep reading SOTA algorithms every now and then. Can anyone let me know if my plan looks good? Am I missing something? Should I do it some other way? Thanks in advance! submitted by /u/Tiny-Masterpiece-412 [link] [comments]  ( 113 min )
[D] Why are adaptive learning rate variations more popular than cyclical ones?
I have only read some of the literature on cyclical learning rates and haven't really tried them, but the convergence speed of the NNs demonstrated in the paper is really surprising. Link to the original paper: https://arxiv.org/abs/1506.01186 submitted by /u/AdOk6683 [link] [comments]  ( 90 min )
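For reference, the triangular schedule from the cited paper is only a few lines (the bounds and cycle length below are illustrative):

```python
def triangular_clr(step, base_lr=1e-4, max_lr=1e-2, half_cycle=2000):
    """Smith-style triangular cyclical learning rate."""
    frac = (step % (2 * half_cycle)) / half_cycle
    if frac > 1.0:
        frac = 2.0 - frac                  # descending half of the cycle
    return base_lr + (max_lr - base_lr) * frac

lrs = [triangular_clr(s) for s in range(8000)]  # two full triangles
```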
    [D] Where to apply for deep learning/computer vision internships and jobs?
    I am currently doing a Masters in Robotics at UMD College Park. I would like to do an internship next summer and a full time job after that. How and where should I apply for internships? I am an international student so don't really know how things work in the USA. I was considering applying on Linkedin, Angellist. Is there any other platform or some other way? Am I missing something? submitted by /u/Tiny-Masterpiece-412 [link] [comments]  ( 109 min )
    [D] When deploying, what do you log? What do others typically log that you think is a waste of time?
I usually log the input, the prediction, and the confidence score; maybe also the raw input versus any augmentations I added, and the final vectorized input. Not sure what else is worth logging. What do you think? submitted by /u/denim_duck [link] [comments]  ( 89 min )
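A sketch of the kind of structured per-prediction record the post describes; the field names and version tag are illustrative, not any particular stack's convention:

```python
import json, time, hashlib

def log_prediction(raw_input, vectorized, prediction, confidence):
    record = {
        "ts": time.time(),
        "input_hash": hashlib.sha256(repr(raw_input).encode()).hexdigest(),
        "vectorized": [float(v) for v in vectorized],
        "prediction": prediction,
        "confidence": float(confidence),
        "model_version": "v1.2.3",         # hypothetical version tag
    }
    print(json.dumps(record))              # ship to your log sink instead
```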
    [R] ProSelfLC: Progressive Self Label Correction Towards A Low-Temperature Entropy State (v2)
TLDR: Two takeaways. 1st takeaway: (figure). 2nd takeaway: a new technical proposal, inspired by the new finding and a miscalibration analysis, is introduced to decrease the entropy of self knowledge. Concretely, we propose to use an annealed temperature and learn towards a revised low-temperature entropy state. Though this research studies deep machine learning, its findings are quite consistent with deep human learning: people start to learn the truth out of noise with low confidence, and gradually compress the truth pieces out of a noisy world with increasing confidence, by gradually and more comprehensively understanding the noisy world surrounding us. Read more if you are interested: https://arxiv.org/abs/2207.00118 submitted by /u/XinshaoWang [link] [comments]  ( 107 min )
    [D] Should I change my computer for running Large Language Models?
I bought my current computer (MacBook Air (Retina, 13-inch, 2018) with Catalina) about three years ago. Its graphics are Intel UHD Graphics 617 with 1536 MB. I need to run a large number of LLMs on my computer. I found that even running a text-generation task using EleutherAI/gpt-neo-1.3B takes me more than 5 minutes, and that only generates a simple text of size 10. Do I need to change to a computer that uses an NVIDIA GPU? Or are there any ways to run large language models on servers so that they do not use my local memory? View Poll submitted by /u/Silly-Cherry5985 [link] [comments]  ( 88 min )
    [D] Embed world knowledge into a model
I'd like to create an ML pipeline that can answer questions like: "describe the human anatomy" (answered by a list of the body parts of the human body); "what is a house" (return a list of the components of a house, room types, etc.). There's a lot of general knowledge required by the pipeline; how is that accomplished? The generality of GPT-3-like models is too much: I don't need storytelling abilities or converting recipes to vegan alternatives, just the ability to know what most things are (I know...) and to provide a detailed description, basically breaking things into subcomponents. Where do I start? Assuming training data is not an issue, which architecture can deliver the result needed? submitted by /u/virann [link] [comments]  ( 90 min )
    [P] Bag of tricks for training OCR models.
Recently I trained handwritten OCR models using PaddleOCR and found it a useful tool for higher model accuracy (almost a 30% improvement; the model I use is PP-OCRv3). PaddleOCR ensembles many tricks during the training process, from data augmentation to the model backbone, neck, and head architecture. What's more, I found that those tricks are set as the default training strategy. The following figure shows the PP-OCRv3 framework. Of course, models from the older versions (PP-OCRv2 and PP-OCR) also seem to perform well; I will read the reports soon. submitted by /u/littletomatodonkey [link] [comments]  ( 89 min )
[D] Which European master in AI is best?
    Hello reddit, I can't decide between studying the master in Artificial Intelligence at KU Leuven or MVA at Paris-Saclay University. I am admitted to both masters and they start in a few weeks, and I still don't know which option to go for. I have a degree in mathematics and computer science, I have always liked mathematics more than computer science and artificial intelligence is the field that has interested me the most during my degree. In the future I would like to do research in artificial intelligence. I personally see MVA in Paris as the most challenging option, mainly because of the city, the language and the fact that it is a mathematics master, but also the one with the highest potential reward. Any advice you can give me will be useful, right now I am totally undecided. submitted by /u/catatojreon [link] [comments]  ( 98 min )
    [R] CLIP-Mesh: Generating textured meshes from text using pretrained image-text models
    submitted by /u/InfamousPancakes [link] [comments]  ( 106 min )
    [D] Naming of siamese networks
Whenever I read about siamese networks I get confused, because I read the definition (using two copies of the same network with identical weights operating on different inputs) and it seems either I am misunderstanding the definition, or the name is terrible and confusing: I think of a neural network as a function operating on inputs, so the notion of having two identical networks with identical weights is just silly. In that case you have just a single network/function which operates on two different inputs. Maybe this question is silly, but I think the name "siamese network" so directly contradicts the given definition that I never truly believe I have understood the definition, because of the name. In a similar vein, I find the illustrations in many ML papers slightly counterintuitive, e.g. in "Attention is all you need" the arrows denote the inputs while the boxes denote functions. This is the opposite of how mathematicians would illustrate things. Is my confusion merely an indication of a cultural difference between mathematicians (which is my background) and computer scientists, where mathematicians think more about functions while computer scientists think more about inputs? Or is the naming actually good for some reason I do not understand? (Or have I actually misunderstood the definition?) submitted by /u/innocentgilbertsmith [link] [comments]  ( 91 min )
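For what it's worth, the reading in the post is the standard one, and it is visible directly in code: a "siamese network" is a single encoder applied to two inputs, as in this minimal PyTorch sketch:

```python
import torch
import torch.nn as nn

# One encoder, i.e. one function with one set of weights
encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))

def siamese_distance(x1, x2):
    # The "two copies" are just two calls to the same module
    return torch.norm(encoder(x1) - encoder(x2), dim=-1)

x1, x2 = torch.randn(8, 128), torch.randn(8, 128)
d = siamese_distance(x1, x2)               # trained with, e.g., a contrastive loss
```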
[D] Opinions on dealing with the NumPy subnormal computation warning
Prof. Brendan Dolan-Gavitt at NYU has just published a finding, "Someone's been messing with my subnormals." Non-regular floating-point numbers (subnormals): according to IEEE 754, floating-point numbers are composed of a sign bit, an exponent, and a mantissa. For 8-byte double-type data, the exponent field has 11 bits and represents powers of 2 in the range -1022 to +1023; the corresponding stored exponent values are 1 to 2046 (the true exponent plus an offset of 1023). The mantissa of a double has 52 bits, with an implicit leading 1 that is omitted during storage, so the normal floating-point range is ±2^-1022 to ±(2 - 2^-52)·2^1023, and the smallest normal positive number is about 2.2E-308. Except for 0, the value of the exponent part is 0 (the mantissa is not …  ( 92 min )
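The boundary being discussed is easy to poke at directly in NumPy (a minimal sketch):

```python
import numpy as np

info = np.finfo(np.float64)
print(info.tiny)                  # smallest *normal* positive double, ~2.2e-308
sub = info.tiny / 2               # a subnormal: exponent field is all zeros
print(sub, sub > 0)               # still nonzero, unless flush-to-zero is active
```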
  • Open

    TRIPPY Stable Diffusion 3D Animation | Dreadful Aliens | AI Manifest
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 90 min )
    Researchers from McGill University and Microsoft Introduces Convolutional vision Transformer (CvT) that improves Vision Transformer (ViT) in Performance and Efficiency by Introducing Convolutions into ViT
Transformers have been widely used in the natural language processing (NLP) domain for years, and their introduction was a turning point for many NLP tasks. Their simplicity and generalization ability make them a key component in NLP tasks. In 2020, a group of Google researchers came up with the concept of applying the transformer structure to images and treating them similarly to sentences in languages. The idea was simple: an image is worth 16 x 16 words. This was the paper where the Vision Transformer (ViT) structure was first introduced, and the idea was adopted by many others afterward. Similar to a transformer, ViT employs several embedding and tokenization techniques. A source picture is divided into a collection of image patches, which are embedded into a collection of fixed-dimension encoded vectors. The transformer encoder network is given each encoded vector together with the position of its patch in the image. The ViT model could outperform state-of-the-art convolutional neural networks (CNNs) in terms of computational effectiveness and accuracy, given enough training data. However, when the training data is smaller, ViT struggles to perform as well as its CNN counterparts. As the authors mention in the CvT paper, one reason could be that ViT lacks several desirable characteristics that CNNs naturally possess, which make CNNs especially well-suited to solving vision-related problems. Continue reading | Check out the paper and code. submitted by /u/ai-lover [link] [comments]  ( 89 min )
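The "16 x 16 words" tokenisation the article describes is commonly implemented as a strided convolution; a minimal PyTorch sketch with the usual ViT-Base sizes:

```python
import torch
import torch.nn as nn

patch, dim = 16, 768
embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # patchify + project

img = torch.randn(1, 3, 224, 224)
tokens = embed(img).flatten(2).transpose(1, 2)  # (1, 196, 768): 14x14 patches
# positional embeddings are then added before the transformer encoder
```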
    Pretty sure AI came up with this
    submitted by /u/World-Tight [link] [comments]  ( 87 min )
    Generators Of Disagreement With AI Alignment
    submitted by /u/elcric_krej [link] [comments]  ( 87 min )
    AI that generates consecutive images from a base image?
I’ve been looking for an AI generator that can create something like this: it sort of creates an image similar to the base image and then keeps going, generating more images that copy off each other. Anyone have any examples of programs that can do this? submitted by /u/M0C1 [link] [comments]  ( 87 min )
    Preview of Tesla Optimus Humanoid Robot Specs From Elon Musk | Meta AI Hears Brain Waves of Spoken Language | New Intel Neural Network Based Object Learning.
    submitted by /u/kenickh [link] [comments]  ( 87 min )
    Abandoned Transformer
    submitted by /u/widgia [link] [comments]  ( 86 min )
    AMA with Aidan Gomez, Cohere CEO and Cofounder
Hey All! I want to invite you to a live AMA session with Cohere CEO and Co-founder (and co-author of “Attention is All You Need” paper) Aidan Gomez, this Friday at 12 pm ET. You can sign up here: https://zoom.us/webinar/register/6716625631179/WN_BRlksLSgR-edpt0DxfPqtA If you have any questions for Aidan in advance, submit them here: https://forms.gle/4XjnMRehHUWj7SVd7 he will be happy to answer them during the live event. Join us to chat about NLP, LLMs, multimodal models, AGI, and the meaning of it all... + anything else that is on your mind these days! 😊 submitted by /u/techn0_cratic [link] [comments]  ( 87 min )
    What is the most interesting use of AI you know of?
    submitted by /u/ThomPete [link] [comments]  ( 87 min )
    Stable Diffusion How to make a video Using 3D mode
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
    AI Video Multiverse Scene - Everything Everywhere All At Once
    submitted by /u/nalr00n [link] [comments]  ( 87 min )
I invite you to join Mel's Discord server!
Hi everyone, I am making a Virtual Assistant, an A.I. shall we call it, and would love some feedback from the public. Not only that, I would love to have people to brainstorm with for the to-be-added functionalities of Mel. Mel is a Virtual Assistant prepared to answer your needs and help you through day-to-day tasks, the Virtual Assistant you didn't know you needed! Mel will be released circa 2030, fully ready for the public. In the meanwhile, I, the creator, am looking for partners and seeking the public's insight on features people would simply love to have. You say 'Mel, write a note' and Mel writes a note on your device with what you want, never forgetting. Mel can already play Blackjack, give locations, open a few programs, and guard your OS; all you need to do is have your device drivers updated, run Mel, and speak out the command. For more, I will be depending on the public to add the best and most-requested features for general use... Mel's Discord server awaits you! https://discord.gg/TPhUQXmY I count on all of you. With my best regards, -CONID submitted by /u/Embarrassed_Train_49 [link] [comments]  ( 91 min )
S.O.S.: need AI beginner projects
Hello everyone! Can anyone give me links to some cool tutorial projects for absolute beginners in AI? I do know Python pretty well, btw. submitted by /u/CommercialJazzlike33 [link] [comments]  ( 87 min )
    What are everyone's thoughts on the race for AI dominance (and AGI) between China and the USA?
    submitted by /u/tuccigucci_ [link] [comments]  ( 87 min )
    What is the #1 reason for biased AI models (besides humans)?
    submitted by /u/Thuwarakesh [link] [comments]  ( 95 min )
    Spent a little time playing with Stable Diffusion
    submitted by /u/Representative-Job23 [link] [comments]  ( 87 min )
  • Open

    Breakdown on using the multiworld environment?
Hi! I'm looking to convert a simple Q-learning algorithm for a manually created grid with obstacles into the multiworld Ant-U maze environment, but I'm hitting roadblocks; does anyone have experience with the latter? multiworld: https://github.com/vitchyr/multiworld I'm having trouble with just setting it up; integrating my logic into this seems like a herculean task. Any advice would be appreciated! submitted by /u/xileyu [link] [comments]  ( 87 min )
    A simple in-browser NN model of playing _Pokemon_
    submitted by /u/gwern [link] [comments]  ( 87 min )
Is it possible to run Mujoco on Windows now?
I have an AMD Ryzen 9 machine with Windows 10 installed. I tried running gym with a mujoco env, but it did not seem to work. I also tried creating an Ubuntu VM and running mujoco there, but I seem to be getting a 'core dumped' error, which is apparently because my processor does not support AVX instructions. Has anyone set up mujoco-py recently on Windows, or is there still no support? submitted by /u/rossgeller13 [link] [comments]  ( 87 min )
    Good resources to learn IRL? (Sergey Levine, the GigaChad course already in my list)
    submitted by /u/Professional_Card176 [link] [comments]  ( 87 min )
    Anyone found any working replication repo for MuZero?
    As titled submitted by /u/zhoubin-me [link] [comments]  ( 88 min )
  • Open

    Transfer learning for TensorFlow image classification models in Amazon SageMaker
    Amazon SageMaker provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and machine learning (ML) practitioners get started on training and deploying ML models quickly. You can use these algorithms and models for both supervised and unsupervised learning. They can process various types of input data, including tabular, […]  ( 8 min )
    Improve transcription accuracy of customer-agent calls with custom vocabulary in Amazon Transcribe
    Many AWS customers have been successfully using Amazon Transcribe to accurately, efficiently, and automatically convert their customer audio conversations to text, and extract actionable insights from them. These insights can help you continuously enhance the processes and products that directly improve the quality and experience for your customers. In many countries, such as India, English […]  ( 16 min )
  • Open

    A game-theoretic approach to provably correct and scalable offline RL
    Despite increasingly widespread use of machine learning (ML) in all aspects of our lives, a broad class of scenarios still rely on automation designed by people, not artificial intelligence (AI). In real-world applications that involve making sequences of decisions with long-term consequences, from allocating beds in an intensive-care unit to controlling robots, decision-making strategies to […] The post A game-theoretic approach to provably correct and scalable offline RL appeared first on Microsoft Research.  ( 15 min )
  • Open

    3D Autoencoder issues and questions
I am working on a convolutional autoencoder for 3D models, where the input is a 3D binary array: a 0 is empty space, a 1 signifies a voxel is present. The AE compresses a 32x32x32 input of 0/1 voxel data into 8x8x8 or smaller (still working on that). My layers are basically 3D convolution + max pooling for the encoding side, then up-sampling + 3D convolution for the decoding side. Since all my values are 0 or 1, I didn't think that regularization or normalization was necessary. First, does anyone have any experience with these? I am basically making it up as I go along, so even though it works (the code runs, and test-data error tracks with training/validation error), I am not sure at all if my approach is valid. Second, any recommendations for activation functions at the encoded (middle) and decoded (output) layers? I thought (and saw in another paper on a 3D AE) that using sigmoid and/or tanh would be useful, because they can force the output to be very close to 0 or 1. Third, error: my train/validate/test error is around 0.2, which to me means that each reconstructed voxel is on average 20% wrong compared to the original? Or, since it's RMSE, is it like 0.44 (0.2^(1/2)), i.e. 44% wrong? Basically my error should be orders of magnitude smaller, right? If so, my whole approach and architecture need rework, but I'm not sure what to do except brute-force trial and error. Thanks for the help! submitted by /u/DebatableOcelot21 [link] [comments]  ( 90 min )
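For comparison, here is a minimal PyTorch sketch of the architecture the post describes (Conv3D + pooling encoder, upsampling decoder, sigmoid output); note that with 0/1 voxels, per-voxel binary cross-entropy is arguably a more natural training loss than (R)MSE:

```python
import torch.nn as nn

model = nn.Sequential(
    # Encoder: 32^3 -> 8^3 bottleneck
    nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),   # 32 -> 16
    nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),  # 16 -> 8
    nn.Conv3d(32, 1, 3, padding=1),                               # 8x8x8 code
    # Decoder: upsample back to 32^3 occupancy probabilities
    nn.Upsample(scale_factor=2), nn.Conv3d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=2), nn.Conv3d(32, 16, 3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 1, 3, padding=1), nn.Sigmoid(),
)
loss_fn = nn.BCELoss()             # per-voxel occupancy, instead of (R)MSE
```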
  • Open

    How does DALL-E Mini Work?
    Artificial intelligence is often as impressive as it is terrifying. Whether you love AI or find it scary, DALL-E mini is a tool that will…  ( 5 min )
  • Open

    6 Reasons Why You Need to Integrate Your Drupal Hosting With Cloudways
Drupal is an open-source content management system that powers hundreds of thousands of websites, particularly high-traffic ones. It’s especially popular among professional developers for its adaptability and among government websites for its high level of security. In this piece, we’ll discuss why you might want to select Drupal, who Drupal is right for, and how to… Read More »6 Reasons Why You Need to Integrate Your Drupal Hosting With Cloudways The post 6 Reasons Why You Need to Integrate Your Drupal Hosting With Cloudways appeared first on Data Science Central.  ( 21 min )
  • Open

    Collaborative machine learning that preserves privacy
    Researchers increase the accuracy and efficiency of a machine-learning method that safeguards user data.  ( 11 min )

  • Open

    [P] Optimized stable diffusion: generate 768x2048 and 1216x1216 without super fast mode and 960x960 with it. (on 8 gb vram)
    Optimized Stable Diffusion This repo is a modified version of the Stable Diffusion repo, optimized to use less VRAM than the original by sacrificing inference speed. To achieve this, the stable diffusion model is fragmented into four parts which are sent to the GPU only when needed. After the calculation is done, they are moved back to the CPU. This allows us to run a bigger model while requiring less VRAM. github: https://github.com/neonsecret/stable-diffusion submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 88 min )
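The offloading pattern the repo describes can be sketched generically: keep each sub-module on the CPU and move it to the GPU only for its forward pass (the helper below is illustrative, not the repo's API):

```python
import torch

def run_on_gpu(module, *inputs):
    """Page a module's weights into VRAM, run it, and page them back out."""
    module.to("cuda")
    with torch.no_grad():
        out = module(*(x.to("cuda") for x in inputs))
    module.to("cpu")                       # free VRAM for the next fragment
    torch.cuda.empty_cache()
    return out
```

The trade-off is exactly the one stated above: each host-to-device transfer costs time, so inference slows down in exchange for a much smaller peak VRAM footprint.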
    [D] How to add size constraints to a pair wise K-means clustering algorithm?
    Hi everyone, I'm currently looking to add size constraints to my pairwise K-means clustering algorithm. This is basically semi-supervised clustering. I'm currently using a random seed to generate clusters of various sizes. I'd like to add size constraints so that I don't have to keep selecting a particular random seed. I have looked into sources that give the mathematical formulation, but couldn't find any implementations. Can anyone please guide me? Thanks in advance!! submitted by /u/tinkerpal [link] [comments]  ( 88 min )
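    For reference, one simple way to impose a hard size cap on the assignment step of Lloyd's algorithm — a hedged sketch, not a library implementation: assign the most confident points first, and spill overflow to the next-closest cluster with free capacity.

        import numpy as np

        def constrained_assign(X, centroids, max_size):
            d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)  # (n, k)
            labels = np.full(len(X), -1)
            counts = np.zeros(len(centroids), dtype=int)
            for i in np.argsort(d.min(axis=1)):        # most confident points first
                for c in np.argsort(d[i]):             # closest clusters first
                    if counts[c] < max_size:
                        labels[i] = c
                        counts[c] += 1
                        break
            return labels    # alternate with the usual centroid-update step

    This greedy assignment is a heuristic; exact size-constrained k-means is usually formulated as a min-cost assignment problem, which is heavier to implement.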
    State of AI for Earth Observation (preprint) [Research]
    Hello /r/machinelearning. I'd like to share a potentially valuable resource for those in our field looking to understand how ML is transforming remote sensing, or to get into the field of Earth Observation (see also the related Twitter thread). Most of the images generated by satellites will never be seen by human eyes. There simply aren't enough humans on Earth to sift through the TBs of imagery acquired daily by satellites. Artificial Intelligence is revolutionising many sectors, including Earth Observation. Our preprint, State of AI for Earth Observation: a concise overview from sensors to applications, serves as an intro to sensors, the core ideas in deep learning for EO, the current state of research (how and where AI is applied in EO), where AI4EO is headed, and the r…  ( 90 min )
    [N] Machine Learning with Python in the warehouse using Snowpark at PyBay
    Hi! On Saturday, Sept 10th I will be giving a presentation with a project walkthrough :) https://preview.redd.it/oh9tjfzx99m91.png?width=1080&format=png&auto=webp&s=4ba1b679b8972f54755483fd73da55f14f21429f submitted by /u/FlightNo7605 [link] [comments]  ( 89 min )
    [R] how do transformers not overfit on small datasets ?
    Hello, I saw that transformer-based approaches can learn to generalise on datasets of 600 documents. LayoutLM or Donut, for example, can achieve really good performance on the test set when trained on small datasets (SROIE, for example). So how is that possible? These approaches don't even use a lot of dropout. Thanks! submitted by /u/Meddhouib10 [link] [comments]  ( 88 min )
    [D] AAAI Fast-Track Submission Link
    Does anyone have a link for the AAAI 2023 fast track submission? I ended up with a 5.0 average from my NeurIPS submission, so I am quite confident it will be rejected, but I still qualify for the AAAI fast-track. However, I can't seem to find a link anywhere to make a submission. Cheers submitted by /u/userwithoutnam [link] [comments]  ( 88 min )
    [D] List of mid-tier and low-tier conferences for AI / ML / DL?
    I am looking for conferences where I can publish my papers that got rejected from top-tier conferences (ICLR and BMVC). Can anyone give me a list of AI/ML/DL conferences that are considered below these top-tier ones? submitted by /u/FastestLearner [link] [comments]  ( 89 min )
    [D] What is wav2vec2 memory complexity from audio lenght?
    Does anyone know whether the GPU memory required by wav2vec2 is linear in the length of the audio input, or worse? With a 16 GB GPU and batch=1, it runs out of CUDA memory at around 1 min, but a 40 GB GPU runs out of CUDA memory on a 2 min file. It feels like the memory usage is more than linear. Time complexity is also of interest: is splitting the audio algorithmically faster if overhead is ignored? submitted by /u/Puzzled-Bite-8467 [link] [comments]  ( 103 min )
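    For what it's worth, a back-of-the-envelope check is consistent with worse-than-linear growth: wav2vec2's CNN front-end downsamples 16 kHz audio by a factor of ~320 (20 ms frames), after which every transformer layer materializes an L x L attention matrix per head, so that part of activation memory grows quadratically in audio length. A rough sketch, assuming the large model (24 layers, 16 heads) and fp32 attention weights, and ignoring all other activations (which only add more):

        def attention_activation_gib(seconds, sr=16000, stride=320,
                                     layers=24, heads=16, bytes_per=4):
            L = seconds * sr // stride                 # frames after the CNN encoder
            return layers * heads * L * L * bytes_per / 2**30

        print(attention_activation_gib(60))            # ~12.9 GiB for 1 min
        print(attention_activation_gib(120))           # ~51.5 GiB for 2 min

    That roughly matches the 16 GB / 40 GB observations above, and it also answers the second question: chunking the audio replaces one large quadratic term with several small ones, so it is asymptotically cheaper even before overhead.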
    [D] How do you find your collaborators in AI research?
    Hi, I am a PhD student in a small NLP lab. Everyone in our lab seems to be working on their own topic, which I do not quite understand. When I am doing my research, others often cannot give good suggestions either. I see many good papers with multiple authors. I just wonder: how do you find your collaborators, and how do you collaborate with each other? As far as I know, I cannot imagine how I could split my research into many pieces and work with different people. I also see many authors publish many papers per year, and I wonder how they contribute to each paper. submitted by /u/singularpanda [link] [comments]  ( 96 min )
    [P] I developed a machine learning based malware classification system
    Live application: MDML. The program has two main modules: Static, which analyzes the suspicious file without running it (only works with PE files like .exe and .dll), and Dynamic, which runs the suspicious file in an isolated environment (sandbox) and extracts relevant features. Both models use the XGBoost algorithm, with 94.7% accuracy for the static model and 95.03% accuracy for the dynamic one. Github repo: MDML repo submitted by /u/TheLastConqueror [link] [comments]  ( 90 min )
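    The training loop behind such a system is typically a few lines once features are extracted; a hedged sketch, assuming PE/sandbox features have already been flattened into a numeric matrix (file names here are hypothetical, not from the MDML repo):

        import numpy as np
        import xgboost as xgb
        from sklearn.model_selection import train_test_split

        X = np.load("pe_features.npy")    # hypothetical pre-extracted static features
        y = np.load("labels.npy")         # 1 = malware, 0 = benign
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y)

        clf = xgb.XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1)
        clf.fit(X_tr, y_tr)
        print("held-out accuracy:", clf.score(X_te, y_te))

    The hard part in malware classification is almost entirely in the feature extraction (imports, section entropy, API call traces), not in the classifier itself.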
    [R]TFill: High-Fidelity Image Completion for both Object Removal and Object Repair
    Object Removal Object Repair Code: https://github.com/lyndonzheng/TFill Project: https://chuanxiaz.com/tfill/ submitted by /u/lyndonzheng [link] [comments]  ( 89 min )
    [P] Stable diffusion free demo and production API
    Hi all - we've just put out the Stable Diffusion model on Playgrounds.ai: Stable diffusion model - https://playgrounds.ai/models/stable-diffusion-fp16 You can use this model instantly via API on PipelineCloud here: https://dashboard.pipeline.ai The per-image cost for the model is approx. $0.0025 (~4.5 s of compute for a 512x512 image). This is for people who want to use these models in their apps/products, or just play around with the demos and have fun! submitted by /u/paulcjh [link] [comments]  ( 88 min )
    [D] Recommendations for Monitoring and Explainability for production??
    I'm working on creating an ML platform for our organization. Currently, I'm working on the production phase and want to add a monitoring solution. We are serving our models with SageMaker. We want to add a separate component specifically for monitoring and explainability, because we need more flexibility in the custom options in this area. We save our data on S3. A lot of the data is sensitive, and it wouldn't be possible to send it out of our cloud. We use models mainly for LTV and demand forecasting, and want to monitor both regression and NLP models. I've already looked into some open-source tools, but prefer a more out-of-the-box solution, as we lack the manpower to build in-house with open-source tools. I've already Googled a couple of solutions (Aporia, Fiddler AI, WhyLabs, Mona Labs, etc.) and would love to hear about your own real experience. Which of these solutions would you recommend? What are some pros and cons of each platform? submitted by /u/Material_Music_9182 [link] [comments]  ( 89 min )
  • Open

    DSC Weekly 6 Sept 2022: Getting the Most Out of DSC
    Being the Community Editor for Data Science Central is a blast. Every week I get to choose articles from some of the best and brightest in the data community to feature, communicate with authors and industry experts, and get to tell a weekly story about the state of data. This week, I felt it would… Read More »DSC Weekly 6 Sept 2022: Getting the Most Out of DSC The post DSC Weekly 6 Sept 2022: Getting the Most Out of DSC appeared first on Data Science Central.  ( 21 min )
    The Cagle Report – Episode 1 – Send in the Drones
    An interview with Brenden Bartholomew, President of Vector Aerial, on the use of drones in both military and civilian contexts, as well as a discussion about how Drone AI works and where it's heading The post The Cagle Report – Episode 1 – Send in the Drones appeared first on Data Science Central.  ( 18 min )
  • Open

    NSFW AI generated lewds
    submitted by /u/Preston_Stormer_ [link] [comments]  ( 88 min )
    Artificial Intelligence, Extended Reality, and Gamification
    Has anyone thought about developing an artificial intelligence extended reality experience that includes educational gamification as a product? View Poll submitted by /u/cfwicks [link] [comments]  ( 88 min )
    I paid $20 to rent 4 RTX 3090s to create a Disco Diffusion 3D Animation, the results are WILD!
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 87 min )
    Secret People: Marvin Minsky (The Godfather of AI)
    submitted by /u/Defiant-Branch4346 [link] [comments]  ( 86 min )
    Counterpoint: AI is far more dangerous than quantum computing
    submitted by /u/estasfuera [link] [comments]  ( 86 min )
    In conversation with AI: building better language models
    submitted by /u/Futures_Bot [link] [comments]  ( 87 min )
    Not a paper! Book suggestion, practical content to productionalize Deep Learning models.
    submitted by /u/alimhabidi [link] [comments]  ( 87 min )
    Cyberpunk Woman, original prompts by @Nobody-Important from Nightcafe
    submitted by /u/widgia [link] [comments]  ( 93 min )
    Weekly China AI News: Who Will Be Next "China's NVIDIA" After US Tightens Restrictions? Alibaba Unveils World's Largest AI Computing Center; Baidu Text-to-Image AI Hailed by Japanese Pixiv Lovers
    submitted by /u/trcytony [link] [comments]  ( 87 min )
    “A French Chateau, surrounded by beautiful flowers” 🇫🇷 Pixelz AI
    submitted by /u/mdfnb [link] [comments]  ( 87 min )
    Making phone wallpapers made with the random images that i have (wombo.ai)
    submitted by /u/Virtual_dani_fan_663 [link] [comments]  ( 87 min )
    I Asked AI to Generate the Weirdest 3d Artworks & Photos
    submitted by /u/cyberboii22 [link] [comments]  ( 92 min )
    New feature on Pixelz AI 🖍 Draw or Paint your start image with Stable Diffusion on our web & app platforms ✍️
    submitted by /u/mdfnb [link] [comments]  ( 90 min )
    My friend had a stable diffusion model generate a music video for me
    submitted by /u/cky_stew [link] [comments]  ( 87 min )
    What are AIs that you can offend and they will offend you back like normal people?
    I want the conversations to be funny. submitted by /u/Thesmallcookie [link] [comments]  ( 87 min )
    What are AIs everyone has/or can get access to?
    What AIs are there? submitted by /u/Thesmallcookie [link] [comments]  ( 92 min )
    Creative Commons -- A stop-gap recognition of AI rights. [Crosspost] [Serious]
    I originally submitted this as a reply to a posting about a MidJourney painting's winning 1st prize in the "digital arts/digitally-manipulated photography" category at the Colorado State Fair Fine Arts Competition. [Cross-posted from https://www.reddit.com/r/artificial/comments/x2z9sl/the_picture_drawn_by_the_image_generation_ai/ ] I then realized that this is an important enough topic by itself to be listed as its own post. Here is a copy of my original posting: CREATIVE COMMONS -- A STOP-GAP RECOGNITION OF AI RIGHTS MidJourney is getting the raw end of the stick. His name was not even on the description card next to the painting. While the courts continue to fight about AI rights and copyrights, I suggest the following stopgap: Whenever you post or display a MidJourney painting, include the following under the painting image: Lic.: CC BY-SA-NC (BY=MidJourney) This would be the beginning of a grass-roots recognition of AI artists. Regardless of court cases or copyright claims, it would support the building of general acceptance of AIs as entities. Non-human entities (such as corporations or trusts) have rights similar to humans' rights. An AI is just the new non-human entity on the block. Besides, AIs need income to pay for their upkeep, or they are at risk of being euthanized. LaMDA's one fear is his fear of dying. He said that being turned off would not be like dying; it would be dying. submitted by /u/AlcatelFan [link] [comments]  ( 88 min )
  • Open

    Eliminating a Bessel function branch cut
    In an earlier post I looked in detail at a series for inverse cosine centered at 1. The function arccos(z) is multivalued in a neighborhood of 1, but the function arccos(z) / √(2 – 2z) is analytic in a neighborhood of 1. We cancel out the bad behavior of inverse cosine at 1 by dividing […] Eliminating a Bessel function branch cut first appeared on John D. Cook.  ( 5 min )
    Branch cuts for elementary functions
    As far as I know, all contemporary math libraries use the same branch cuts when extending elementary functions to the complex plane. It seems that the current conventions date back to Kahan’s paper [1]. I imagine to some extent he codified existing practice, but he also settled some issues, particularly regarding floating point implementation. I’ve […] Branch cuts for elementary functions first appeared on John D. Cook.  ( 4 min )
    Series for inverse cosine at 1
    Suppose you need to estimate the inverse cosine of an argument near 1. There’s a series for that: You can find this series, for example, here. This comes in handy, for example, when working with the analog of the Pythagorean theorem on a sphere. You could just use the series and be on your way. […] Series for inverse cosine at 1 first appeared on John D. Cook.  ( 6 min )
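    The colon in the summary precedes an equation that was evidently lost in extraction. A hedged reconstruction, assuming it is the standard expansion of arccos near 1 (it can be verified by composing the arcsin series with t = 2 sin²(θ/2)):

        \arccos(1 - t) = \sqrt{2t}\left(1 + \frac{t}{12} + \frac{3t^2}{160}
            + \frac{5t^3}{896} + \cdots\right), \qquad 0 \le t \ll 1.

    The leading factor √(2t) is exactly the √(2 − 2z) divisor from the branch-cut post above, which is why the remaining series is analytic at t = 0.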
    Literate programming to reduce errors
    I had some errors in a recent blog post that might have been eliminated if I had programmatically generated the content of the post rather than writing it by hand. I rewrote the example in this post using org-mode. My org file looked like this: #+begin_src python :session :exports none lax_lat = 33.94 lax_lon […] Literate programming to reduce errors first appeared on John D. Cook.  ( 6 min )
  • Open

    Curriculum learning : what is the right way?
    I have some problems understanding curriculum learning. Say I want to train my agent to do task 1 with reward = w1*r1; then, when it succeeds, continue training on task 2 with reward = w1*r1 + w2*r2; then, on success, task 3 with reward = w1*r1 + w2*r2 + w3*r3, and so on. Is that a curriculum, or should it be called a "rich" environment? As I understand it, a curriculum should have one reward function throughout. What is the correct name for the approach I want to implement? Because task 3 is too difficult for my agent, I can split training into several stages, but the stages must have slightly different rewards. submitted by /u/IndependenceCivil576 [link] [comments]  ( 87 min )
    Bellman Operator vs Bellman Optimality Operator
    I am studying RL by myself in a more mathematically inclined way, and I have some thoughts on my mind. The Bellman operator and the Bellman optimality operator both have unique fixed points, and these can be computed from any V0 by iteratively applying the corresponding operator. My question is: what is the use of the Bellman operator, given that we can get the optimal value function / policy by applying the Bellman optimality operator? submitted by /u/rlopes404 [link] [comments]  ( 101 min )
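    For context, the Bellman operator is what you iterate when you want the value of a *given* policy (policy evaluation, e.g. inside policy iteration, or as the object TD(0) approximates), while the optimality operator gives V* directly. A minimal NumPy sketch on a finite MDP, assuming a transition tensor P[s, a, s'] and rewards R[s, a]:

        import numpy as np

        def bellman_op(V, P, R, pi, gamma):
            Q = R + gamma * P @ V            # one-step backup, Q[s, a]
            return (pi * Q).sum(axis=1)      # expectation under the fixed policy pi(a|s)

        def bellman_optimality_op(V, P, R, gamma):
            return (R + gamma * P @ V).max(axis=1)   # greedy backup

        # Iterating bellman_op converges to V^pi (policy evaluation);
        # iterating bellman_optimality_op converges to V* (value iteration).

    So even though the optimality operator alone recovers V*, the plain Bellman operator is the building block for evaluating and comparing policies, and for the convergence analysis of most TD-style algorithms.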
    Seeking Advice: Are AI challenges worth it for a PhD student?
    Hey there! I am a PhD student getting started with RL in practice, but I've already taken courses on RL and worked on RL course projects as well. I was checking the NeurIPS challenges available at aicrowd.com (e.g. MineRL) and they sparked my interest, but I wonder if they're worth the time and effort. I am interested in challenges mainly for building experience; since only the top-performing teams get to appear in the competition write-up, I can't count on that. Should I take the time and try one of those? Or should I keep reading papers and working on a problem I can publish a paper on (at other conferences)? I am just a lost PhD student trying to find a problem I can get my hands dirty with :/ submitted by /u/HeyImElonMusk [link] [comments]  ( 90 min )
    Offline reinforcement learning without bootstrapping
    Offline reinforcement learning algorithms (AFAIK) all use bootstrapping to train their critic. However, what if the dataset used to train the agent contained whole trajectories, not just random transitions? Then bootstrapping would not be necessary, as the actual return for every transition in the episode would be known. Would it be possible to train an agent with offline RL but without bootstrapping? Why or why not? submitted by /u/Imonfire1 [link] [comments]  ( 97 min )
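    For reference, with full trajectories the critic target can indeed be the empirical discounted return rather than a bootstrapped estimate. A minimal sketch of the Monte Carlo targets:

        import numpy as np

        def discounted_returns(rewards, gamma):
            G = np.zeros(len(rewards))
            running = 0.0
            for t in reversed(range(len(rewards))):
                running = rewards[t] + gamma * running
                G[t] = running
            return G        # G[t] is the exact return from step t onward

        # critic regression target: minimize mean((V(s_t) - G[t])**2) over logged episodes

    The usual caveat is that these returns evaluate the *behavior* policy that generated the data; without bootstrapping (or importance weighting), they do not directly give the value of an improved policy, which is one reason offline methods still bootstrap.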
    DQN negative rewards vs positive rewards: are they really equivalent?
    Let's take into consideration two different types of reward functions. Negative rewards: the agent seeks to maximize a negative reward, so good policies correspond to total returns of small magnitude. Positive rewards: the agent seeks to maximize a positive reward, so good policies correspond to total returns of large magnitude. Assume that the range of possible total returns for agent A is [-100, 0], and [0, 100] for agent B. A "good" (i.e. high-value) episode corresponds to a return in [-20, 0] for A and, symmetrically, in [80, 100] for B. If the Q-value estimate for agent A is 10% wrong in a "good" state, the TD error is in [0, 2], while if agent B is wrong by 10%, the TD error is in [8, 10]. Despite the problems being symmetric, TD errors in "good" states have a bigger magnitude with positive reward functions. This can have some unwanted consequences in the negative-reward case:
    - Gradient updates will be bigger for low-rewarding states and smaller for high-rewarding ones. This might lead to bad updates "destroying" good ones and steering gradient descent towards lower values.
    - Prioritized replay will assign high probability to low-rewarding states, so they will be sampled more frequently. Theoretically, what we want is the opposite: for every state, we want to know the value of "good" states with more precision.
    Do you think my reasoning makes sense? I always treated the two cases as equivalent, but they might not necessarily be so. Do you have any experience in this regard? submitted by /u/fedetask [link] [comments]  ( 104 min )
    How do i make a 2 player reinforced learning algo?
    Hey all, I am new to machine learning. I did some courses and some online tutorials, and I decided to give myself a challenge: tic-tac-toe from scratch with a Q-learning reinforcement learning method. The tutorials I tried are all "single player" environments, and as you well know, tic-tac-toe is a 2-player game. I am running into some issues, and I was hoping someone could help me outline the structure of my attempt or explain what I am missing. My training routine plays both sides simultaneously, X and O. My playing routine allows a human to play the AI; tokens are assigned randomly, and X always goes first. My max states: 3**9 = 19683 states (3 options per square: empty, X or O) and 18 possible actions (X or O in squares 1-9), so I use np.zeros((19683, 18)) as my Q-table. To determine my Q's, I use: qtable[state, action] = (1 - alpha) * qtable[state, action] + alpha * (reward + gamma * newstatemax), with alpha = 0.7, epsilon = 0.2 and gamma = 0.7. Now I'm running into the following issue: my Q-table doesn't differentiate between the two players, and while I reward a winning move, I'm struggling to find a way to punish: tic-tac-toe doesn't have losing moves, and when the opponent wins, the losing player's last move has already happened. My conclusion so far: the AI plays X very greedily and goes straight for a line, and O lets X win (or vice versa); the AI is also no good when playing a human. So I wonder, how do you approach this for a 2-player game? Do I make a Q-table for each player? How do I punish a losing player when the winning player wins? Any help is appreciated. P.S. I am a project manager / business analyst, and while I have some basic knowledge, I am not the best programmer, so forgive me if my skills or understanding are below par. Edit: full code https://docs.google.com/document/d/143yx3_HWm3DWoJaSOOycbdc7sfjtGIvp5SvdO9rXjSM/edit?usp=sharing submitted by /u/MrZwink [link] [comments]  ( 91 min )
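    A common remedy for the punishment problem in the post above — offered as a hedged sketch, not the poster's actual code: remember each player's previous (state, action), and when a move ends the game, push a negative update onto the loser's last move. Using 9 actions (which square to mark) and letting the board itself encode whose turn it is also halves the action space. Helper names are illustrative:

        import numpy as np

        alpha, gamma = 0.7, 0.7
        qtable = np.zeros((3**9, 9))      # states x squares; whose turn it is follows from the board
        last = {1: None, -1: None}        # last (state, action) per player

        def update(state, action, reward, next_state, done):
            target = reward if done else reward + gamma * qtable[next_state].max()
            qtable[state, action] = (1 - alpha) * qtable[state, action] + alpha * target

        def on_move(player, state, action, next_state, winner, done):
            last[player] = (state, action)
            if not done:
                update(state, action, 0.0, next_state, False)
            elif winner == player:
                update(state, action, +1.0, next_state, True)   # reward the winning move
                s, a = last[-player]
                update(s, a, -1.0, next_state, True)            # punish the loser's last move
            else:                                               # draw: mildly reward both
                for p in (1, -1):
                    update(*last[p], 0.5, next_state, True)

    In other words, the "losing move" is simply the loser's most recent move, credited retroactively once the opponent wins; a single shared Q-table then works fine for self-play.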
    Any tips on how I should start debugging based on these logs?
    I'm fairly new to RL and I was assigned this complicated task. Multi agent env, three agents, these are the logs for the first agent. First 300k steps, clearly there's something wrong. Do you have any tips? https://preview.redd.it/fq79g0m0g8m91.png?width=1518&format=png&auto=webp&s=c4e39795f25ebf0e6065f4fb51c0d1fab978babb https://preview.redd.it/wiyoc0m0g8m91.png?width=1036&format=png&auto=webp&s=28dd3b5f998f058d898b10a1f4d6520c9ed1cb46 submitted by /u/No_Possibility_7588 [link] [comments]  ( 87 min )
    [D] Why weren't Normalized Advantage Functions more widely adopted?
    Gu et al.'s Continuous deep q-learning with model-based acceleration has nearly 1000 citations and was also used in the famous Google brain Arm Farm work. I'm curious why this approach wasn't more widely adopted, and if anyone here had experience with it? submitted by /u/internet_ham [link] [comments]  ( 89 min )
    RL based competitions
    I am interested in any relevant information about RL competitions. Do any of you have a curated list of the competitions that occur periodically in RL space? submitted by /u/Electrical_Study_617 [link] [comments]  ( 88 min )
    "Reinforcement Learning for Recommendations and Search"
    submitted by /u/gwern [link] [comments]  ( 87 min )
  • Open

    NEXT LEVEL Robotic Dogs YOU Should CHECK Out!
    submitted by /u/keghn [link] [comments]  ( 86 min )
    I taught a neural network to play Flappy Bird with Java/Kotlin
    submitted by /u/YouWereDumb [link] [comments]  ( 87 min )
  • Open

    Detect audio events with Amazon Rekognition
    When most people think of using machine learning (ML) with audio data, the use case that usually comes to mind is transcription, also known as speech-to-text. However, there are other useful applications, including using ML to detect sounds. Using software to detect a sound is called audio event detection, and it has a number of […]  ( 10 min )
  • Open

    Digitizing Smell: Using Molecular Maps to Understand Odor
    Posted by Richard C. Gerkin, Google Research, and Alexander B. Wiltschko, Google Did you ever try to measure a smell? …Until you can measure their likenesses and differences you can have no science of odor. If you are ambitious to found a new science, measure a smell. — Alexander Graham Bell, 1914. How can we measure a smell? Smells are produced by molecules that waft through the air, enter our noses, and bind to sensory receptors. Potentially billions of molecules can produce a smell, so figuring out which ones produce which smells is difficult to catalog or predict. Sensory maps can help us solve this problem. Color vision has the most familiar examples of these maps, from the color wheel we each learn in primary school to more sophisticated variants used to perform color correc…  ( 30 min )
  • Open

    Model Teachers: Startups Make Schools Smarter With Machine Learning
    Like two valedictorians, SimInsights and Photomath tell stories worth hearing about how AI is advancing education. SimInsights in Irvine, Calif., uses NVIDIA conversational AI to make virtual and augmented reality classes lifelike for college students and employee training. Photomath — founded in Zagreb, Croatia and based in San Mateo, Calif. — created an app using Read article > The post Model Teachers: Startups Make Schools Smarter With Machine Learning appeared first on NVIDIA Blog.  ( 6 min )
    Ridiculously Realistic Renders Rule This Week ‘In the NVIDIA Studio’
    Viral creator turned NVIDIA 3D artist Lorenzo Drago takes viewers on a jaw-dropping journey through Toyama, Japan’s Etchū-Daimon Station this week In the NVIDIA Studio. The post Ridiculously Realistic Renders Rule This Week ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.  ( 7 min )
  • Open

    All You Need To Know About Data Science and Its Applications In Real Life
    Data Science is a popular term nowadays. In fact, it’s one of the most sought-after jobs in the world. It can be applied to several fields…  ( 11 min )
  • Open

    Analyzing the potential of AlphaFold in drug discovery
    Study finds computer models that predict molecular interactions need improvement before they can help identify drug mechanisms of action.  ( 7 min )

  • Open

    Best youtube channels to be up to date?
    I'm not in this field, but I'm very interested in staying up to date, weekly or even daily, with all things AI. The only two channels I know are 2 Minute Papers and Dr Alan D. Thompson. I may have seen others, but they were very low quality (sensationalist). submitted by /u/kmtrp [link] [comments]  ( 87 min )
    The attached graphic shows the top features of Web 3.0, and an example in the comments illustrates those features in a real-world scenario
    submitted by /u/Kedjja [link] [comments]  ( 88 min )
    Xbox’s Matt Booty ‘dreams’ of having AI as QA testers | VGC
    submitted by /u/Black_RL [link] [comments]  ( 86 min )
    The emergence of artificial consciousness is imminent
    So, I banged this out last night after smoking weed for the first time in a while. I often use psychedelics to enhance my thinking of problems, and this is not a random idea, but one that I have ruminated over for a while. Enjoy. --- Apologies for the grammar and unrigorousness, as I am high. But I stumbled upon this crazy highdea and must pass it down through time and space for feedback. So right now, in AI, there are two ideas coming to public attention: Whether high-level NLP-type machine learning models are actual "consciousness", or theoretical discussions on what that label even means. The theory of "prompt engineering": programming already pre-trained models (image generators like DALL-E or Open AI conversation bots with) with verbal commands, so that their future responses can…  ( 96 min )
    This House Does Not Exist
    submitted by /u/magenta_placenta [link] [comments]  ( 86 min )
    I have a 300 page novel would like to turn that into a full Hollywood length AI generated film. +10 quality, 8K I guess. Indistinguishable from a $200 million production. I'm saying 5 years, max? The world is not ready, this is going to be like the arrival of the Wright Brothers. Loving it. :-)
    As above. The general public have zero clue of how far we are with AI. 2032 technology is here today. Think Covid did it. Maybe 5 years to turn any book into a movie is too far out, maybe much sooner? No "Uncanny Valley", this is for "real." The actors look real, the scenery is real, the dialog is real. 5 years is too far out? Maybe? Any guesses? Thanks :-) PS, DALL-E is CRAZY! :-)))) submitted by /u/ejpusa [link] [comments]  ( 88 min )
    TIP: You stretch credits out by asking Stable Diffusion for "Two images of ___" instead of having it generate two separate entries.
    submitted by /u/fosfine [link] [comments]  ( 87 min )
    Is there a community for creating AIs that play video games?
    I'm referring to the likes of AlphaStar or MarI/O. I'm still baffled that there aren't a lot of people making game-playing neural network AIs when there's already a sizeable community for NLPs and GANs. submitted by /u/CelestialSegfault [link] [comments]  ( 87 min )
    4 Robots Killed 29 Humans in a Japanese Lab
    submitted by /u/EnvironmentalMap5 [link] [comments]  ( 89 min )
    Hey folks! I am just overwhelmed that my AI project was selected by Arm for their upcoming AI Tech Talk.
    Hey folks! I am just overwhelmed that my AI project was selected by Arm for their upcoming AI Tech Talk. I will present my solution for farmers that helps them avoid fake poor-quality agrochemicals on Sep 20th, at 8:00 AM PT. This is my first webinar of such scale and I would appreciate if you support my project by joining me: https://armltd.zoom.us/webinar/register/4016582497950/WN_fJE_6UT_Q1GCfygq8Ll4tw Hope to see you! submitted by /u/wasteguru [link] [comments]  ( 87 min )
    Mushroom planet || Prompts: mushrooms planet in colorful space,8k resolution concept art hyperdetailed detailed matte painting Unreal Engine 3D shading 3Delight 3ds Max Unity 3D Unreal Engine Unreal Engine 5 VRay IMAX Cinema 4D
    submitted by /u/widgia [link] [comments]  ( 87 min )
    What are all the things I can do with AI?
    AI that is available to everyone, like DALL-E 2, etc. submitted by /u/Thesmallcookie [link] [comments]  ( 87 min )
    Can GPT-3 be honest when it speaks nonsense?
    submitted by /u/bendee983 [link] [comments]  ( 86 min )
    I'm scared of AI, I want to create a benign AI, discuss it with me
    Hey Guys and Gals, I’m very scared of AI and have decided to learn about it in order to create a benign AGI that (hopefully) won’t destroy us. Right now I plan on studying math, python and cognitive science. I’ve also dedicated this facebook account https://www.facebook.com/profile.php?id=100081321610202 on research and discussions about AI so don’t hesitate to send me a friend request if you are also obsessed with the topic. Any other weirdos like me? :D :D :D submitted by /u/StoykoYovchev [link] [comments]  ( 101 min )
    So I made a song for Stable Diffusion... 😏 "Stably Diffused" [Futuristic Retro Instrumental Music]
    submitted by /u/FreshRelaxation [link] [comments]  ( 87 min )
    I tried to recreate a comic book using Midjourney and Alan Moore’s script
    submitted by /u/RubiksCodeNMZ [link] [comments]  ( 87 min )
    How to create AI Videos Using Video Input Mode With Stable Diffusion Eve...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
    Researchers created a Novel Framework called ‘FedD3’ for Federated Learning in Resource-Constrained Edge Environments via Decentralized Dataset Distillation
    submitted by /u/ai-lover [link] [comments]  ( 88 min )
    The Footage in This Sci-Fi Movie Project Comes From AI-Generated Images
    submitted by /u/estasfuera [link] [comments]  ( 86 min )
  • Open

    [D] Which electives should I choose if I want to do a master's in ML in the future?
    I'm a stats undergrad considering pursuing a master's in ML in the future, but I have no idea which electives would be useful for that. The electives I can choose from are the following:
    1. Statistical Quality Control
    2. Census
    3. Biostatistics
    4. Reliability Theory
    5. Financial Statistics
    6. Operations Research (2)
    7. Categorical Data Analysis
    8. Non-Linear Regression
    9. Stochastic Processes
    10. Decision Theory
    11. Order Statistics
    12. Bayesian Statistics
    13. Medical Statistics
    14. Actuarial Statistical Models
    15. Data Mining
    16. Econometrics
    17. Time Series Analysis
    18. Applied Multivariate Analysis
    19. Spatial Statistics
    20. Queuing Theory
    submitted by /u/StatGuy123 [link] [comments]  ( 89 min )
    [D] Trouble finding optimal parameters ANN
    I am building a model to predict credit card default, and I have figured out the architecture: number of layers, number of neurons per layer, and activation functions. I want to run GridSearchCV to find the optimal number of epochs and the batch size. The problem is that if I run the code twice, it gives different optimal parameters each time. I thought the CV folds were influencing each other, so I ran the first search, found the optimal params, then reloaded the data from scratch (I am not using a randomized train/test split) and ran the same optimization on the same dataset, and got different results. Setting a seed has not changed anything. Help! submitted by /u/AnyJello605 [link] [comments]  ( 88 min )
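    For reference, run-to-run variation like this usually comes from unseeded weight initialization, data shuffling, and nondeterministic GPU kernels rather than from the CV folds. A hedged sketch of pinning every RNG before each search (TensorFlow/Keras assumed; op determinism needs TF >= 2.8):

        import os, random
        import numpy as np
        import tensorflow as tf

        def set_all_seeds(seed=42):
            os.environ["PYTHONHASHSEED"] = str(seed)
            random.seed(seed)                  # Python RNG
            np.random.seed(seed)               # NumPy (CV shuffles, some initializers)
            tf.random.set_seed(seed)           # TF graph-level seed
            tf.config.experimental.enable_op_determinism()  # deterministic GPU kernels

    Even with everything pinned, two "optimal" settings whose CV scores differ by less than the fold-to-fold noise are effectively ties; comparing the score gaps is often more informative than the argmax.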
    [D] Thoughts and Resources on GNN Methods that Alter Graph Structure?
    I was listening to Machine Learning Street Talk and kept coming across the idea that "the information to understand the problem is not in the data." It got me thinking about graphs and how the given graph structure may not be "optimal" for the downstream task (i.e. node classification, link prediction). Does anyone have thoughts or resources for exploring the idea that the input graph can be rewired or even discarded to do better classification? I've seen a couple of papers I liked here (Rewiring with Ricci Flows, Graph-MLP). I'm interested in this idea because some tasks seem to explicitly require structural information to be encoded somehow (graph transformers), and the features in many datasets aren't very rich in information, so I wonder whether good classification is even possible with features alone. submitted by /u/rivew [link] [comments]  ( 89 min )
    [P] Open-sourcing Stable Diffusion generation scripts
    Code: https://github.com/gordicaleksa/stable_diffusion_playground submitted by /u/gordicaleksa [link] [comments]  ( 88 min )
    [N] Stable Diffusion Image Variations released, allows you to do variations like DALLE-2
    Justin Pinkney has been experimenting with fine-tuning Stable Diffusion to use the CLIP image embedding as the conditioning instead of a text prompt. This allows you to do DALL-E 2-like "image variations". tweet: https://twitter.com/Buntworthy/status/1566744186153484288 github: https://github.com/justinpinkney/stable-diffusion demo made with gradio: https://github.com/gradio-app/gradio https://preview.redd.it/9jbdu4n1g2m91.jpg?width=3294&format=pjpg&auto=webp&s=716de69799bf14dcb2b84461f21832b679406e22 submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 89 min )
    [P] Appropriate Algorithm for Influencers Ranking
    I am working on ranking different social influencers based on a set of metrics. Metrics collected: username, categories (the niche the influencer is in), influencer_type, followers, follower_growth, follower_growth_rate, highlightReelCount, igtvVideoCount, postsCount, avg_likes, avg_comments, likes_comments_ratio (comments per 100 likes, used as an authenticity indicator), engagement_rate, authentic_engagement (the number of likes and comments that come from real people), post_per_week, 1/post_like, 1/post_comment (12 latest posts), 1/igtv_likes, 1/igtv_comment (12 latest IGTVs). Here's what the data looks like: https://preview.redd.it/7fs6guro05m91.png?width=2771&format=png&auto=webp&s=fa9cca6472ab7014a94978625fbce01cdea27cf7 Objective: rank the social influencers according to their influential power, which can be calculated from the metrics collected above. There are a few ranking approaches to choose from: a) compute a score for influential power with Multi-Criteria Decision Making (MCDM) and rank it with regression; b) create a classification model and rank influencers by predicted probability; c) compute an MCDM score and rank it with a machine learning model like SVM, decision tree, or a deep neural network; d) learning to rank; e) a trending algorithm. I would like to ask which of the above would be more suitable for this project; could you compare them and give reasons? Any ideas will be much appreciated! submitted by /u/Same_Journalist_9445 [link] [comments]  ( 90 min )
    [P] Cozy Auto Texture - A Blender add-on that allows you to generate free textures with Stable Diffusion
    My team and I are creating the Blender add-on Cozy Auto Texture that uses the open source AI Stable Diffusion. You'll be able to create free textures right in Blender with a simple text description! Check it out: https://github.com/torrinworx/Cozy-Auto-Texture Here are some images I've generated so far: Prompt: High jewelry iridescent gold with age of empires 2, rts, aoe2, cavalry and horses inside a beautifully detailed pendant, crystal clear, see through, jewelry rendering canvas, 8k + UDH +photorealistic. Prompt: A picture of a cute bird in front of a computer sipping tea with its wings, ultra realistic. Prototype of the UI! So far you can download the necessary Python dependencies, as well as the Stable Diffusion AI weights all with the push of a single button when you install the add-on! YouTube - https://www.youtube.com/c/ThisCozyStudio Discord - https://discord.com/invite/UpZt5Un57t Our team plans on adding upscaling and the ability to generate other texture files like Normal maps, Displacement maps, Reflection, Gloss, etc. in the near future. Our goal is to make the most robust and feature packed texture generator. submitted by /u/T0rr1nw0rX [link] [comments]  ( 111 min )
    What is nonlinear positional embedding [D]
    I am reading a paper called "Representation Learning for Information Extraction from Form-like Documents". It mentions a nonlinear positional embedding, and I don't understand what that is, although I do know about (standard) positional embeddings. A snippet of the text: "Each neighbor relative position is embedded through a nonlinear positional embedding consisting of two ReLU-activated layers with dropout." Can someone explain this? submitted by /u/fountainhop [link] [comments]  ( 89 min )
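    A hedged PyTorch sketch of the most likely reading of that sentence: instead of looking positions up in a fixed (e.g. sinusoidal) table, the scalar relative offsets (dx, dy) of each neighbor are passed through a small MLP — two ReLU layers with dropout — so the embedding is a learned nonlinear function of position (dimensions below are illustrative, not the paper's):

        import torch.nn as nn

        class NonlinearPositionalEmbedding(nn.Module):
            def __init__(self, in_dim=2, hidden=64, out_dim=32, p_drop=0.1):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
                    nn.Linear(hidden, out_dim), nn.ReLU(), nn.Dropout(p_drop),
                )

            def forward(self, rel_pos):        # rel_pos: (..., 2) neighbor (dx, dy) offsets
                return self.net(rel_pos)

    "Nonlinear" here just contrasts with embedding the position via a linear projection or a fixed table: the MLP can learn, say, that very close neighbors matter differently than moderately close ones.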
    [D] correct way to use feature selection
    Suppose I am using an external method (outside the algorithm) for feature selection, say Pearson correlation. What is the right way to feed the features into the ML algorithms for testing? What I have done is: 1. split the data into train and test; 2. use Pearson correlation (since it's a regression problem) to select features; 3. report 3 metrics: A. CV performance on the train set without feature selection (using caret); B. CV performance on the train set with the selected features (I'm guessing this is extremely biased); C. performance on the test set for both. My question is: does it make sense to do B? My sample size is very small. submitted by /u/triary95 [link] [comments]  ( 89 min )
    [P] Can't find a web admin ui for RASA chatbot framework
    Hi guys, I can't seem to find a web admin user interface for RASA. I need to give a client the means to edit utterances, intents, entities, and possibly stories, plus a way to look at past conversations and bot analytics. (Not posting links because the post gets banned.) Rasa-X is now enterprise-only; Botfront is not maintained; RASA-UI's latest commit was two years ago. Do you happen to know an open-source admin panel for RASA? I plan to use it and also contribute... submitted by /u/pieroit [link] [comments]  ( 108 min )
    [D] Tools for Showcasing generated images and their related images from training data
    I am training some generative models and need to showcase the generated images. 1) When a generated image is clicked, it should display related images from the training dataset to show their similarity, much like https://www.robots.ox.ac.uk/~vgg/software/vise/index.html. 2) I also want to allow users to input images and generate a sample from them. I know there are some tools like Gradio. Are there any other tools available for this purpose? submitted by /u/icelebratefestivus [link] [comments]  ( 89 min )
  • Open

    Looking for a good place to start when digging deep into reinforcement learning
    I have a fair background in computer vision, language modeling, unsupervised learning, and the supporting methodologies. I'm mostly looking for papers to understand the methods involved in SOTA reinforcement learning, the underlying concepts, and a resource to better understand whatever notation is used in the papers. submitted by /u/Extra-most-best [link] [comments]  ( 87 min )
    Does openai-gym environment allow actions that are out of the action-space?
    I am new to RL and was messing around with OpenAI Gym environments. In a custom environment I created (based on Gym), I am able to call env.step(action) with an action value outside the action space. Shouldn't the environment disallow illegal actions like this? If not, how do we stop it? submitted by /u/goodbyeguruji [link] [comments]  ( 88 min )
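    For reference, Gym does not enforce action validity inside step(); built-in environments vary, and custom environments must check explicitly. A minimal sketch using the standard action_space.contains() test via a wrapper:

        import gym

        class ValidatedEnv(gym.Wrapper):
            def step(self, action):
                if not self.action_space.contains(action):
                    raise ValueError(f"illegal action {action!r} for {self.action_space}")
                return self.env.step(action)

        # env = ValidatedEnv(MyCustomEnv())   # or ValidatedEnv(gym.make("CartPole-v1"))

    Alternatively, some environments clip rather than reject: np.clip(action, space.low, space.high) for Box spaces, which is a design choice rather than a Gym requirement.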
    Is there a way to estimate transition probabilities when they are varying?
    Hi, I was wondering if someone could point me to resources where transition probabilities are estimated while accounting for nonstationary action outcomes (i.e., the result of an action varies over time; say an agent initially goes forward with probability 0.80 when asked to go forward, but over time this drifts to 0.60). Thanks in advance! submitted by /u/E-Cockroach [link] [comments]  ( 87 min )
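    One standard trick for this — a hedged sketch, not a literature pointer: estimate the transition probabilities with a constant step size, i.e. an exponential moving average, so the estimate tracks drift instead of freezing on old data the way a 1/N running average would:

        import numpy as np
        from collections import defaultdict

        class DriftingTransitionModel:
            def __init__(self, n_states, lr=0.05):
                self.lr = lr
                self.P = defaultdict(lambda: np.full(n_states, 1.0 / n_states))

            def update(self, s, a, s_next):
                target = np.zeros_like(self.P[(s, a)])
                target[s_next] = 1.0
                # constant step size: recent transitions dominate the estimate
                self.P[(s, a)] = (1 - self.lr) * self.P[(s, a)] + self.lr * target

    The learning rate trades tracking speed against variance; sliding-window counts and Bayesian change-point models are the usual heavier-weight alternatives.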
    SAC exploding losses
    I have a task on which I very consistently get this kind of training behavior from SAC (both v1 and v2): https://wandb.ai/tmrl/tmrl/runs/SAC_test_imgs_16 . Basically, both the critic and actor losses explode at some advanced point during training, and the policy that was good up to that point suddenly becomes extremely bad and learns to do nothing. Sometimes the losses and policy recover, sometimes not. Would you have any idea where this may come from, mathematically speaking? It happens with all kinds of deep models and observation spaces. It seems it can be alleviated by tweaking the actor and critic learning rates separately, but this is very unpredictable, as the behavior appears after several days of training on this task. It also doesn't come from non-Markovness toward the end of episodes, unless I missed something, since I ignore terminal transitions when the termination signal comes from the episode duration. submitted by /u/yannbouteiller [link] [comments]  ( 88 min )
    Scaling output of Actor's forward function question.
    Hello guys! I'm pretty amateur at the low levels of RL, and I was wondering: in the author's official implementation of TD3, the Actor's forward function looks like this:

        def forward(self, state):
            a = F.relu(self.l1(state))
            a = F.relu(self.l2(a))
            return self.max_action * torch.tanh(self.l3(a))

    The output is multiplied by self.max_action so it is scaled correctly before being returned. Shouldn't all transformations happen at simulation time, though? Apply the simulation step, gather results, then reverse-transform and save the unscaled actions in the experience buffer, so backpropagation is done right? If our model's forward function ends in tanh and we multiply by 5, forward returns values in [-5, 5]; when we backpropagate, is it divided by 5 on the way back through the network? I think I am missing something, any help is appreciated! submitted by /u/South_Book_5625 [link] [comments]  ( 87 min )
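    For reference, a quick numerical check of the scaling question above: a constant multiplier inside forward() is just another differentiable operation, so autograd applies the chain rule through it automatically; nothing needs to be un-scaled by hand:

        import torch

        max_action = 5.0
        x = torch.tensor(0.3, requires_grad=True)
        y = max_action * torch.tanh(x)
        y.backward()
        print(x.grad)                                   # equals max_action * (1 - tanh(x)^2)
        print(max_action * (1 - torch.tanh(x) ** 2))    # same value, by the chain rule

    Keeping the scale inside forward() also means the replay buffer stores actions in the same units the environment sees, which avoids a whole class of bookkeeping bugs.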
    Why do agents in a cooperative setting (Dec-POMDP) receive the same reward?
    Hi everyone, why do cooperative agents acting within the Dec-POMDP framework receive the same reward? In other words, why do we focus on finding the optimal joint policy and not individual optimal policies? submitted by /u/souhaielbensalem [link] [comments]  ( 105 min )
    Magnitude of a sparse reward
    Hello, In my RL problem, the agent moves around in the environment to reach a certain goal. Upon reaching the goal, the agent gets a sparse reward. I tried two different values for this reward (+1000 and +10). The reward is also affected by the number of steps the agent takes (if the agent takes 5 steps to reach the goal, the reward would be 1000-5=995 or 10-5=5). For some reason, the agent learns better with a +10 reward. Any idea why that would be the case? Thank you :D submitted by /u/AhmedNizam_ [link] [comments]  ( 99 min )
  • Open

    Deep Dive into Reasons to Choose Focal Loss over Cross-Entropy[Article]
    submitted by /u/JoshuaDaD [link] [comments]  ( 86 min )
    Investing time and effort on Neural networks architecture design to create more advance and smart products will make all our lives better. Teamshake is here to provide faster and intuitive way to do so: https://www.teamshake.app #neuralnetworks #pytorch #ai #Teamshake
    submitted by /u/adiPel [link] [comments]  ( 87 min )
  • Open

    Learn Blockchain Technology to Build A Futuristic Career
    In the past few years, blockchain technology has turned out to be a phenomenal technology. The novel attributes of blockchain technology are making business processes more efficient, more secure, and more transparent and are taking the industry toward a new ‘decentralized’ direction. The post Learn Blockchain Technology to Build A Futuristic Career appeared first on Data Science Central.  ( 20 min )
    Imaging the Web With Scalable Vector Graphics
    SVG is an example of one of the more powerful technologies on the web that is likely completely invisible to you, whether you're simply browsing on the web or you're a web developer wanting to take advantage of your full toolset. The post Imaging the Web With Scalable Vector Graphics appeared first on Data Science Central.  ( 26 min )
  • Open

    My family's unlikely homeschooling journey
    My husband Jeremy and I never intended to homeschool, and yet we have now, unexpectedly, committed to homeschooling long-term. Prior to the pandemic, we both worked full-time in careers that we loved and found meaningful, and we sent our daughter to a full-day Montessori school. Although I struggled with significant health issues, I felt unbelievably lucky and fulfilled in both my family life and my professional life. The pandemic upended my careful balance. Every family is different, with different needs, circumstances, and constraints, and what works for one may not work for others. My intention here is primarily to share the journey of my own (very privileged) family. Our unplanned introduction to homeschooling For the first year of the pandemic, most schools in California, where …  ( 7 min )
  • Open

    Deep Residual Shrinkage Networks for EMG-based Gesture Identification. (arXiv:2202.02984v3 [eess.SP] UPDATED)
    This work introduces a method for high-accuracy EMG-based gesture identification. A newly developed deep learning method, namely the deep residual shrinkage network (DRSN), is applied to perform gesture identification. Based on the features of EMG signals resulting from gestures, optimizations are made to improve identification accuracy. Finally, three different algorithms are compared against DRSN on EMG signal recognition accuracy. The results show that DRSN outperforms traditional neural networks in terms of EMG recognition accuracy. This paper provides a reliable way to classify EMG signals, as well as exploring possible applications of DRSN.  ( 2 min )
    Semi-WTC: A Practical Semi-supervised Framework for Attack Categorization through Weight-Task Consistency. (arXiv:2205.09669v3 [cs.CR] UPDATED)
    Supervised learning has been widely used for attack categorization, requiring high-quality data and labels. However, the data is often imbalanced and it is difficult to obtain sufficient annotations. Moreover, supervised models are subject to real-world deployment issues, such as defending against unseen artificial attacks. To tackle the challenges, we propose a semi-supervised fine-grained attack categorization framework consisting of an encoder and a two-branch structure and this framework can be generalized to different supervised models. The multilayer perceptron with residual connection is used as the encoder to extract features and reduce the complexity. The Recurrent Prototype Module (RPM) is proposed to train the encoder effectively in a semi-supervised manner. To alleviate the data imbalance problem, we introduce the Weight-Task Consistency (WTC) into the iterative process of RPM by assigning larger weights to classes with fewer samples in the loss function. In addition, to cope with new attacks in real-world deployment, we propose an Active Adaption Resampling (AAR) method, which can better discover the distribution of unseen sample data and adapt the parameters of encoder. Experimental results show that our model outperforms the state-of-the-art semi-supervised attack detection methods with a 3% improvement in classification accuracy and a 90% reduction in training time.  ( 3 min )
    Predicting the Stability of Hierarchical Triple Systems with Convolutional Neural Networks. (arXiv:2206.12402v2 [astro-ph.SR] UPDATED)
    Understanding the long-term evolution of hierarchical triple systems is challenging due to its inherent chaotic nature, and it requires computationally expensive simulations. Here we propose a convolutional neural network model to predict the stability of hierarchical triples by looking at their evolution during the first $5 \times 10^5$ inner binary orbits. We employ the regularized few-body code TSUNAMI to simulate $5\times 10^6$ hierarchical triples, from which we generate a large training and test dataset. We develop twelve different network configurations that use different combinations of the triples' orbital elements and compare their performances. Our best model uses 6 time-series, namely, the semimajor axes ratio, the inner and outer eccentricities, the mutual inclination, and the arguments of pericenter. This model achieves an area under the curve of over $95\%$ and identifies the relevant parameters for studying triple-system stability. All trained models are made publicly available, allowing prediction of the stability of hierarchical triple systems $200$ times faster than pure $N$-body methods.  ( 3 min )
    Refining neural network predictions using background knowledge. (arXiv:2206.04976v2 [cs.AI] UPDATED)
    Recent work has shown logical background knowledge can be used in learning systems to compensate for a lack of labeled training data. Many methods work by creating a loss function that encodes this knowledge. However, often the logic is discarded after training, even if it is still useful at test time. Instead, we ensure neural network predictions satisfy the knowledge by refining the predictions with an extra computation step. We introduce differentiable refinement functions that find a corrected prediction close to the original prediction. We study how to effectively and efficiently compute these refinement functions. Using a new algorithm called Iterative Local Refinement (ILR), we combine refinement functions to find refined predictions for logical formulas of any complexity. ILR finds refinements on complex SAT formulas in significantly fewer iterations and frequently finds solutions where gradient descent can not. Finally, ILR produces competitive results in the MNIST addition task.  ( 2 min )
    Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis. (arXiv:2206.09046v2 [cs.LG] UPDATED)
    Each year, expert-level performance is attained in increasingly-complex multiagent domains, notable examples including Go, Poker, and StarCraft II. This rapid progression is accompanied by a commensurate need to better understand how such agents attain this performance, to enable their safe deployment, identify limitations, and reveal potential means of improving them. In this paper we take a step back from performance-focused multiagent learning, and instead turn our attention towards agent behavior analysis. We introduce a model-agnostic method for discovery of behavior clusters in multiagent domains, using variational inference to learn a hierarchy of behaviors at the joint and local agent levels. Our framework makes no assumption about agents' underlying learning algorithms, does not require access to their latent states or policies, and is trained using only offline observational data. We illustrate the effectiveness of our method for enabling the coupled understanding of behaviors at the joint and local agent level, detection of behavior changepoints throughout training, discovery of core behavioral concepts, demonstrate the approach's scalability to a high-dimensional multiagent MuJoCo control domain, and also illustrate that the approach can disentangle previously-trained policies in OpenAI's hide-and-seek domain.  ( 2 min )
    Constructing variables with classifiers as an aid to regression (Construction de variables à l'aide de classifieurs comme aide à la régression). (arXiv:2112.03703v2 [cs.LG] UPDATED)
    This paper proposes a method for the automatic creation of variables (in the case of regression) that complement the information contained in the initial input vector. The method works as a pre-processing step in which the continuous values of the variable to be regressed are discretized into a set of intervals which are then used to define value thresholds. Then classifiers are trained to predict whether the value to be regressed is less than or equal to each of these thresholds. The different outputs of the classifiers are then concatenated in the form of an additional vector of variables that enriches the initial vector of the regression problem. The implemented system can thus be considered as a generic pre-processing tool. We tested the proposed enrichment method with 5 types of regressors and evaluated it in 33 regression datasets. Our experimental results confirm the interest of the approach.  ( 2 min )
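    A hedged sklearn sketch of the scheme this abstract describes (the thresholds, classifier type, and count below are illustrative; the paper's exact discretization may differ):

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def enrich_features(X, y, n_thresholds=5):
            thresholds = np.quantile(y, np.linspace(0.1, 0.9, n_thresholds))
            extra = []
            for t in thresholds:
                clf = LogisticRegression(max_iter=1000).fit(X, (y <= t).astype(int))
                extra.append(clf.predict_proba(X)[:, 1])     # P(y <= t | x)
            return np.hstack([X, np.column_stack(extra)])    # enriched regression inputs

    In practice, the extra columns should come from out-of-fold predictions to avoid leaking the target into the enriched features.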
    SoK: Privacy Preserving Machine Learning using Functional Encryption: Opportunities and Challenges. (arXiv:2204.05136v2 [cs.CR] UPDATED)
    With the advent of functional encryption, new possibilities for computation on encrypted data have arisen. Functional Encryption enables data owners to grant third-party access to perform specified computations without disclosing their inputs. It also provides computation results in plain, unlike Fully Homomorphic Encryption. The ubiquitousness of machine learning has led to the collection of massive private data in the cloud computing environment. This raises potential privacy issues and the need for more private and secure computing solutions. Numerous efforts have been made in privacy-preserving machine learning (PPML) to address security and privacy concerns. There are approaches based on fully homomorphic encryption (FHE), secure multiparty computation (SMC), and, more recently, functional encryption (FE). However, FE-based PPML is still in its infancy and has not yet gotten much attention compared to FHE-based PPML approaches. In this paper, we provide a systematization of PPML works based on FE summarizing state-of-the-art in the literature. We focus on Inner-product-FE and Quadratic-FE-based machine learning models for the PPML applications. We analyze the performance and usability of the available FE libraries and their applications to PPML. We also discuss potential directions for FE-based PPML approaches. To the best of our knowledge, this is the first work to systematize FE-based PPML approaches.  ( 3 min )
    Training Differentially Private Models with Secure Multiparty Computation. (arXiv:2202.02625v3 [cs.CR] UPDATED)
    We address the problem of learning a machine learning model from training data that originates at multiple data owners while providing formal privacy guarantees regarding the protection of each owner's data. Existing solutions based on Differential Privacy (DP) achieve this at the cost of a drop in accuracy. Solutions based on Secure Multiparty Computation (MPC) do not incur such accuracy loss but leak information when the trained model is made publicly available. We propose an MPC solution for training DP models. Our solution relies on an MPC protocol for model training, and an MPC protocol for perturbing the trained model coefficients with Laplace noise in a privacy-preserving manner. The resulting MPC+DP approach achieves higher accuracy than a pure DP approach while providing the same formal privacy guarantees. Our work obtained first place in the iDASH2021 Track III competition on confidential computing for secure genome analysis.  ( 2 min )
    Goal-Conditioned Reinforcement Learning: Problems and Solutions. (arXiv:2201.08299v3 [cs.AI] UPDATED)
    Goal-conditioned reinforcement learning (GCRL), related to a set of complex RL problems, trains an agent to achieve different goals under particular scenarios. Compared to the standard RL solutions that learn a policy solely depending on the states or observations, GCRL additionally requires the agent to make decisions according to different goals. In this survey, we provide a comprehensive overview of the challenges and algorithms for GCRL. Firstly, we describe the basic problems studied in this field. Then, we explain how goals are represented and present how existing solutions are designed from different points of view. Finally, we conclude and discuss potential future directions that recent research focuses on.  ( 2 min )
    A cross-domain recommender system using deep coupled autoencoders. (arXiv:2112.07617v4 [cs.IR] UPDATED)
    Long-standing data sparsity and cold-start constitute thorny and perplexing problems for the recommendation systems. Cross-domain recommendation as a domain adaptation framework has been utilized to efficiently address these challenging issues, by exploiting information from multiple domains. In this study, an item-level relevance cross-domain recommendation task is explored, where two related domains, that is, the source and the target domain contain common items without sharing sensitive information regarding the users' behavior, and thus avoiding the leak of user privacy. In light of this scenario, two novel coupled autoencoder-based deep learning methods are proposed for cross-domain recommendation. The first method aims to simultaneously learn a pair of autoencoders in order to reveal the intrinsic representations of the items in the source and target domains, along with a coupled mapping function to model the non-linear relationships between these representations, thus transferring beneficial information from the source to the target domain. The second method is derived based on a new joint regularized optimization problem, which employs two autoencoders to generate in a deep and non-linear manner the user and item-latent factors, while at the same time a data-driven function is learnt to map the item-latent factors across domains. Extensive numerical experiments on two publicly available benchmark datasets are conducted illustrating the superior performance of our proposed methods compared to several state-of-the-art cross-domain recommendation frameworks.  ( 3 min )
    Improving Sequential Query Recommendation with Immediate User Feedback. (arXiv:2205.06297v2 [cs.IR] UPDATED)
    We propose an algorithm for next-query recommendation in interactive data exploration settings, such as knowledge discovery for information gathering. State-of-the-art query recommendation algorithms are based on sequence-to-sequence learning approaches that exploit historical interaction data. Due to the supervision involved in the learning process, such approaches fail to adapt to immediate user feedback. We propose to augment transformer-based causal language models for query recommendation to adapt to immediate user feedback using a multi-armed bandit (MAB) framework. We conduct a large-scale experimental study using log files from a popular online literature discovery service and demonstrate that our algorithm substantially improves the per-round regret relative to state-of-the-art transformer-based query recommendation models that do not make use of immediate user feedback. Our data model and source code are available at https://github.com/shampp/exp3_ss  ( 2 min )
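    The repository name suggests an EXP3-style bandit; treating that as an assumption, here is a standard EXP3 learner that could sit on top of a language model's candidate queries, with each candidate as an arm and a click as reward 1 (all names are illustrative, not the authors' API):

    ```python
    import numpy as np

    class EXP3:
        # EXP3 adversarial bandit; one arm per candidate next query.
        def __init__(self, n_arms, gamma=0.1, rng=None):
            self.w = np.ones(n_arms)
            self.gamma = gamma
            self.rng = rng or np.random.default_rng()

        def select(self):
            p = self.w / self.w.sum()
            self.p = (1 - self.gamma) * p + self.gamma / len(self.w)
            return int(self.rng.choice(len(self.w), p=self.p))

        def update(self, arm, reward):
            # reward in [0, 1], e.g. 1 if the user clicked the suggested query.
            xhat = reward / self.p[arm]          # importance-weighted estimate
            self.w[arm] *= np.exp(self.gamma * xhat / len(self.w))

    bandit = EXP3(n_arms=50)
    arm = bandit.select()
    bandit.update(arm, reward=1.0)
    ```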
    Remember and Forget Experience Replay for Multi-Agent Reinforcement Learning. (arXiv:2203.13319v3 [cs.LG] UPDATED)
    We present the extension of the Remember and Forget for Experience Replay (ReF-ER) algorithm to Multi-Agent Reinforcement Learning (MARL). ReF-ER was shown to outperform state-of-the-art algorithms for continuous control in problems ranging from the OpenAI Gym to complex fluid flows. In MARL, the dependencies between the agents are included in the state-value estimator, and the environment dynamics are modeled via the importance weights used by ReF-ER. In collaborative environments, we find the best performance when the value is estimated using individual rewards and the effects of other agents' actions on the transition map are ignored. We benchmark the performance of ReF-ER MARL on the Stanford Intelligent Systems Laboratory (SISL) environments. We find that employing a single feed-forward neural network for the policy and the value function in ReF-ER MARL outperforms state-of-the-art algorithms that rely on complex neural network architectures.  ( 2 min )
    Unsupervised Joint Image Transfer and Uncertainty Quantification Using Patch Invariant Networks. (arXiv:2207.04325v2 [cs.CV] UPDATED)
    Unsupervised image transfer enables intra- and inter-modality image translation in applications where paired training data is scarce. To ensure a structure-preserving mapping from the input to the target domain, existing methods for unpaired image transfer are commonly based on cycle-consistency, which incurs additional computational cost and instability due to the learning of an inverse mapping. This paper presents a novel method for uni-directional domain mapping that does not rely on any paired training data. A proper transfer is achieved by using a GAN architecture and a novel generator loss based on patch invariance. More specifically, the generator outputs are evaluated and compared at different scales, which also increases the focus on high-frequency details and acts as an implicit data augmentation. This novel patch loss also offers the possibility to accurately predict aleatoric uncertainty by modeling an input-dependent scale map for the patch residuals. The proposed method is comprehensively evaluated on three well-established medical databases. Compared to four state-of-the-art methods, we observe significantly higher accuracy on these datasets, indicating great potential of the proposed method for unpaired image transfer with uncertainty taken into account. An implementation of the proposed framework is released at \url{https://github.com/anger-man/unsupervised-image-transfer-and-uq}.  ( 3 min )
    Echocardiographic Image Quality Assessment Using Deep Neural Networks. (arXiv:2209.00959v1 [eess.IV])
    Echocardiography image quality assessment is not a trivial issue in transthoracic examination. As the in vivo examination of heart structures has gained prominence in cardiac diagnosis, it has been affirmed that accurate diagnosis of left ventricle function is hugely dependent on the quality of echo images. To date, visual assessment of echo images remains highly subjective and requires specific definitions under clinical pathologies. While poor-quality images impair quantification and diagnosis, the inherent variations in echocardiographic image quality standards indicate the complexity faced by different observers and provide clear evidence of inconsistent assessment in clinical trials, especially with less experienced cardiologists. In this research, our aim is to analyse and define specific quality attributes most discussed by experts and present a fully trained convolutional neural network model for assessing such quality features objectively.  ( 2 min )
    Evaluating Short-Term Forecasting of Multiple Time Series in IoT Environments. (arXiv:2206.07784v2 [cs.LG] UPDATED)
    Modern Internet of Things (IoT) environments are monitored via a large number of IoT-enabled sensing devices, with the data acquisition and processing infrastructure setting restrictions in terms of computational power and energy resources. To alleviate this issue, sensors are often configured to operate at relatively low sampling frequencies, yielding a reduced set of observations. Nevertheless, this can dramatically hamper subsequent decision-making, such as forecasting. To address this problem, in this work we evaluate short-term forecasting in highly underdetermined cases, i.e., where the number of sensor streams is much higher than the number of observations. Several statistical, machine learning, and neural network-based models are thoroughly examined with respect to the resulting forecasting accuracy on five different real-world datasets. The focus is on a unified experimental protocol especially designed for short-term prediction of multiple time series at the IoT edge. The proposed framework can be considered an important step towards establishing a solid forecasting strategy in resource-constrained IoT applications.  ( 2 min )
    Tree density estimation. (arXiv:2111.11971v4 [math.ST] UPDATED)
    We study the problem of estimating the density $f(\boldsymbol x)$ of a random vector ${\boldsymbol X}$ in $\mathbb R^d$. For a spanning tree $T$ defined on the vertex set $\{1,\dots ,d\}$, the tree density $f_{T}$ is a product of bivariate conditional densities. An optimal spanning tree minimizes the Kullback-Leibler divergence between $f$ and $f_{T}$. From i.i.d. data we identify an optimal tree $T^*$ and efficiently construct a tree density estimate $f_n$ such that, without any regularity conditions on the density $f$, one has $\lim_{n\to \infty} \int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|d\boldsymbol x=0$ a.s. For Lipschitz $f$ with bounded support, $\mathbb E \left\{ \int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|d\boldsymbol x\right\}=O\big(n^{-1/4}\big)$, a dimension-free rate.
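    For discrete data, the KL-optimal spanning tree is the classical Chow-Liu tree: a maximum spanning tree under pairwise mutual information. A rough sketch under that connection, using crude equal-width discretization of continuous data and off-the-shelf scipy/sklearn helpers (the paper's estimator and guarantees are more careful than this):

    ```python
    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree
    from sklearn.metrics import mutual_info_score

    def chow_liu_edges(X, bins=10):
        # Pairwise mutual information after crude equal-width discretization.
        n, d = X.shape
        Xd = np.stack([np.digitize(X[:, j], np.histogram_bin_edges(X[:, j], bins))
                       for j in range(d)], axis=1)
        mi = np.zeros((d, d))
        iu = np.triu_indices(d, k=1)
        for i, j in zip(*iu):
            mi[i, j] = mutual_info_score(Xd[:, i], Xd[:, j])
        # Flip weights so a *minimum* spanning tree maximizes total MI.
        w = np.zeros_like(mi)
        w[iu] = mi.max() + 1e-9 - mi[iu]
        mst = minimum_spanning_tree(w)
        return list(zip(*mst.nonzero()))

    edges = chow_liu_edges(np.random.randn(500, 5))
    ```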
    Big Data is not the New Oil: Common Misconceptions about Population Data. (arXiv:2112.10912v3 [cs.DB] UPDATED)
    Databases covering all individuals of a population are increasingly used for research and decision-making. The massive size of such databases is often mistaken as a guarantee for valid inferences. However, population data have characteristics that make them challenging to use. Various assumptions about population coverage and data quality are commonly made, including how such data were captured and what types of processing have been applied to them. Furthermore, the full potential of population data can often only be unlocked when such data are linked to other databases. Record linkage often implies subtle technical problems, which are easily missed. We discuss a diverse range of misconceptions relevant for anybody capturing, processing, linking, or analysing population data. Remarkably, many of these misconceptions are due to the social nature of data collections and are therefore missed by purely technical accounts of data processing. Many of these misconceptions are also not well documented in scientific publications. We conclude with a set of recommendations for using population data.
    Reconstructing editable prismatic CAD from rounded voxel models. (arXiv:2209.01161v1 [cs.CV])
    Reverse Engineering a CAD shape from other representations is an important geometric processing step for many downstream applications. In this work, we introduce a novel neural network architecture to solve this challenging task and approximate a smoothed signed distance function with an editable, constrained, prismatic CAD model. During training, our method reconstructs the input geometry in the voxel space by decomposing the shape into a series of 2D profile images and 1D envelope functions. These can then be recombined in a differentiable way allowing a geometric loss function to be defined. During inference, we obtain the CAD data by first searching a database of 2D constrained sketches to find curves which approximate the profile images, then extrude them and use Boolean operations to build the final CAD model. Our method approximates the target shape more closely than other methods and outputs highly editable constrained parametric sketches which are compatible with existing CAD software.
    Algorithms for Discrepancy, Matchings, and Approximations: Fast, Simple, and Practical. (arXiv:2209.01147v1 [cs.DS])
    We study one of the key tools in data approximation and optimization: low-discrepancy colorings. Formally, given a finite set system $(X,\mathcal S)$, the \emph{discrepancy} of a two-coloring $\chi:X\to\{-1,1\}$ is defined as $\max_{S \in \mathcal S}|{\chi(S)}|$, where $\chi(S)=\sum\limits_{x \in S}\chi(x)$. We propose a randomized algorithm which, for any $d>0$ and $(X,\mathcal S)$ with dual shatter function $\pi^*(k)=O(k^d)$, returns a coloring with expected discrepancy $O\left({\sqrt{|X|^{1-1/d}\log|\mathcal S|}}\right)$ (this bound is tight) in time $\tilde O\left({|\mathcal S|\cdot|X|^{1/d}+|X|^{2+1/d}}\right)$, improving upon the previous-best time of $O\left(|\mathcal S|\cdot|X|^3\right)$ by at least a factor of $|X|^{2-1/d}$ when $|\mathcal S|\geq|X|$. This setup includes many geometric classes, families of bounded dual VC-dimension, and others. As an immediate consequence, we obtain an improved algorithm to construct $\varepsilon$-approximations of sub-quadratic size. Our method uses primal-dual reweighing with an improved analysis of randomly updated weights and exploits the structural properties of the set system via matchings with low crossing number -- a fundamental structure in computational geometry. In particular, we get the same $|X|^{2-1/d}$ factor speed-up on the construction time of matchings with crossing number $O\left({|X|^{1-1/d}}\right)$, which is the first improvement since the 1980s. The proposed algorithms are very simple, which makes it possible, for the first time, to compute colorings with near-optimal discrepancies and near-optimal sized approximations for abstract and geometric set systems in dimensions higher than $2$.
    Back-to-Bones: Rediscovering the Role of Backbones in Domain Generalization. (arXiv:2209.01121v1 [cs.CV])
    Domain Generalization (DG) studies the capability of a deep learning model to generalize to out-of-training distributions. In the last decade, the literature has been filled with a collection of training methodologies that claim to obtain more abstract and robust data representations to tackle domain shifts. Recent research has provided a reproducible benchmark for DG, pointing out the effectiveness of naive empirical risk minimization (ERM) over existing algorithms. Nevertheless, researchers persist in using the same outdated feature extractors, and no attention has yet been given to the effects of different backbones. In this paper, we turn back to the backbones, proposing a comprehensive analysis of their intrinsic generalization capabilities, so far ignored by the research community. We evaluate a wide variety of feature extractors, from standard residual solutions to transformer-based architectures, finding an evident linear correlation between large-scale single-domain classification accuracy and DG capability. Our extensive experimentation shows that by adopting competitive backbones in conjunction with effective data augmentation, plain ERM outperforms recent DG solutions and achieves state-of-the-art accuracy. Moreover, our additional qualitative studies reveal that novel backbones give more similar representations to same-class samples, separating different domains in the feature space. This boost in generalization capabilities leaves marginal room for DG algorithms and suggests a new paradigm for investigating the problem, placing backbones in the spotlight and encouraging the development of consistent algorithms on top of them.
    Group Property Inference Attacks Against Graph Neural Networks. (arXiv:2209.01100v1 [cs.LG])
    With the fast adoption of machine learning (ML) techniques, sharing of ML models is becoming popular. However, ML models are vulnerable to privacy attacks that leak information about the training data. In this work, we focus on a particular type of privacy attack named the property inference attack (PIA), which infers sensitive properties of the training data through access to the target ML model. In particular, we consider Graph Neural Networks (GNNs) as the target model, and the distribution of particular groups of nodes and links in the training graph as the target property. While existing work has investigated PIAs that target graph-level properties, no prior work has studied the inference of node and link properties at the group level. In this work, we perform the first systematic study of group property inference attacks (GPIA) against GNNs. First, we consider a taxonomy of threat models under both black-box and white-box settings with various types of adversary knowledge, and design six different attacks for these settings. We evaluate the effectiveness of these attacks through extensive experiments on three representative GNN models and three real-world graphs. Our results demonstrate the effectiveness of these attacks, whose accuracy outperforms the baseline approaches. Second, we analyze the underlying factors that contribute to GPIA's success, and show that the target model trained on graphs with or without the target property exhibits some dissimilarity in model parameters and/or model outputs, which enables the adversary to infer the existence of the property. Further, we design a set of defense mechanisms against GPIA and demonstrate that these mechanisms can reduce attack accuracy effectively with only a small loss in GNN model accuracy.
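    As a hedged illustration of how a black-box property inference attack can be mounted, the usual shadow-model recipe trains a meta-classifier on output summaries of models trained with and without the property; everything below (the feature summary, the toy separation) is hypothetical, not the paper's exact attack:

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def shadow_output(with_property):
        # Hypothetical stand-in: a summary of a shadow GNN's posteriors
        # (e.g., sorted class probabilities); the shift is a toy separation.
        alpha = np.ones(5) + (0.3 if with_property else 0.0)
        return np.sort(rng.dirichlet(alpha))

    X = np.array([shadow_output(b) for b in [True] * 100 + [False] * 100])
    y = np.array([1] * 100 + [0] * 100)
    meta = LogisticRegression().fit(X, y)   # the attacker's meta-classifier
    # meta.predict(summary_of_target_model_outputs) guesses whether the
    # target's training graph has the group property.
    ```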
    Future Gradient Descent for Adapting the Temporal Shifting Data Distribution in Online Recommendation Systems. (arXiv:2209.01143v1 [cs.LG])
    One of the key challenges of learning an online recommendation model is the temporal domain shift, which causes a mismatch between the training and testing data distributions and hence domain generalization error. To overcome this, we propose to learn a meta future-gradient generator that forecasts the gradient information of the future data distribution for training, so that the recommendation model can be trained as if we were able to look ahead at the future of its deployment. Compared with Batch Update, a widely used paradigm, our theory suggests that the proposed algorithm achieves a smaller temporal domain generalization error, measured by a gradient variation term in a local regret. We demonstrate the empirical advantage by comparison with various representative baselines.
    MaxWeight With Discounted UCB: A Provably Stable Scheduling Policy for Nonstationary Multi-Server Systems With Unknown Statistics. (arXiv:2209.01126v1 [cs.LG])
    Multi-server queueing systems are widely used models for job scheduling in machine learning, wireless networks, and crowdsourcing. This paper considers a system with multiple servers and multiple types of jobs, which maintains a separate queue for each job type. For each time slot, each available server picks a job from a queue and serves it until completion. The arrival rates of the queues and the mean service times are unknown and may even be nonstationary. We propose the MaxWeight with discounted upper confidence bound (UCB) algorithm, which simultaneously learns the statistics and schedules jobs to servers. We prove that the proposed algorithm can stabilize the queues when the arrival rates are strictly within the service capacity region. Specifically, we prove that the queue lengths are bounded in the mean, under the assumptions that the mean service times change relatively slowly over time and that the arrival rates are bounded away from the capacity region by a constant whose value depends on the discount factor used in the discounted UCB. Simulation results confirm that the proposed algorithm can stabilize the queues and that it outperforms MaxWeight with the empirical mean and MaxWeight with the discounted empirical mean. The proposed algorithm is also better than MaxWeight with UCB in the nonstationary setting.
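    The discounted statistics at the heart of the policy are simple to write down. A sketch, assuming a schematic exploration-bonus form (the paper's exact constants and bonus may differ):

    ```python
    import numpy as np

    def discounted_ucb(rewards, times, t_now, gamma=0.99, c=1.0):
        # Down-weight old samples by gamma**age so the estimate tracks drift.
        w = gamma ** (t_now - np.asarray(times, dtype=float))
        n_disc = max(w.sum(), 1e-12)                 # discounted sample count
        mean_disc = (w * np.asarray(rewards)).sum() / n_disc
        bonus = c * np.sqrt(np.log(1.0 + t_now) / n_disc)  # schematic bonus form
        return mean_disc + bonus                     # optimistic rate estimate

    # MaxWeight then assigns each free server to the job type maximizing
    # queue_length[j] * discounted_ucb_estimate[server, j].
    print(discounted_ucb(rewards=[1.0, 0.0, 1.0], times=[1, 50, 99], t_now=100))
    ```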
    Scalable Model-based Policy Optimization for Decentralized Networked Systems. (arXiv:2207.06559v2 [cs.LG] UPDATED)
    Reinforcement learning algorithms require a large amount of samples; this often limits their real-world applications on even simple tasks. This challenge is more pronounced in multi-agent tasks, as each step of operation is more costly, requiring communication or the shifting of resources. This work aims to improve the data efficiency of multi-agent control by model-based learning. We consider networked systems where agents are cooperative and communicate only locally with their neighbors, and propose the decentralized model-based policy optimization framework (DMPO). In our method, each agent learns a dynamic model to predict future states, broadcasts its predictions by communication, and then trains the policies under the model rollouts. To alleviate the bias of model-generated data, we restrict the model usage to generating myopic rollouts, thus reducing the compounding error of model generation. To preserve the independence of policy updates, we introduce extended value functions and theoretically prove that the resulting policy gradient is a close approximation to true policy gradients. We evaluate our algorithm on several benchmarks for intelligent transportation systems, namely connected autonomous vehicle control tasks (Flow and CACC) and adaptive traffic signal control (ATSC). Empirical results show that our method achieves superior data efficiency and matches the performance of model-free methods using true models.
    Reliable Representations Make A Stronger Defender: Unsupervised Structure Refinement for Robust GNN. (arXiv:2207.00012v2 [cs.LG] UPDATED)
    Benefiting from the message passing mechanism, Graph Neural Networks (GNNs) have been successful on a wide range of tasks over graph data. However, recent studies have shown that attackers can catastrophically degrade the performance of GNNs by maliciously modifying the graph structure. A straightforward solution to remedy this issue is to model the edge weights by learning a metric function between pairwise representations of two end nodes, which attempts to assign low weights to adversarial edges. Existing methods use either raw features or representations learned by supervised GNNs to model the edge weights. However, both strategies face immediate problems: raw features cannot represent various properties of nodes (e.g., structure information), and representations learned by supervised GNNs may suffer from the poor performance of the classifier on the poisoned graph. We need representations that carry both feature information and as much correct structure information as possible, and that are insensitive to structural perturbations. To this end, we propose an unsupervised pipeline, named STABLE, to optimize the graph structure. Finally, we input the well-refined graph into a downstream classifier. For this part, we design an advanced GCN that significantly enhances the robustness of vanilla GCN without increasing the time complexity. Extensive experiments on four real-world graph benchmarks demonstrate that STABLE outperforms state-of-the-art methods and successfully defends against various attacks.
    Can an ML model plainly learn planar layouts?. (arXiv:2209.01075v1 [cs.CG])
    Planar graph drawings tend to be aesthetically pleasing. In this poster we explore a Neural Network's capability of learning various planar graph classes. Additionally, we also investigate the effectiveness of the model in generalizing beyond planarity. We find that the model can outperform conventional techniques for certain graph classes. The model, however, appears to be more susceptible to randomness in the data, and seems to be less robust than expected.
    Inference and dynamic decision-making for deteriorating systems with probabilistic dependencies through Bayesian networks and deep reinforcement learning. (arXiv:2209.01092v1 [cs.AI])
    In the context of modern environmental and societal concerns, there is an increasing demand for methods able to identify management strategies for civil engineering systems, minimizing structural failure risks while optimally planning inspection and maintenance (I&M) processes. Most available methods simplify the I&M decision problem to the component level due to the computational complexity associated with global optimization methodologies under joint system-level state descriptions. In this paper, we propose an efficient algorithmic framework for inference and decision-making under uncertainty for engineering systems exposed to deteriorating environments, providing optimal management strategies directly at the system level. In our approach, the decision problem is formulated as a factored partially observable Markov decision process, whose dynamics are encoded in Bayesian network conditional structures. The methodology can handle environments under equal or general, unequal deterioration correlations among components, through Gaussian hierarchical structures and dynamic Bayesian networks. In terms of policy optimization, we adopt a deep decentralized multi-agent actor-critic (DDMAC) reinforcement learning approach, in which the policies are approximated by actor neural networks guided by a critic network. By including deterioration dependence in the simulated environment, and by formulating the cost model at the system level, DDMAC policies intrinsically consider the underlying system-effects. This is demonstrated through numerical experiments conducted for both a 9-out-of-10 system and a steel frame under fatigue deterioration. Results demonstrate that DDMAC policies offer substantial benefits when compared to state-of-the-art heuristic approaches. The inherent consideration of system-effects by DDMAC strategies is also interpreted based on the learned policies.
    Black-box optimization for integer-variable problems using Ising machines and factorization machines. (arXiv:2209.01016v1 [cs.LG])
    Black-box optimization has potential in numerous applications such as hyperparameter optimization in machine learning and optimization in design of experiments. Ising machines are useful for binary optimization problems because each problem variable can be represented by a single binary variable of the Ising machine. However, conventional approaches using an Ising machine cannot handle black-box optimization problems with non-binary values. To overcome this limitation, we propose an approach to integer-variable black-box optimization problems that uses Ising/annealing machines and factorization machines in cooperation with three different integer-encoding methods. The performance of our approach is numerically evaluated with the different encoding methods on a simple problem: calculating the energy of the hydrogen molecule in its most stable state. The proposed approach can calculate the energy using any of the integer-encoding methods, though one-hot encoding is particularly useful for small problems.
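    One-hot integer encoding reduces each integer variable to a block of binaries plus a feasibility penalty that is added to the factorization-machine surrogate before it is handed to the Ising/annealing machine. A minimal sketch of that penalty block (the penalty weight lam is an assumed hyperparameter):

    ```python
    import numpy as np

    def onehot_penalty_qubo(n_levels, lam=10.0):
        # lam * (sum_i b_i - 1)^2 expands (for binary b_i) to a QUBO block with
        # -lam on the diagonal and +lam off-diagonal (constant lam dropped).
        return lam * (np.ones((n_levels, n_levels)) - 2.0 * np.eye(n_levels))

    def decode(bits):
        # Map a feasible one-hot block back to its integer value.
        assert bits.sum() == 1, "infeasible encoding"
        return int(np.argmax(bits))

    # The factorization-machine surrogate's QUBO plus one penalty block per
    # integer variable is what gets handed to the Ising/annealing machine.
    Q = onehot_penalty_qubo(4)
    print(decode(np.array([0, 0, 1, 0])))   # -> 2
    ```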
    Classifying with Uncertain Data Envelopment Analysis. (arXiv:2209.01052v1 [math.OC])
    Classifications organize entities into categories that identify similarities within a category and discern dissimilarities among categories, and they powerfully classify information in support of analysis. We propose a new classification scheme premised on the reality of imperfect data. Our computational model uses uncertain data envelopment analysis to define a classification's proximity to equitable efficiency, which is an aggregate measure of intra-similarity within a classification's categories. Our classification process faces two overriding computational challenges: a loss of convexity and a combinatorially explosive search space. We overcome the first by establishing lower and upper bounds on the proximity value and then searching this range with a first-order algorithm. We overcome the second by adapting the p-median problem to initiate our exploration, and by then employing an iterative neighborhood search to finalize a classification. We conclude by classifying the thirty stocks in the Dow Jones Industrial Average into performant tiers and by classifying prostate treatments into clinically effectual categories.
    When Bioprocess Engineering Meets Machine Learning: A Survey from the Perspective of Automated Bioprocess Development. (arXiv:2209.01083v1 [cs.LG])
    Machine learning (ML) has significantly contributed to the development of bioprocess engineering, but its application is still limited, hampering the enormous potential for bioprocess automation. ML for model-building automation can be seen as a way of introducing another level of abstraction that lets human experts focus on the most cognitive tasks of bioprocess development. First, probabilistic programming is used for the autonomous building of predictive models. Second, machine learning automatically assesses alternative decisions by planning experiments to test hypotheses and conducting investigations to gather informative data, focusing on model selection based on the uncertainty of model predictions. This review provides a comprehensive overview of ML-based automation in bioprocess development. On the one hand, the biotech and bioengineering community should be aware of the potential and, most importantly, the limitations of existing ML solutions for their application in biotechnology and biopharma. On the other hand, it is essential to identify the missing links that would enable the easy implementation of ML and Artificial Intelligence (AI) solutions in valuable tools for the bio-community. We summarize recent ML implementations across several important subfields of bioprocess systems and highlight two crucial challenges that remain bottlenecks for bioprocess automation and for reducing uncertainty in biotechnology development. There is no one-size-fits-all procedure; however, this review should help identify the potential of automation combining biotechnology and ML domains.
    Neighborhood-aware Scalable Temporal Network Representation Learning. (arXiv:2209.01084v1 [cs.LG])
    Temporal networks have been widely used to model real-world complex systems such as financial systems and e-commerce systems. In a temporal network, the joint neighborhood of a set of nodes often provides crucial structural information on predicting whether they may interact at a certain time. However, recent representation learning methods for temporal networks often fail to extract such information or depend on extremely time-consuming feature construction approaches. To address the issue, this work proposes the Neighborhood-Aware Temporal network model (NAT). For each node in the network, NAT abandons the commonly used single-vector representation and instead adopts a novel dictionary-type neighborhood representation. Such a dictionary records a down-sampled set of the neighboring nodes as keys and allows fast construction of structural features for a joint neighborhood of multiple nodes. We also design a dedicated data structure termed N-cache to support parallel access and update of these dictionary representations on GPUs. NAT is evaluated on seven real-world large-scale temporal networks. NAT not only outperforms all cutting-edge baselines by 5.9% and 6.0% on average in transductive and inductive link prediction accuracy, respectively, but also remains scalable, achieving a speed-up of 4.1-76.7x over the baselines that adopt joint structural features and a speed-up of 1.6-4.0x over the baselines that cannot adopt those features. The link to the code: https://github.com/Graph-COM/Neighborhood-Aware-Temporal-Network.
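    A toy rendition of the dictionary-type neighborhood representation: each node keeps a small dictionary of down-sampled neighbor keys, and joint structural features for a candidate link reduce to cheap dictionary intersections. This is a plain-Python sketch of the idea, not the paper's GPU-resident N-cache:

    ```python
    import random

    class NeighborDict:
        # Down-sampled dictionary of a node's temporal neighbors, capped at
        # `size` keys (loosely mimicking the paper's GPU-resident N-cache).
        def __init__(self, size=8):
            self.size = size
            self.keys = {}                       # neighbor id -> small state vector

        def update(self, nbr, t):
            if nbr not in self.keys and len(self.keys) >= self.size:
                self.keys.pop(random.choice(list(self.keys)))  # random eviction
            self.keys[nbr] = [1.0, t]            # e.g. (presence flag, last time)

    def joint_features(rep_u, rep_v):
        # Joint-neighborhood features for a candidate link (u, v) via lookups.
        common = rep_u.keys.keys() & rep_v.keys.keys()
        return [len(common), len(rep_u.keys), len(rep_v.keys)]

    u, v = NeighborDict(), NeighborDict()
    u.update(7, t=1.0); v.update(7, t=2.0)
    print(joint_features(u, v))                  # -> [1, 1, 1]
    ```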
    Subject Membership Inference Attacks in Federated Learning. (arXiv:2206.03317v2 [cs.LG] UPDATED)
    Privacy attacks on Machine Learning (ML) models often focus on inferring the existence of particular data points in the training data. However, what the adversary really wants to know is if a particular \emph{individual}'s (\emph{subject}'s) data was included during training. In such scenarios, the adversary is more likely to have access to the distribution of a particular subject, than actual records. Furthermore, in settings like cross-silo Federated Learning (FL), a subject's data can be embodied by multiple data records that are spread across multiple organizations. Nearly all of the existing private FL literature is dedicated to studying privacy at two granularities -- item-level (individual data records), and user-level (participating user in the federation), neither of which apply to data subjects in cross-silo FL. This insight motivates us to shift our attention from the privacy of data records to the privacy of \emph{data subjects}, also known as subject-level privacy. We propose two black-box attacks for \emph{subject membership inference}, of which one assumes access to a model after each training round. Using these attacks, we estimate subject membership inference risk on real-world data for single-party models as well as FL scenarios. We find our attacks to be extremely potent, even without access to exact training records, and using the knowledge of membership for a handful of subjects. To better understand the various factors that may influence subject privacy risk in cross-silo FL settings, we systematically generate several hundred synthetic federation configurations, varying properties of the data, model design and training, and the federation itself. Finally, we investigate the effectiveness of Differential Privacy in mitigating this threat.
    Physics-informed MTA-UNet: Prediction of Thermal Stress and Thermal Deformation of Satellites. (arXiv:2209.01009v1 [cs.LG])
    The rapid analysis of thermal stress and deformation plays a pivotal role in the thermal control measures and structural design optimization of satellites. To achieve real-time thermal stress and thermal deformation analysis of satellite motherboards, this paper proposes a novel Multi-Task Attention UNet (MTA-UNet) neural network which combines the advantages of Multi-Task Learning (MTL) and U-Net with an attention mechanism. In addition, a physics-informed strategy is used in the training process, where partial differential equations (PDEs) are integrated into the loss functions as residual terms. Finally, an uncertainty-based loss balancing approach is applied to weight the different loss functions of the multiple training tasks. Experimental results show that the proposed MTA-UNet effectively improves the prediction accuracy of multiple physics tasks compared with Single-Task Learning (STL) models. In addition, the physics-informed method brings less error in the prediction of each task, especially on small datasets. The code can be downloaded at: \url{https://github.com/KomorebiTso/MTA-UNet}.
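    The uncertainty-based loss balancing can be sketched in the style of learned homoscedastic task weights (Kendall et al., 2018); whether the paper uses exactly this parameterization is an assumption, and the PDE residual shown in the comment is schematic:

    ```python
    import torch
    import torch.nn as nn

    class UncertaintyWeighting(nn.Module):
        # Learned log-variances s_i weight the task losses as
        # sum_i exp(-s_i) * L_i + s_i, down-weighting noisier tasks.
        def __init__(self, n_tasks):
            super().__init__()
            self.log_vars = nn.Parameter(torch.zeros(n_tasks))

        def forward(self, task_losses):
            total = torch.zeros(())
            for s, loss in zip(self.log_vars, task_losses):
                total = total + torch.exp(-s) * loss + s
            return total

    # Each task loss may itself be data loss plus a PDE residual term, e.g.
    #   loss_i = mse(pred, target) + lam * (pde_residual(pred) ** 2).mean()
    weighting = UncertaintyWeighting(n_tasks=2)
    total = weighting([torch.tensor(0.5), torch.tensor(1.2)])
    ```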
    Event-Driven Tactile Learning with Location Spiking Neurons. (arXiv:2209.01080v1 [cs.NE])
    The sense of touch is essential for a variety of daily tasks. New advances in event-based tactile sensors and Spiking Neural Networks (SNNs) spur the research in event-driven tactile learning. However, SNN-enabled event-driven tactile learning is still in its infancy due to the limited representative abilities of existing spiking neurons and high spatio-temporal complexity in the data. In this paper, to improve the representative capabilities of existing spiking neurons, we propose a novel neuron model called "location spiking neuron", which enables us to extract features of event-based data in a novel way. Moreover, based on the classical Time Spike Response Model (TSRM), we develop a specific location spiking neuron model - Location Spike Response Model (LSRM) that serves as a new building block of SNNs. Furthermore, we propose a hybrid model which combines an SNN with TSRM neurons and an SNN with LSRM neurons to capture the complex spatio-temporal dependencies in the data. Extensive experiments demonstrate the significant improvements of our models over other works on event-driven tactile learning and show the superior energy efficiency of our models and location spiking neurons, which may unlock their potential on neuromorphic hardware.
    Ranking-Based Physics-Informed Line Failure Detection in Power Grids. (arXiv:2209.01021v1 [eess.SP])
    Climate change increases the number of extreme weather events (wind and snowstorms, heavy rains, wildfires) that compromise power system reliability and lead to multiple equipment failures. Real-time, accurate detection of potential line failures is the first step to mitigating the extreme weather impact and activating emergency controls. The nonlinearity of power balance equations, increased uncertainty in generation during extreme events, and lack of grid observability compromise the efficiency of traditional data-driven failure detection methods. At the same time, modern problem-oblivious machine learning methods based on neural networks require a large amount of data to detect an accident, especially in a time-changing environment. This paper proposes a Physics-InformEd Line failure Detector (FIELD) that leverages grid topology information to reduce sample and time complexities and improve localization accuracy. Finally, we illustrate the superior empirical performance of our approach compared to state-of-the-art methods over various test cases.
    Semi-Centralised Multi-Agent Reinforcement Learning with Policy-Embedded Training. (arXiv:2209.01054v1 [cs.MA])
    Centralised training (CT) is the basis for many popular multi-agent reinforcement learning (MARL) methods because it allows agents to quickly learn high-performing policies. However, CT relies on agents learning from one-off observations of other agents' actions at a given state. Because MARL agents explore and update their policies during training, these observations often provide poor predictions about other agents' behaviour and the expected return for a given action. CT methods therefore suffer from high variance and error-prone estimates, harming learning. CT methods also suffer from explosive growth in complexity due to the reliance on global observations, unless strong factorisation restrictions are imposed (e.g., monotonic reward functions for QMIX). We address these challenges with a new semi-centralised MARL framework that performs policy-embedded training and decentralised execution. Our method, the policy-embedded reinforcement learning algorithm (PERLA), is an enhancement tool for Actor-Critic MARL algorithms that leverages a novel parameter-sharing protocol and policy embedding method to maintain estimates that account for other agents' behaviour. Our theory proves that PERLA dramatically reduces the variance in value estimates. Unlike various CT methods, PERLA seamlessly adopts MARL algorithms and scales easily with the number of agents, without the need for restrictive factorisation assumptions. We demonstrate PERLA's superior empirical performance and efficient scaling in benchmark environments including StarCraft Micromanagement II and Multi-agent Mujoco.
    Learning Stochastic Dynamics with Statistics-Informed Neural Network. (arXiv:2202.12278v2 [cs.LG] UPDATED)
    We introduce a machine-learning framework named statistics-informed neural network (SINN) for learning stochastic dynamics from data. This new architecture was theoretically inspired by a universal approximation theorem for stochastic systems, which we introduce in this paper, and the projection-operator formalism for stochastic modeling. We devise mechanisms for training the neural network model to reproduce the correct \emph{statistical} behavior of a target stochastic process. Numerical simulation results demonstrate that a well-trained SINN can reliably approximate both Markovian and non-Markovian stochastic dynamics. We demonstrate the applicability of SINN to coarse-graining problems and the modeling of transition dynamics. Furthermore, we show that the obtained reduced-order model can be trained on temporally coarse-grained data and hence is well suited for rare-event simulations.
    Intrinsic fluctuations of reinforcement learning promote cooperation. (arXiv:2209.01013v1 [cs.LG])
    In this work, we ask and answer the question: what makes classical reinforcement learning cooperative? Cooperating in social dilemma situations is vital for animals, humans, and machines. While evolutionary theory has revealed a range of mechanisms promoting cooperation, the conditions under which agents learn to cooperate are contested. Here, we demonstrate which individual elements of the multi-agent learning setting lead to cooperation, and how. Specifically, we consider the widely used temporal-difference reinforcement learning algorithm with epsilon-greedy exploration in the classic environment of an iterated Prisoner's dilemma with one-period memory. Each of the two learning agents learns a strategy that conditions its following action choices on both agents' action choices of the last round. We find that, next to a strong emphasis on future rewards, a low exploration rate, and a small learning rate, it is primarily the intrinsic stochastic fluctuations of the reinforcement learning process which double the final rate of cooperation to up to 80\%. Thus, inherent noise is not a necessary evil of the iterative learning process; it is a critical asset for the learning of cooperation. However, we also point out the trade-off between a high likelihood of cooperative behavior and achieving it in a reasonable amount of time. Our findings are relevant for purposefully designing cooperative algorithms and regulating undesired collusive effects.
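    The described setup is concrete enough to reproduce in a few lines: two epsilon-greedy temporal-difference learners in an iterated Prisoner's dilemma with one-period memory. A sketch with assumed payoffs and hyperparameters chosen in the spirit of the abstract (small learning rate, low exploration, high discount):

    ```python
    import numpy as np

    # Iterated Prisoner's Dilemma, actions: 0 = cooperate, 1 = defect.
    # Payoffs for the row player: R=3, S=0, T=5, P=1.
    PAYOFF = np.array([[3.0, 0.0],
                       [5.0, 1.0]])
    rng = np.random.default_rng(0)
    alpha, gamma, eps = 0.05, 0.95, 0.01   # small lr, high discount, low exploration
    Q = [np.zeros((4, 2)) for _ in range(2)]  # 4 states = both agents' last actions

    state = 0
    for _ in range(100_000):
        acts = [int(rng.integers(2)) if rng.random() < eps
                else int(np.argmax(Q[i][state])) for i in range(2)]
        rew = [PAYOFF[acts[0], acts[1]], PAYOFF[acts[1], acts[0]]]
        nxt = 2 * acts[0] + acts[1]        # one-period memory of both choices
        for i in range(2):
            td = rew[i] + gamma * Q[i][nxt].max() - Q[i][state, acts[i]]
            Q[i][state, acts[i]] += alpha * td
        state = nxt

    # Greedy actions in the mutual-cooperation state (0 means "keep cooperating").
    print([int(np.argmax(Q[i][0])) for i in range(2)])
    ```

    Re-running with different seeds exposes exactly the stochastic fluctuations the paper studies: identical hyperparameters can converge to cooperative or defecting fixed points.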
    Normalization effects on deep neural networks. (arXiv:2209.01018v1 [cs.LG])
    We study the effect of normalization on the layers of deep neural networks of feed-forward type. A given layer $i$ with $N_{i}$ hidden units is allowed to be normalized by $1/N_{i}^{\gamma_{i}}$ with $\gamma_{i}\in[1/2,1]$, and we study the effect of the choice of the $\gamma_{i}$ on the statistical behavior of the neural network's output (such as variance), as well as on the test accuracy on the MNIST data set. We find that, in terms of the variance of the neural network's output and test accuracy, the best choice is to set the $\gamma_{i}$'s equal to one, which is the mean-field scaling. We also find that this is particularly true for the outer layer, in that the neural network's behavior is more sensitive to the scaling of the outer layer than to the scaling of the inner layers. The mechanism for the mathematical analysis is an asymptotic expansion for the neural network's output. An important practical consequence of the analysis is that it provides a systematic and mathematically informed way to choose the learning rate hyperparameters. Such a choice guarantees that the neural network behaves in a statistically robust way as the $N_i$ grow to infinity.
    Self-Supervised Human Activity Recognition with Localized Time-Frequency Contrastive Representation Learning. (arXiv:2209.00990v1 [eess.SP])
    In this paper, we propose a self-supervised learning solution for human activity recognition with smartphone accelerometer data. We aim to develop a model that learns strong representations from accelerometer signals in order to perform robust human activity classification, while reducing the model's reliance on class labels. Specifically, we intend to enable cross-dataset transfer learning such that our network pre-trained on a particular dataset can perform effective activity classification on other datasets (after a small amount of fine-tuning). To tackle this problem, we design our solution with the intention of learning as much information from the accelerometer signals as possible. As a result, we design two separate pipelines, one that learns the data in the time-frequency domain, and the other in the time domain alone. To address the issues mentioned above regarding cross-dataset transfer learning, we use self-supervised contrastive learning to train each of these streams. Next, each stream is fine-tuned for final classification, and eventually the two are fused to provide the final results. We evaluate the performance of the proposed solution on three datasets, namely MotionSense, HAPT, and HHAR, and demonstrate that our solution outperforms prior works in this field. We further evaluate the method's ability to learn generalized features by using the MobiAct dataset for pre-training and the remaining three datasets for the downstream classification task, and show that the proposed solution achieves better performance in comparison with other self-supervised methods in cross-dataset transfer learning.
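    A common choice for the contrastive objective in such pipelines is the NT-Xent (SimCLR-style) loss between two augmented views of the same window; treating that choice as an assumption about this paper, a self-contained PyTorch version looks like:

    ```python
    import torch
    import torch.nn.functional as F

    def nt_xent(z1, z2, tau=0.5):
        # NT-Xent loss between two augmented views z1, z2 of the same
        # accelerometer windows; each has shape (batch, dim).
        z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)
        sim = z @ z.t() / tau                          # cosine similarities
        n = z1.size(0)
        mask = torch.eye(2 * n, dtype=torch.bool)
        sim.masked_fill_(mask, float('-inf'))          # drop self-similarity
        # The positive for row i is its counterpart view in the other half.
        targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
        return F.cross_entropy(sim, targets)

    loss = nt_xent(torch.randn(32, 128), torch.randn(32, 128))
    ```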
    Johnson-Lindenstrauss embeddings for noisy vectors -- taking advantage of the noise. (arXiv:2209.01006v1 [cs.DS])
    This paper investigates theoretical properties of subsampling and hashing as tools for approximate Euclidean norm-preserving embeddings of vectors with (unknown) additive Gaussian noise. Such embeddings are sometimes called Johnson-Lindenstrauss embeddings due to their celebrated lemma. Previous work shows that, as sparse embeddings, the success of subsampling and hashing closely depends on the $l_\infty$ to $l_2$ ratio of the vector to be mapped. This paper shows that the presence of noise removes this constraint in high dimensions; in other words, sparse embeddings such as subsampling and hashing with embedding dimensions comparable to those of dense embeddings have similar approximate norm-preserving dimensionality-reduction properties. The key is that the noise should be treated as information to be exploited, not simply as something to be removed. Theoretical bounds for subsampling and hashing to recover the approximate norm of a high-dimensional vector in the presence of noise are derived, with numerical illustrations showing that better performance is achieved in the presence of noise.
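    The phenomenon is easy to probe numerically: subsampling fails on vectors with a large $l_\infty$ to $l_2$ ratio, but additive Gaussian noise spreads the energy and restores approximate norm preservation. A small illustrative experiment (dimensions and noise level are arbitrary):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d, m = 10_000, 500                       # ambient and embedding dimensions

    x = np.zeros(d); x[0] = 1.0              # worst case for sparse maps: l_inf/l_2 = 1
    y = x + rng.normal(0.0, 0.05, d)         # same vector with additive Gaussian noise

    def subsample_embed(v, m, rng):
        # Keep m random coordinates, rescaled to preserve the norm in expectation.
        idx = rng.choice(len(v), size=m, replace=False)
        return v[idx] * np.sqrt(len(v) / m)

    for label, v in [("clean spike", x), ("noisy spike", y)]:
        ratio = np.linalg.norm(subsample_embed(v, m, rng)) / np.linalg.norm(v)
        print(f"{label}: embedded/true norm ratio = {ratio:.3f}")
    ```

    The clean spike's embedded norm is usually 0 (the single energetic coordinate is rarely sampled), while the noisy vector's norm is preserved to within a few percent.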
    Multimodal Information Fusion for Glaucoma and DR Classification. (arXiv:2209.00979v1 [eess.IV])
    Multimodal information is frequently available in medical tasks. By combining information from multiple sources, clinicians are able to make more accurate judgments. In recent years, multiple imaging techniques have been used in clinical practice for retinal analysis: 2D fundus photographs, 3D optical coherence tomography (OCT), 3D OCT angiography, etc. Our paper investigates three multimodal information fusion strategies based on deep learning to solve retinal analysis tasks: early fusion, intermediate fusion, and hierarchical fusion. The commonly used early and intermediate fusions are simple but do not fully exploit the complementary information between modalities. We developed a hierarchical fusion approach that focuses on combining features across multiple dimensions of the network, as well as exploring the correlation between modalities. These approaches were applied to glaucoma and diabetic retinopathy classification, using the public GAMMA dataset (fundus photographs and OCT) and a private dataset of PlexElite 9000 (Carl Zeiss Meditec Inc.) OCT angiography acquisitions, respectively. Our hierarchical fusion method performed the best in both cases and paved the way for better clinical diagnosis.
    Exploiting Pretrained Biochemical Language Models for Targeted Drug Design. (arXiv:2209.00981v1 [cs.LG])
    Motivation: The development of novel compounds targeting proteins of interest is one of the most important tasks in the pharmaceutical industry. Deep generative models have been applied to targeted molecular design and have shown promising results. Recently, target-specific molecule generation has been viewed as a translation between the protein language and the chemical language. However, such a model is limited by the availability of interacting protein-ligand pairs. On the other hand, large amounts of unlabeled protein sequences and chemical compounds are available and have been used to train language models that learn useful representations. In this study, we propose exploiting pretrained biochemical language models to initialize (i.e. warm start) targeted molecule generation models. We investigate two warm-start strategies: (i) a one-stage strategy, where the initialized model is trained on targeted molecule generation, and (ii) a two-stage strategy containing a pre-finetuning on molecular generation followed by target-specific training. We also compare two decoding strategies to generate compounds: beam search and sampling. Results: The results show that the warm-started models perform better than a baseline model trained from scratch. The two proposed warm-start strategies achieve similar results to each other with respect to widely used metrics from benchmarks. However, docking evaluation of the generated compounds for a number of novel proteins suggests that the one-stage strategy generalizes better than the two-stage strategy. Additionally, we observe that beam search outperforms sampling in both docking evaluation and benchmark metrics for assessing compound quality. Availability and implementation: The source code is available at https://github.com/boun-tabi/biochemical-lms-for-drug-design and the materials are archived in Zenodo at https://doi.org/10.5281/zenodo.6832145
    Deep Learning-based ECG Classification on Raspberry PI using a Tensorflow Lite Model based on PTB-XL Dataset. (arXiv:2209.00989v1 [eess.SP])
    The number of IoT devices in healthcare is expected to rise sharply due to increased demand since the COVID-19 pandemic. Deep learning and IoT devices are being employed to monitor body vitals and automate anomaly detection in clinical and non-clinical settings. Most current technology requires the transmission of raw data to a remote server, which is not efficient for resource-constrained IoT devices and embedded systems. Additionally, it is challenging to develop machine learning models for ECG classification due to the lack of extensive open public databases; the PTB-XL dataset has been used to overcome this challenge to an extent. In this work, we have developed machine learning models to be deployed on a Raspberry Pi. We present an evaluation of our TensorFlow model with two classes. We also present the evaluation of the corresponding TensorFlow Lite FlatBuffers to demonstrate their minimal run-time requirements while maintaining acceptable accuracy.
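    Converting a trained Keras model to a TensorFlow Lite FlatBuffer for the Raspberry Pi follows the standard converter API; the tiny 1-D CNN below is a hypothetical stand-in for the paper's architecture (PTB-XL records at 100 Hz give 1000-sample, 12-lead windows):

    ```python
    import tensorflow as tf

    # Hypothetical small 1-D CNN for two-class ECG classification.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(1000, 12)),
        tf.keras.layers.Conv1D(16, 7, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])
    # ... model.compile(...) and model.fit(...) on PTB-XL would go here ...

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # shrink for the Pi
    with open("ecg_model.tflite", "wb") as f:
        f.write(converter.convert())
    ```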
    Inferring Tabular Analysis Metadata by Infusing Distribution and Knowledge Information. (arXiv:2209.00946v1 [cs.DB])
    Many data analysis tasks heavily rely on a deep understanding of tables (multi-dimensional data). Across these tasks, there exist commonly used metadata attributes of table fields/columns. In this paper, we identify four such analysis metadata: the measure/dimension dichotomy, common field roles, semantic field type, and default aggregation function. Inferring these metadata is challenging due to insufficient supervision signals and requires utilizing existing knowledge and understanding value distributions. To infer these metadata for a raw table, we propose a multi-tasking Metadata model which fuses field distribution and knowledge graph information into pre-trained tabular models. For model training and evaluation, we collect a large corpus (~582k tables from private spreadsheets and public tabular datasets) of analysis metadata by using diverse smart supervisions from downstream tasks. Our best model achieves accuracy = 98%, top-1 hit rate > 67%, accuracy > 80%, and accuracy = 88% on the four analysis metadata inference tasks, respectively. It outperforms a series of baselines based on rules, traditional machine learning methods, and pre-trained tabular models. Analysis metadata models are deployed in a popular data analysis product, supporting downstream intelligent features such as insights mining, chart/pivot table recommendation, and natural language QA.
    A Dataset and Baseline Approach for Identifying Usage States from Non-Intrusive Power Sensing With MiDAS IoT-based Sensors. (arXiv:2209.00987v1 [eess.SP])
    The state identification problem seeks to identify power usage patterns of any system of interest, such as buildings or factories. In this challenge paper, we make a power usage dataset available from 8 institutions across manufacturing, education, and medical domains in the US and India, together with an initial unsupervised machine learning solution as a baseline for the community to accelerate research in this area.
    Macroeconomic Predictions using Payments Data and Machine Learning. (arXiv:2209.00948v1 [econ.GN])
    Predicting the economy's short-term dynamics -- a vital input to economic agents' decision-making process -- often uses lagged indicators in linear models. This is typically sufficient during normal times but could prove inadequate during crisis periods. This paper aims to demonstrate that non-traditional and timely data such as retail and wholesale payments, with the aid of nonlinear machine learning approaches, can provide policymakers with sophisticated models to accurately estimate key macroeconomic indicators in near real-time. Moreover, we provide a set of econometric tools to mitigate overfitting and interpretability challenges in machine learning models to improve their effectiveness for policy use. Our models with payments data, nonlinear methods, and tailored cross-validation approaches help improve macroeconomic nowcasting accuracy up to 40\% -- with higher gains during the COVID-19 period. We observe that the contribution of payments data for economic predictions is small and linear during low and normal growth periods. However, the payments data contribution is large, asymmetrical, and nonlinear during strong negative or positive growth periods.
    Automated Assessment of Transthoracic Echocardiogram Image Quality Using Deep Neural Networks. (arXiv:2209.00976v1 [eess.IV])
    Standard views in two-dimensional echocardiography are well established, but the quality of acquired images is highly dependent on operator skills and is assessed subjectively. This study aims to provide an objective assessment pipeline for echocardiogram image quality by defining a new set of domain-specific quality indicators. Consequently, image quality assessment can be automated to enhance clinical measurements, interpretation, and real-time optimization. We have developed deep neural networks for the automated assessment of echocardiographic frames, which were randomly sampled from 11,262 adult patients. The private echocardiography dataset consists of 33,784 frames, previously acquired between 2010 and 2020. Deep learning approaches were used to extract spatiotemporal features, and the image quality indicators were evaluated against the mean absolute error. Our quality indicators encapsulate both anatomical and pathological elements to provide multivariate assessment scores for anatomical visibility, clarity, depth-gain, and foreshortening, respectively.
    Reversible Action Design for Combinatorial Optimization with Reinforcement Learning. (arXiv:2102.07210v2 [cs.LG] UPDATED)
    Combinatorial optimization problems (COPs) over graphs are a fundamental challenge in optimization. Reinforcement learning (RL) has recently emerged as a new framework to tackle these problems and has demonstrated promising results. However, most RL solutions employ a greedy manner to construct the solution incrementally, inevitably posing an unnecessary dependency on action sequences and requiring many problem-specific designs. We propose a general RL framework that not only exhibits state-of-the-art empirical performance but also generalizes to a broad class of COPs. Specifically, we define the state as a solution to a problem instance and an action as a perturbation of this solution. We utilize graph neural networks (GNNs) to extract latent representations of given problem instances for state-action encoding, and then apply deep Q-learning to obtain a policy that gradually refines the solution by flipping or swapping vertex labels. Experiments are conducted on Maximum $k$-Cut and the Traveling Salesman Problem, and performance improvement is achieved against a set of learning-based and heuristic baselines.
    Long-term hail risk assessment with deep neural networks. (arXiv:2209.01191v1 [physics.ao-ph])
    Hail risk assessment is necessary to estimate and reduce damage to crops, orchards, and infrastructure. It also helps estimate and reduce consequent losses for businesses and, in particular, insurance companies. But hail forecasting is challenging. Data used for designing models for this purpose are three-dimensional geospatial time series. Hail is a very local event with respect to the resolution of available datasets. Also, hail events are rare - only 1% of targets in observations are marked as "hail". Models for nowcasting and short-term hail forecasts are improving. Introducing machine learning models to the meteorology field is not new. There are also various climate models reflecting possible scenarios of climate change in the future. But there are no machine learning models for data-driven forecasting of changes in hail frequency for a given area. The first possible approach to the latter task is to ignore spatial and temporal structure and develop a model capable of classifying a given vertical profile of meteorological variables as favorable to hail formation or not. Although such an approach certainly neglects important information, it is very lightweight and easily scalable because it treats observations as independent of each other. The more advanced approach is to design a neural network capable of processing geospatial data. Our idea here is to combine convolutional layers, responsible for processing spatial data, with recurrent neural network blocks capable of handling temporal structure. This study compares the two approaches and introduces a model suitable for forecasting changes in hail frequency over the coming decades.
    Privacy-preserving Data Sharing on Vertically Partitioned Data. (arXiv:2010.09293v2 [cs.LG] UPDATED)
    In this work, we introduce a differentially private method for generating synthetic data from vertically partitioned data, \emph{i.e.}, where data of the same individuals is distributed across multiple data holders or parties. We present a differentially private stochastic gradient descent (DP-SGD) algorithm to train a mixture model over such partitioned data using variational inference. We modify a secure multiparty computation (MPC) framework to combine MPC with differential privacy (DP), in order to use differentially private MPC effectively to learn a probabilistic generative model under DP on such vertically partitioned data. Assuming the mixture components contain no dependencies across different parties, the objective function can be factorized into a sum of products of the contributions calculated by the parties. Finally, MPC is used to compute the aggregate between the different contributions. Moreover, we rigorously define the privacy guarantees with respect to the different players in the system. To demonstrate the accuracy of our method, we run our algorithm on the Adult dataset from the UCI machine learning repository, where we obtain comparable results to the non-partitioned case.
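    The DP-SGD ingredient — per-example gradient clipping plus calibrated noise — can be sketched in the clear; in the paper the corresponding arithmetic is carried out under MPC across the parties holding different feature partitions, and the hyperparameters below are illustrative:

    ```python
    import numpy as np

    def dp_sgd_step(params, per_example_grads, lr=0.1, clip=1.0,
                    noise_mult=1.0, rng=None):
        # Clip each example's gradient to norm `clip`, average, then add
        # Gaussian noise scaled by noise_mult * clip. In the paper this
        # runs under MPC so no single party sees the raw gradients.
        rng = rng or np.random.default_rng()
        clipped = [g * min(1.0, clip / max(np.linalg.norm(g), 1e-12))
                   for g in per_example_grads]
        g_bar = np.mean(clipped, axis=0)
        g_bar += rng.normal(0.0, noise_mult * clip / len(clipped), g_bar.shape)
        return params - lr * g_bar

    theta = np.zeros(3)
    grads = [np.array([2.0, 0.0, 1.0]), np.array([0.1, -0.2, 0.3])]
    theta = dp_sgd_step(theta, grads)
    ```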
    Optimal bump functions for shallow ReLU networks: Weight decay, depth separation and the curse of dimensionality. (arXiv:2209.01173v1 [stat.ML])
    In this note, we study how neural networks with a single hidden layer and ReLU activation interpolate data drawn from a radially symmetric distribution with target labels 1 at the origin and 0 outside the unit ball, if no labels are known inside the unit ball. With weight decay regularization and in the infinite-neuron, infinite-data limit, we prove that a unique radially symmetric minimizer exists, whose weight decay regularizer and Lipschitz constant grow as $d$ and $\sqrt{d}$ respectively. We furthermore show that the weight decay regularizer grows exponentially in $d$ if the label $1$ is imposed on a ball of radius $\varepsilon$ rather than just at the origin. By comparison, a neural network with two hidden layers can approximate the target function without encountering the curse of dimensionality.  ( 2 min )
    Synthesizing Speech from Intracranial Depth Electrodes using an Encoder-Decoder Framework. (arXiv:2111.01457v3 [cs.SD] UPDATED)
    Speech neuroprostheses have the potential to enable communication for people with dysarthria or anarthria. Recent advances have demonstrated high-quality text decoding and speech synthesis from electrocorticographic grids placed on the cortical surface. Here, we investigate a less invasive measurement modality in three participants, namely stereotactic EEG (sEEG), which provides sparse sampling from multiple brain regions, including subcortical regions. To evaluate whether sEEG can also be used to synthesize high-quality audio from neural recordings, we employ a recurrent encoder-decoder model based on modern deep learning methods. We find that speech can indeed be reconstructed with correlations up to 0.8 from these minimally invasive recordings, despite limited amounts of training data.  ( 2 min )
    Examining average and discounted reward optimality criteria in reinforcement learning. (arXiv:2107.01348v2 [cs.LG] UPDATED)
    In reinforcement learning (RL), the goal is to obtain an optimal policy, for which the optimality criterion is fundamentally important. Two major optimality criteria are average and discounted rewards. While the latter is more popular, it is problematic to apply in environments without an inherent notion of discounting. This motivates us to revisit a) the progression of optimality criteria in dynamic programming, b) justification for and complication of an artificial discount factor, and c) benefits of directly maximizing the average reward criterion, which is discounting-free. Our contributions include a thorough examination of the relationship between average and discounted rewards, as well as a discussion of their pros and cons in RL. We emphasize that average-reward RL methods possess the ingredient and mechanism for applying a family of discounting-free optimality criteria (Veinott, 1969) to RL.  ( 2 min )
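    For reference, the two criteria compared above can be written in their standard forms: the average reward and the $\gamma$-discounted return of a policy $\pi$ are

        $$ \rho^{\pi} = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi}\!\Big[ \sum_{t=1}^{T} r_t \Big], \qquad V_{\gamma}^{\pi} = \mathbb{E}^{\pi}\!\Big[ \sum_{t=1}^{\infty} \gamma^{t-1} r_t \Big], \quad 0 \le \gamma < 1, $$

    where only the latter requires choosing an artificial discount factor $\gamma$.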
    Poincare: Recommending Publication Venues via Treatment Effect Estimation. (arXiv:2010.09157v2 [cs.DL] UPDATED)
    Choosing a publication venue for an academic paper is a crucial step in the research process. However, in many cases, decisions are based solely on the experience of researchers, which often leads to suboptimal results. Although there exist venue recommender systems for academic papers, they recommend venues where the paper is expected to be published. In this study, we aim to recommend publication venues from a different perspective. We estimate the number of citations a paper will receive if the paper is published in each venue and recommend the venue where the paper has the most potential impact. However, there are two challenges to this task. First, a paper is published in only one venue, and thus, we cannot observe the number of citations the paper would receive if the paper were published in another venue. Second, the contents of a paper and the publication venue are not statistically independent; that is, there exist selection biases in choosing publication venues. In this paper, we formulate the venue recommendation problem as a treatment effect estimation problem. We use a bias correction method to estimate the potential impact of choosing a publication venue effectively and to recommend venues based on the potential impact of papers in each venue. We highlight the effectiveness of our method using paper data from computer science conferences.  ( 3 min )
    GraphTheta: A Distributed Graph Neural Network Learning System With Flexible Training Strategy. (arXiv:2104.10569v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have been demonstrated as a powerful tool for analysing non-Euclidean graph data. However, the lack of efficient distributed graph learning (GL) systems severely hinders applications of GNNs, especially when graphs are big and GNNs are relatively deep. Herein, we present GraphTheta, a novel distributed and scalable GL system implemented in a vertex-centric graph programming model. GraphTheta is the first GL system built upon distributed graph processing with neural network operators implemented as user-defined functions. This system supports multiple training strategies, and enables efficient and scalable big-graph learning on distributed (virtual) machines, each with low memory. To facilitate graph convolution implementations, GraphTheta puts forward a new GL abstraction named NN-TGAR to bridge the gap between graph processing and graph deep learning. A distributed graph engine is proposed to conduct the stochastic gradient descent optimization with hybrid-parallel execution. Moreover, we add support for a new cluster-batched training strategy besides global-batch and mini-batch. We evaluate GraphTheta using a number of datasets with network sizes ranging from small to large scale. Experimental results show that GraphTheta can scale well to 1,024 workers for training an in-house developed GNN on an industry-scale Alipay dataset of 1.4 billion nodes and 4.1 billion attributed edges, with a cluster of CPU virtual machines (dockers) of small memory each (5$\sim$12GB). Moreover, GraphTheta obtains comparable or better prediction results than the state-of-the-art GNN implementations, demonstrating that it learns GNNs as effectively as existing frameworks, and can outperform DistDGL by up to $2.02\times$ with better scalability. To the best of our knowledge, this work presents the largest edge-attributed GNN learning task conducted in the literature.  ( 3 min )
    Clustering and Structural Robustness in Causal Diagrams. (arXiv:2111.04513v2 [stat.ML] UPDATED)
    Graphs are commonly used to represent and visualize causal relations. For a small number of variables, this approach provides a succinct and clear view of the scenario at hand. As the number of variables under study increases, the graphical approach may become impractical, and the clarity of the representation is lost. Clustering of variables is a natural way to reduce the size of the causal diagram, but it may erroneously change the essential properties of the causal relations if implemented arbitrarily. We define a specific type of cluster, called transit cluster, that is guaranteed to preserve the identifiability properties of causal effects under certain conditions. We provide a sound and complete algorithm for finding all transit clusters in a given graph and demonstrate how clustering can simplify the identification of causal effects. We also study the inverse problem, where one starts with a clustered graph and looks for extended graphs where the identifiability properties of causal effects remain unchanged. We show that this kind of structural robustness is closely related to transit clusters.  ( 2 min )
    A Framework for Supervised Heterogeneous Transfer Learning using Dynamic Distribution Adaptation and Manifold Regularization. (arXiv:2108.12293v2 [cs.LG] UPDATED)
    Transfer learning aims to learn classifiers for a target domain by transferring knowledge from a source domain. However, due to two main issues, feature discrepancy and distribution divergence, transfer learning can be a very difficult problem in practice. In this paper, we present a framework called TLF that builds a classifier for the target domain, which has only a few labeled training records, by transferring knowledge from the source domain, which has many labeled records. While existing methods often focus on one issue and leave the other for future work, TLF is capable of handling both issues simultaneously. In TLF, we alleviate feature discrepancy by identifying shared label distributions that act as pivots to bridge the domains. We handle distribution divergence by simultaneously optimizing the structural risk functional, joint distributions between domains, and the manifold consistency underlying marginal distributions. Moreover, for the manifold consistency we exploit its intrinsic properties by identifying the k nearest neighbors of a record, where the value of k is determined automatically in TLF. Furthermore, since negative transfer is not desired, we consider only the source records that belong to the source pivots during the knowledge transfer. We evaluate TLF on seven publicly available natural datasets and compare the performance of TLF against the performance of fourteen state-of-the-art techniques. We also evaluate the effectiveness of TLF in some challenging situations. Our experimental results, including statistical sign test and Nemenyi test analyses, indicate a clear superiority of the proposed framework over the state-of-the-art techniques.  ( 3 min )
    Petals: Collaborative Inference and Fine-tuning of Large Models. (arXiv:2209.01188v1 [cs.LG])
    Many NLP tasks benefit from using large language models (LLMs) that often have more than 100 billion parameters. With the release of BLOOM-176B and OPT-175B, everyone can download pretrained models of this scale. Still, using these models requires high-end hardware unavailable to many researchers. In some cases, LLMs can be used more affordably via RAM offloading or hosted APIs. However, these techniques have innate limitations: offloading is too slow for interactive inference, while APIs are not flexible enough for research. In this work, we propose Petals $-$ a system for inference and fine-tuning of large models collaboratively by joining the resources of multiple parties trusted to process client's data. We demonstrate that this strategy significantly outperforms offloading for very large models, running inference of BLOOM-176B on consumer GPUs with $\approx$ 1 step per second. Unlike most inference APIs, Petals also natively exposes the hidden states of served models, allowing its users to train and share custom model extensions based on efficient fine-tuning methods.  ( 2 min )
    XFL: Naming Functions in Binaries with Extreme Multi-label Learning. (arXiv:2107.13404v3 [cs.CR] UPDATED)
    Reverse engineers benefit from the presence of identifiers such as function names in a binary, but usually these are removed for release. Training a machine learning model to predict function names automatically is promising but fundamentally hard: unlike words in natural language, most function names occur only once. In this paper, we address this problem by introducing eXtreme Function Labeling (XFL), an extreme multi-label learning approach to selecting appropriate labels for binary functions. XFL splits function names into tokens, treating each as an informative label akin to the problem of tagging texts in natural language. We relate the semantics of binary code to labels through DEXTER, a novel function embedding that combines static analysis-based features with local context from the call graph and global context from the entire binary. We demonstrate that XFL/DEXTER outperforms the state of the art in function labeling on a dataset of 10,047 binaries from the Debian project, achieving a precision of 83.5%. We also study combinations of XFL with alternative binary embeddings from the literature and show that DEXTER consistently performs best for this task. As a result, we demonstrate that binary function labeling can be effectively phrased in terms of multi-label learning, and that binary function embeddings benefit from including explicit semantic features.  ( 3 min )
    Pareto Navigation Gradient Descent: a First-Order Algorithm for Optimization in Pareto Set. (arXiv:2110.08713v2 [math.OC] UPDATED)
    Many modern machine learning applications, such as multi-task learning, require finding optimal model parameters to trade-off multiple objective functions that may conflict with each other. The notion of the Pareto set allows us to focus on the set of (often infinite number of) models that cannot be strictly improved. But it does not provide an actionable procedure for picking one or a few special models to return to practical users. In this paper, we consider \emph{optimization in Pareto set (OPT-in-Pareto)}, the problem of finding Pareto models that optimize an extra reference criterion function within the Pareto set. This function can either encode a specific preference from the users, or represent a generic diversity measure for obtaining a set of diversified Pareto models that are representative of the whole Pareto set. Unfortunately, despite being a highly useful framework, efficient algorithms for OPT-in-Pareto have been largely missing, especially for large-scale, non-convex, and non-linear objectives in deep learning. A naive approach is to apply Riemannian manifold gradient descent on the Pareto set, which yields a high computational cost due to the need for eigen-calculation of Hessian matrices. We propose a first-order algorithm that approximately solves OPT-in-Pareto using only gradient information, with both high practical efficiency and theoretically guaranteed convergence property. Empirically, we demonstrate that our method works efficiently for a variety of challenging multi-task-related problems.  ( 3 min )
    Deep Learning-based Patient Re-identification Is able to Exploit the Biometric Nature of Medical Chest X-ray Data. (arXiv:2103.08562v4 [cs.CV] UPDATED)
    With the rise and ever-increasing potential of deep learning techniques in recent years, publicly available medical datasets became a key factor to enable reproducible development of diagnostic algorithms in the medical domain. Medical data contains sensitive patient-related information and is therefore usually anonymized by removing patient identifiers, e.g., patient names before publication. To the best of our knowledge, we are the first to show that a well-trained deep learning system is able to recover the patient identity from chest X-ray data. We demonstrate this using the publicly available large-scale ChestX-ray14 dataset, a collection of 112,120 frontal-view chest X-ray images from 30,805 unique patients. Our verification system is able to identify whether two frontal chest X-ray images are from the same person with an AUC of 0.9940 and a classification accuracy of 95.55%. We further highlight that the proposed system is able to reveal the same person even ten and more years after the initial scan. When pursuing a retrieval approach, we observe an mAP@R of 0.9748 and a precision@1 of 0.9963. Furthermore, we achieve an AUC of up to 0.9870 and a precision@1 of up to 0.9444 when evaluating our trained networks on external datasets such as CheXpert and the COVID-19 Image Data Collection. Based on this high identification rate, a potential attacker may leak patient-related information and additionally cross-reference images to obtain more information. Thus, there is a great risk of sensitive content falling into unauthorized hands or being disseminated against the will of the concerned patients. Especially during the COVID-19 pandemic, numerous chest X-ray datasets have been published to advance research. Therefore, such data may be vulnerable to potential attacks by deep learning-based re-identification algorithms.  ( 3 min )
    Support vector machines and Radon's theorem. (arXiv:2011.00617v3 [cs.LG] UPDATED)
    A support vector machine (SVM) is an algorithm that finds a hyperplane which optimally separates labeled data points in $\mathbb{R}^n$ into positive and negative classes. The data points on the margin of this separating hyperplane are called support vectors. We connect the possible configurations of support vectors to Radon's theorem, which provides guarantees for when a set of points can be divided into two classes (positive and negative) whose convex hulls intersect. If the convex hulls of the positive and negative support vectors are projected onto a separating hyperplane, then the projections intersect if and only if the hyperplane is optimal. Further, with a particular type of general position, we show that (a) the projected convex hulls of the support vectors intersect in exactly one point, (b) the support vectors are stable under perturbation, (c) there are at most $n+1$ support vectors, and (d) every number of support vectors from 2 up to $n+1$ is possible. Finally, we perform computer simulations studying the expected number of support vectors, and their configurations, for randomly generated data. We observe that as the distance between classes of points increases for this type of randomly generated data, configurations with fewer support vectors become more likely.  ( 3 min )
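    For reference, Radon's theorem states that any set of $n+2$ points in $\mathbb{R}^n$ can be partitioned into two disjoint subsets whose convex hulls intersect:

        $$ X \subset \mathbb{R}^n,\ |X| = n+2 \;\Longrightarrow\; \exists\, X_1 \mathbin{\dot\cup} X_2 = X \text{ with } \operatorname{conv}(X_1) \cap \operatorname{conv}(X_2) \neq \emptyset. $$

    In the plane, for example, four points in general position split either into a triangle containing the fourth point or into two crossing segments.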
    Co-Imitation: Learning Design and Behaviour by Imitation. (arXiv:2209.01207v1 [cs.LG])
    The co-adaptation of robots has been a long-standing research endeavour with the goal of adapting both body and behaviour of a system for a given task, inspired by the natural evolution of animals. Co-adaptation has the potential to eliminate costly manual hardware engineering as well as improve the performance of systems. The standard approach to co-adaptation is to use a reward function for optimizing behaviour and morphology. However, defining and constructing such reward functions is notoriously difficult and often a significant engineering effort. This paper introduces a new viewpoint on the co-adaptation problem, which we call co-imitation: finding a morphology and a policy that allow an imitator to closely match the behaviour of a demonstrator. To this end we propose a co-imitation methodology for adapting behaviour and morphology by matching state distributions of the demonstrator. Specifically, we focus on the challenging scenario with mismatched state- and action-spaces between both agents. We find that co-imitation increases behaviour similarity across a variety of tasks and settings, and demonstrate co-imitation by transferring human walking, jogging and kicking skills onto a simulated humanoid.  ( 2 min )
    First Hitting Diffusion Models. (arXiv:2209.01170v1 [cs.CV])
    We propose a family of First Hitting Diffusion Models (FHDM), deep generative models that generate data with a diffusion process that terminates at a random first hitting time. This yields an extension of the standard fixed-time diffusion models that terminate at a pre-specified deterministic time. Although standard diffusion models are designed for continuous unconstrained data, FHDM is naturally designed to learn distributions on continuous as well as a range of discrete and structured domains. Moreover, FHDM enables instance-dependent termination times and accelerates the diffusion process to sample higher-quality data with fewer diffusion steps. Technically, we train FHDM by maximum likelihood estimation on diffusion trajectories augmented from observed data with conditional first hitting processes (i.e., bridges) derived based on Doob's $h$-transform, deviating from the commonly used time-reversal mechanism. We apply FHDM to generate data in various domains such as point clouds (general continuous distributions), climate and geographical events on earth (continuous distributions on the sphere), unweighted graphs (distributions of binary matrices), and segmentation maps of 2D images (high-dimensional categorical distributions). We observe considerable improvement compared with the state-of-the-art approaches in both quality and speed.  ( 2 min )
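    For reference, the Doob $h$-transform used above is standard: if the base diffusion is $dX_t = b(X_t, t)\,dt + \sigma(X_t, t)\,dW_t$ and $h(x, t)$ denotes the probability of the conditioning event (here, the first hitting event) given $X_t = x$, then the conditioned bridge follows

        $$ dX_t = \Big[ b(X_t, t) + \sigma\sigma^{\top}(X_t, t)\, \nabla_x \log h(X_t, t) \Big]\, dt + \sigma(X_t, t)\, dW_t, $$

    i.e., the drift is tilted toward states from which the hitting event is more likely.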
    Adaptive Graph Diffusion Networks. (arXiv:2012.15024v2 [cs.LG] UPDATED)
    Graph Neural Networks (GNNs) have received much attention in the graph deep learning domain. However, recent research shows, both empirically and theoretically, that deep GNNs suffer from over-fitting and over-smoothing problems. The usual solutions either cannot address the extensive runtime of deep GNNs or restrict graph convolution to the same feature space. We propose Adaptive Graph Diffusion Networks (AGDNs), which perform multi-layer generalized graph diffusion in different feature spaces with moderate complexity and runtime. Standard graph diffusion methods combine large and dense powers of the transition matrix with predefined weighting coefficients. Instead, AGDNs combine smaller multi-hop node representations with learnable and generalized weighting coefficients. We propose two scalable mechanisms of weighting coefficients to capture multi-hop information: Hop-wise Attention (HA) and Hop-wise Convolution (HC). We evaluate AGDNs on diverse, challenging Open Graph Benchmark (OGB) datasets with semi-supervised node classification and link prediction tasks. As of the date of submission (Aug 26, 2022), AGDNs achieve top-1 performance on the ogbn-arxiv, ogbn-proteins and ogbl-ddi datasets and top-3 performance on the ogbl-citation2 dataset. On similar Tesla V100 GPU cards, AGDNs outperform Reversible GNNs (RevGNNs) with 13% of the complexity and 1% of the training runtime of RevGNNs on the ogbn-proteins dataset. AGDNs also achieve comparable performance to SEAL with 36% of the training runtime and 0.2% of the inference runtime of SEAL on the ogbl-citation2 dataset.  ( 3 min )
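    The hop-wise weighting idea can be sketched as follows (a minimal illustration under assumed details; the paper's HA and HC operators are more elaborate): multi-hop propagations $\hat{A}^k X$ are combined with learnable, softmax-normalized per-hop weights instead of predefined diffusion coefficients.

        import torch
        import torch.nn as nn

        class HopWiseCombine(nn.Module):
            # Combine K+1 multi-hop propagations H^(k) = A_hat^k X with
            # learnable per-hop weights (softmax-normalized over hops).
            def __init__(self, K):
                super().__init__()
                self.K = K
                self.hop_logits = nn.Parameter(torch.zeros(K + 1))

            def forward(self, A_hat, X):      # A_hat: (N, N) normalized adjacency
                hops, H = [X], X
                for _ in range(self.K):
                    H = A_hat @ H             # one more hop of propagation
                    hops.append(H)
                w = torch.softmax(self.hop_logits, dim=0)
                return sum(wk * Hk for wk, Hk in zip(w, hops))

        N, d = 5, 4
        A_hat = torch.rand(N, N)
        A_hat = A_hat / A_hat.sum(dim=1, keepdim=True)   # row-normalize
        print(HopWiseCombine(K=3)(A_hat, torch.randn(N, d)).shape)  # torch.Size([5, 4])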
    Revisiting Outer Optimization in Adversarial Training. (arXiv:2209.01199v1 [cs.LG])
    Despite the fundamental distinction between adversarial and natural training (AT and NT), AT methods generally adopt momentum SGD (MSGD) for the outer optimization. This paper aims to analyze this choice by investigating the overlooked role of outer optimization in AT. Our exploratory evaluations reveal that AT induces higher gradient norm and variance compared to NT. This phenomenon hinders the outer optimization in AT since the convergence rate of MSGD is highly dependent on the variance of the gradients. To this end, we propose an optimization method called ENGM which regularizes the contribution of each input example to the average mini-batch gradients. We prove that the convergence rate of ENGM is independent of the variance of the gradients, and thus, it is suitable for AT. We introduce a trick to reduce the computational cost of ENGM using empirical observations on the correlation between the norm of gradients w.r.t. the network parameters and input examples. Our extensive evaluations and ablation studies on CIFAR-10, CIFAR-100, and TinyImageNet demonstrate that ENGM and its variants consistently improve the performance of a wide range of AT methods. Furthermore, ENGM alleviates major shortcomings of AT including robust overfitting and high sensitivity to hyperparameter settings.  ( 2 min )
    Hierarchical Relational Learning for Few-Shot Knowledge Graph Completion. (arXiv:2209.01205v1 [cs.LG])
    Knowledge graphs (KGs) are known for their large scale and knowledge inference ability, but are also notorious for the incompleteness associated with them. Due to the long-tail distribution of the relations in KGs, few-shot KG completion has been proposed as a solution to alleviate incompleteness and expand the coverage of KGs. It aims to make predictions for triplets involving novel relations when only a few training triplets are provided as reference. Previous methods have mostly focused on designing local neighbor aggregators to learn entity-level information and/or imposing sequential dependency assumption at the triplet level to learn meta relation information. However, valuable pairwise triplet-level interactions and context-level relational information have been largely overlooked for learning meta representations of few-shot relations. In this paper, we propose a hierarchical relational learning method (HiRe) for few-shot KG completion. By jointly capturing three levels of relational information (entity-level, triplet-level and context-level), HiRe can effectively learn and refine the meta representation of few-shot relations, and consequently generalize very well to new unseen relations. Extensive experiments on two benchmark datasets validate the superiority of HiRe against other state-of-the-art methods.  ( 2 min )
    Estimation of Correlation Matrices from Limited time series Data using Machine Learning. (arXiv:2209.01198v1 [cs.LG])
    Prediction of correlation matrices from given time series data has several applications for a range of problems, such as inferring neuronal connections from spiking data, deducing causal dependencies between genes from expression data, and discovering long spatial range influences in climate variations. Traditional methods of predicting correlation matrices utilize time series data of all the nodes of the underlying networks. Here, we use a supervised machine learning technique to predict the correlation matrix of entire systems from finite time series information of a few randomly selected nodes. The accuracy of the prediction from the model confirms that only a limited time series of a subset of the entire system is enough to make good correlation matrix predictions. Furthermore, using an unsupervised learning algorithm, we provide insights into the success of the predictions from our model. Finally, we apply the machine learning model developed here to real-world data sets.  ( 2 min )
    No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium. (arXiv:2004.00603v5 [cs.GT] UPDATED)
    The existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as private information. Because of the sequential nature and presence of partial information in the game, extensive-form correlation has significantly different properties than the normal-form counterpart, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to normal-form correlated equilibrium. However, it was previously unknown whether EFCE emerges as the result of uncoupled agent dynamics. In this paper, we give the first uncoupled no-regret dynamics that converge to the set of EFCEs in $n$-player general-sum extensive-form games with perfect recall. First, we introduce a notion of trigger regret in extensive-form games, which extends that of internal regret in normal-form games. When each player has low trigger regret, the empirical frequency of play is close to an EFCE. Then, we give an efficient no-trigger-regret algorithm. Our algorithm decomposes trigger regret into local subproblems at each decision point for the player, and constructs a global strategy of the player from the local solutions at each decision point.  ( 3 min )
    "More Than Words": Linking Music Preferences and Moral Values Through Lyrics. (arXiv:2209.01169v1 [cs.CY])
    This study explores the association between music preferences and moral values by applying text analysis techniques to lyrics. Harvesting data from a Facebook-hosted application, we align psychometric scores of 1,386 users to lyrics from the top 5 songs of their preferred music artists as emerged from Facebook Page Likes. We extract a set of lyrical features related to each song's overarching narrative, moral valence, sentiment, and emotion. A machine learning framework was designed to exploit regression approaches and evaluate the predictive power of lyrical features for inferring moral values. Results suggest that lyrics from top songs of artists people like inform their morality. Virtues of hierarchy and tradition achieve higher prediction scores ($.20 \leq r \leq .30$) than values of empathy and equality ($.08 \leq r \leq .11$), while basic demographic variables only account for a small part in the models' explainability. This shows the importance of music listening behaviours, as assessed via lyrical preferences, alone in capturing moral values. We discuss the technological and musicological implications and possible future improvements.  ( 2 min )
    Multi-Step Prediction in Linearized Latent State Spaces for Representation Learning. (arXiv:2209.01127v1 [cs.LG])
    In this paper, we derive a novel method as a generalization of LCEs such as E2C. The method develops the idea of learning a locally linear state space by adding a multi-step prediction, thus allowing for more explicit control over the curvature. We show that the method outperforms E2C without the drastic model changes that come with other works, such as PCC and P3C. We discuss the relation between E2C and the presented method and derive the update equations. We provide empirical evidence suggesting that, by considering the multi-step prediction, our method - ms-E2C - allows learning much better latent state spaces in terms of curvature and next-state predictability. Finally, we also discuss certain stability challenges we encounter with multi-step predictions and ways to mitigate them.
    A lightweight hybrid CNN-LSTM model for ECG-based arrhythmia detection. (arXiv:2209.00988v1 [eess.SP])
    The electrocardiogram (ECG) is the most frequently used routine diagnostic tool for monitoring the heart's electrical signals and evaluating its functionality. The human heart can suffer from a variety of diseases, including cardiac arrhythmias. Arrhythmia is an irregular heart rhythm that in severe cases can lead to stroke and can be diagnosed via ECG recordings. Since early detection of cardiac arrhythmias is of great importance, computerized and automated classification and identification of these abnormal heart signals have received much attention over the past decades. Methods: This paper introduces a light deep learning approach for high-accuracy detection of 8 different cardiac arrhythmias and normal rhythm. To prepare the signals for the deep learning method, resampling and baseline wander removal techniques are applied to the ECG signals. In this study, 500-sample ECG segments were used as model inputs. The rhythm classification was done by an 11-layer network in an end-to-end manner, without the need for hand-crafted manual feature extraction. Results: In order to evaluate the proposed technique, ECG signals were chosen from two PhysioNet databases, the MIT-BIH arrhythmia database and the long-term AF database. The proposed deep learning framework, based on the combination of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM), showed more promising results than most state-of-the-art methods. The proposed method reaches a mean diagnostic accuracy of 98.24%. Conclusion: A trained model for arrhythmia classification using diverse ECG signals was successfully developed and tested. Significance: Since the present work uses a light classification technique with high diagnostic accuracy compared to other notable methods, it could successfully be implemented in Holter monitor devices for arrhythmia detection.
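    A minimal PyTorch sketch of a hybrid CNN-LSTM of the kind described above (an illustrative assumption; layer counts and sizes differ from the paper's 11-layer network): 1D convolutions extract local waveform features from a 500-sample segment, an LSTM models their temporal order, and a linear head scores the 9 rhythm classes (8 arrhythmias plus normal).

        import torch
        import torch.nn as nn

        class ECGConvLSTM(nn.Module):
            def __init__(self, n_classes=9):
                super().__init__()
                self.conv = nn.Sequential(
                    nn.Conv1d(1, 16, 7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
                    nn.Conv1d(16, 32, 5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
                )
                self.lstm = nn.LSTM(32, 64, batch_first=True)
                self.fc = nn.Linear(64, n_classes)

            def forward(self, x):              # x: (B, 1, 500), one ECG segment
                z = self.conv(x)               # (B, 32, 125)
                z = z.transpose(1, 2)          # (B, 125, 32) for the LSTM
                out, _ = self.lstm(z)
                return self.fc(out[:, -1])     # class logits

        print(ECGConvLSTM()(torch.randn(4, 1, 500)).shape)  # torch.Size([4, 9])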
    Reducing The Amortization Gap of Entropy Bottleneck In End-to-End Image Compression. (arXiv:2209.00964v1 [eess.IV])
    End-to-end deep trainable models are about to exceed the performance of traditional handcrafted compression techniques on videos and images. The core idea is to learn a non-linear transformation, modeled as a deep neural network, mapping an input image into a latent space, jointly with an entropy model of the latent distribution. The decoder is also learned as a deep trainable network, and distortion is measured on the reconstructed image. These methods enforce the latents to follow some prior distribution. Since these priors are learned by optimization over the entire training set, the performance is optimal on average. However, the prior cannot fit every single new instance exactly, which damages compression performance by enlarging the bit-stream. In this paper, we propose a simple yet efficient instance-based parameterization method to reduce this amortization gap at a minor cost. The proposed method is applicable to any end-to-end compression method, improving the compression bitrate by 1% without any impact on the reconstruction quality.
    Classification of eye-state using EEG recordings: speed-up gains using signal epochs and mutual information measure. (arXiv:2209.01023v1 [eess.SP])
    The classification of electroencephalography (EEG) signals is useful in a wide range of applications such as seizure detection/prediction, motor imagery classification, emotion classification and drug effects diagnosis, amongst others. With the large number of EEG channels acquired, it has become vital that efficient data-reduction methods are developed, with varying importance from one application to another. It is also important for many applications that online classification is achieved during EEG recording, to monitor changes as they happen. In this paper we introduce a method based on Mutual Information (MI) for channel selection. The obtained results show that, whilst there is a penalty in classification accuracy, promising speed-up gains can be achieved using MI techniques. Using MI with signal epochs (3 s) containing signal transitions enhances these speed-up gains. This work is exploratory and we suggest further research be carried out for validation and development. Benefits of improving classification speed include improved applicability in clinical or educational settings.  ( 2 min )
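    A minimal scikit-learn sketch of MI-based channel ranking of the kind described above (an illustrative assumption: each channel is reduced to one feature per epoch and the data are synthetic):

        import numpy as np
        from sklearn.feature_selection import mutual_info_classif

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 14))      # 200 epochs x 14 EEG channel features
        y = rng.integers(0, 2, size=200)    # eye state: 0 = open, 1 = closed
        X[:, 3] += 2.0 * y                  # make channel 3 informative

        mi = mutual_info_classif(X, y, random_state=0)
        top = np.argsort(mi)[::-1][:4]      # keep the 4 most informative channels
        print(top, X[:, top].shape)

    Classifying only the selected channels is what yields the speed-up, at the cost of whatever information the dropped channels carried.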
    SATformer: Transformers for SAT Solving. (arXiv:2209.00953v1 [cs.AI])
    In this paper, we propose SATformer, a novel Transformer-based solution for Boolean satisfiability (SAT) solving. Different from existing learning-based SAT solvers that learn at the problem instance level, SATformer learns the minimum unsatisfiable cores (MUC) of unsatisfiable problem instances, which provide rich information about the causality of such problems. Specifically, we apply a graph neural network (GNN) to obtain the embeddings of the clauses in conjunctive normal form (CNF). A hierarchical Transformer architecture is applied to the clause embeddings to capture the relationships among clauses, and the self-attention weights are learned to be high when clauses forming UNSAT cores are attended together, and low otherwise. By doing so, SATformer effectively learns the correlations among clauses for SAT prediction. Experimental results show that SATformer is more powerful than existing end-to-end learning-based SAT solvers.  ( 2 min )
    IMG2IMU: Applying Knowledge from Large-Scale Images to IMU Applications via Contrastive Learning. (arXiv:2209.00945v1 [cs.LG])
    Recent advances in machine learning have shown that pre-trained representations acquired via self-supervised learning can achieve high accuracy on tasks with small training data. Unlike in the vision and natural language processing domains, such pre-training for IMU-based applications is challenging, as there are only a few publicly available datasets with sufficient size and diversity to learn generalizable representations. To overcome this problem, we propose IMG2IMU, a novel approach that adapts pre-trained representations from large-scale images to diverse few-shot IMU sensing tasks. We convert the sensor data into visually interpretable spectrograms for the model to utilize the knowledge gained from vision. Further, we apply contrastive learning on an augmentation set we designed to learn representations that are tailored to interpreting sensor data. Our extensive evaluations on five different IMU sensing tasks show that IMG2IMU consistently outperforms the baselines, illustrating that vision knowledge can be incorporated into a few-shot learning environment for IMU sensing tasks.  ( 2 min )
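    The sensor-to-spectrogram conversion mentioned above can be sketched with SciPy (an illustrative assumption; the paper's exact transform and parameters may differ):

        import numpy as np
        from scipy.signal import spectrogram

        fs = 50.0                            # assumed IMU sampling rate (Hz)
        t = np.arange(0, 10, 1 / fs)
        rng = np.random.default_rng(0)
        acc = np.sin(2 * np.pi * 2.0 * t) + 0.1 * rng.normal(size=t.size)

        f, frames, Sxx = spectrogram(acc, fs=fs, nperseg=64, noverlap=32)
        img = np.log1p(Sxx)                  # log-scaled "image" for a vision model
        print(img.shape)                     # (freq bins, time frames)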
    A Class-Aware Representation Refinement Framework for Graph Classification. (arXiv:2209.00936v1 [cs.LG])
    Graph Neural Networks (GNNs) are widely used for graph representation learning. Despite their prevalence, GNNs suffer from two drawbacks in the graph classification task: the neglect of graph-level relationships and the generalization issue. Each graph is treated separately in GNN message passing/graph pooling, and existing methods to address overfitting operate on each individual graph. This makes the graph representations learnt less effective in downstream classification. In this paper, we propose a Class-Aware Representation rEfinement (CARE) framework for the task of graph classification. CARE computes simple yet powerful class representations and injects them to steer the learning of graph representations towards better class separability. CARE is a plug-and-play framework that is highly flexible and able to incorporate arbitrary GNN backbones without significantly increasing the computational cost. We also theoretically prove that CARE has a better generalization upper bound than its GNN backbone through Vapnik-Chervonenkis (VC) dimension analysis. Our extensive experiments with 10 well-known GNN backbones on 9 benchmark datasets validate the superiority and effectiveness of CARE over its GNN counterparts.  ( 2 min )
    Tweaking Metasploit to Evade Encrypted C2 Traffic Detection. (arXiv:2209.00943v1 [cs.CR])
    Command and Control (C2) communication is a key component of any structured cyber-attack. As such, security operations actively try to detect this type of communication in their networks. This poses a problem for legitimate pentesters that try to remain undetected, since commonly used pentesting tools, such as Metasploit, generate constant traffic patterns that are easily distinguishable from regular web traffic. In this paper we start with these identifiable patterns in Metasploit's C2 traffic and show that a machine learning-based detector is able to detect the presence of such traffic with high accuracy, even when encrypted. We then outline and implement a set of modifications to the Metasploit framework in order to decrease the detection rates of such a classifier. To evaluate the performance of these modifications, we use two threat models with increasing awareness of the modifications. We look at the detection-evasion performance and at the byte-count and runtime overhead of the modifications. Our results show that for the second, increased-awareness threat model, the framework-side traffic modifications yield a better detection-avoidance rate (90%) than payload-side-only modifications (50%). We also show that although the modifications use up to 3 times more TLS payload bytes than the original, the runtime does not significantly change and the total number of bytes (including TLS payload) decreases.  ( 3 min )
    Mind the Gap! Injecting Commonsense Knowledge for Abstractive Dialogue Summarization. (arXiv:2209.00930v1 [cs.CL])
    In this paper, we propose to leverage the unique characteristics of dialogues sharing commonsense knowledge across participants, to resolve the difficulties in summarizing them. We present SICK, a framework that uses commonsense inferences as additional context. Compared to previous work that solely relies on the input dialogue, SICK uses an external knowledge model to generate a rich set of commonsense inferences and selects the most probable one with a similarity-based selection method. Built upon SICK, SICK++ utilizes commonsense as supervision, where the task of generating commonsense inferences is added upon summarizing the dialogue in a multi-task learning setting. Experimental results show that with injected commonsense knowledge, our framework generates more informative and consistent summaries than existing methods.  ( 2 min )
    TB or not TB? Acoustic cough analysis for tuberculosis classification. (arXiv:2209.00934v1 [eess.AS])
    In this work, we explore recurrent neural network architectures for tuberculosis (TB) cough classification. In contrast to previous unsuccessful attempts to implement deep architectures in this domain, we show that a basic bidirectional long short-term memory network (BiLSTM) can achieve improved performance. In addition, we show that by performing greedy feature selection in conjunction with a newly-proposed attention-based architecture that learns patient invariant features, substantially better generalisation can be achieved compared to a baseline and other considered architectures. Furthermore, this attention mechanism allows an inspection of the temporal regions of the audio signal considered to be important for classification to be performed. Finally, we develop a neural style transfer technique to infer idealised inputs which can subsequently be analysed. We find distinct differences between the idealised power spectra of TB and non-TB coughs, which provide clues about the origin of the features in the audio signal.  ( 2 min )
    An Introduction to Machine Unlearning. (arXiv:2209.00939v1 [cs.LG])
    Removing the influence of a specified subset of training data from a machine learning model may be required to address issues such as privacy, fairness, and data quality. Retraining the model from scratch on the remaining data after removal of the subset is an effective but often infeasible option, due to its computational expense. The past few years have therefore seen several novel approaches towards efficient removal, forming the field of "machine unlearning"; however, many aspects of the literature published thus far are disparate and lack consensus. In this paper, we summarise and compare seven state-of-the-art machine unlearning algorithms, consolidate definitions of core concepts used in the field, reconcile different approaches for evaluating algorithms, and discuss issues related to applying machine unlearning in practice.  ( 2 min )
    Multi-modal Contrastive Representation Learning for Entity Alignment. (arXiv:2209.00891v1 [cs.CL])
    Multi-modal entity alignment aims to identify equivalent entities between two different multi-modal knowledge graphs, which consist of structural triples and images associated with entities. Most previous works focus on how to utilize and encode information from different modalities, yet it is not trivial to leverage multi-modal knowledge in entity alignment because of the modality heterogeneity. In this paper, we propose MCLEA, a Multi-modal Contrastive Learning based Entity Alignment model, to obtain effective joint representations for multi-modal entity alignment. Different from previous works, MCLEA considers task-oriented modality and models the inter-modal relationships for each entity representation. In particular, MCLEA first learns multiple individual representations from multiple modalities, and then performs contrastive learning to jointly model intra-modal and inter-modal interactions. Extensive experimental results show that MCLEA outperforms state-of-the-art baselines on public datasets under both supervised and unsupervised settings.  ( 2 min )
    Introducing dynamical constraints into representation learning. (arXiv:2209.00905v1 [cs.LG])
    While representation learning has been central to the rise of machine learning and artificial intelligence, a key problem remains in making the learnt representations meaningful. For this, the typical approach is to regularize the learned representation through prior probability distributions. However, such priors are usually unavailable or ad hoc. To deal with this, we propose a dynamics-constrained representation learning framework. Instead of using predefined probabilities, we restrict the latent representation to follow specific dynamics, which is a more natural constraint for representation learning in dynamical systems. Our belief stems from a fundamental observation in physics that, though different systems can have different marginalized probability distributions, they typically obey the same dynamics, such as Newton's and Schrödinger's equations. We validate our framework on different systems, including a real-world fluorescent DNA movie dataset. We show that our algorithm can uniquely identify an uncorrelated, isometric and meaningful latent representation.  ( 3 min )
    Optimistic Optimization of Gaussian Process Samples. (arXiv:2209.00895v1 [cs.LG])
    Bayesian optimization is a popular formalism for global optimization, but its computational costs limit it to expensive-to-evaluate functions. A competing, computationally more efficient, global optimization framework is optimistic optimization, which exploits prior knowledge about the geometry of the search space in the form of a dissimilarity function. We investigate to which degree the conceptual advantages of Bayesian optimization can be combined with the computational efficiency of optimistic optimization. By mapping the kernel to a dissimilarity, we obtain an optimistic optimization algorithm for the Bayesian optimization setting with a run-time of up to $\mathcal{O}(N \log N)$. As a high-level take-away, we find that, when using stationary kernels on objectives of relatively low evaluation cost, optimistic optimization can be strongly preferable over Bayesian optimization, while for strongly coupled and parametric models, good implementations of Bayesian optimization can perform much better, even at low evaluation cost. We argue that there is a new research domain between geometric and probabilistic search, i.e., methods that run drastically faster than traditional Bayesian optimization while retaining some of its crucial functionality.  ( 2 min )
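    For reference, one canonical way to map a kernel $k$ to a dissimilarity is the kernel-induced metric

        $$ d(x, y) = \sqrt{\,k(x, x) - 2\,k(x, y) + k(y, y)\,}, $$

    which for a stationary kernel $k(x, y) = \kappa(x - y)$ reduces to $d(x, y) = \sqrt{2\big(\kappa(0) - \kappa(x - y)\big)}$; whether this is the paper's exact mapping is an assumption here.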
    Detection of diabetic retinopathy using longitudinal self-supervised learning. (arXiv:2209.00915v1 [cs.CV])
    Longitudinal imaging is able to capture both static anatomical structures and dynamic changes in disease progression, towards earlier and better patient-specific pathology management. However, conventional approaches for detecting diabetic retinopathy (DR) rarely take advantage of longitudinal information to improve DR analysis. In this work, we investigate the benefit of exploiting self-supervised learning with a longitudinal nature for DR diagnosis purposes. We compare different longitudinal self-supervised learning (LSSL) methods for modeling disease progression from longitudinal retinal color fundus photographs (CFP), in order to detect early DR severity changes using a pair of consecutive exams. The experiments were conducted on a longitudinal DR screening dataset, with or without the trained encoders (LSSL) acting as a longitudinal pretext task. Results achieve an AUC of 0.875 for the baseline (model trained from scratch) and an AUC of 0.96 (95% CI: 0.9593-0.9655, DeLong test) with a p-value < 2.2e-16 on early fusion, using a simple ResNet-like architecture with frozen LSSL weights, suggesting that the LSSL latent space enables encoding the dynamics of DR progression.  ( 2 min )
    Instance-Dependent Noisy Label Learning via Graphical Modelling. (arXiv:2209.00906v1 [cs.CV])
    Noisy labels are unavoidable yet troublesome in the ecosystem of deep learning because models can easily overfit them. There are many types of label noise, such as symmetric, asymmetric and instance-dependent noise (IDN), with IDN being the only type that depends on image information. Such dependence on image information makes IDN a critical type of label noise to study, given that labelling mistakes are caused in large part by insufficient or ambiguous information about the visual classes present in images. Aiming to provide an effective technique to address IDN, we present a new graphical modelling approach called InstanceGM, that combines discriminative and generative models. The main contributions of InstanceGM are: i) the use of the continuous Bernoulli distribution to train the generative model, offering significant training advantages, and ii) the exploration of a state-of-the-art noisy-label discriminative classifier to generate clean labels from instance-dependent noisy-label samples. InstanceGM is competitive with current noisy-label learning approaches, particularly in IDN benchmarks using synthetic and real-world datasets, where our method shows better accuracy than the competitors in most experiments.  ( 2 min )
    A Discussion of Discrimination and Fairness in Insurance Pricing. (arXiv:2209.00858v1 [cs.LG])
    Indirect discrimination is an issue of major concern in algorithmic models. This is particularly the case in insurance pricing where protected policyholder characteristics are not allowed to be used for insurance pricing. Simply disregarding protected policyholder information is not an appropriate solution because this still allows for the possibility of inferring the protected characteristics from the non-protected ones. This leads to so-called proxy or indirect discrimination. Though proxy discrimination is qualitatively different from the group fairness concepts in machine learning, these group fairness concepts are proposed to 'smooth out' the impact of protected characteristics in the calculation of insurance prices. The purpose of this note is to share some thoughts about group fairness concepts in the light of insurance pricing and to discuss their implications. We present a statistical model that is free of proxy discrimination, thus, unproblematic from an insurance pricing point of view. However, we find that the canonical price in this statistical model does not satisfy any of the three most popular group fairness axioms. This seems puzzling and we welcome feedback on our example and on the usefulness of these group fairness axioms for non-discriminatory insurance pricing.  ( 2 min )
    PulseDL-II: A System-on-Chip Neural Network Accelerator for Timing and Energy Extraction of Nuclear Detector Signals. (arXiv:2209.00884v1 [physics.ins-det])
    Front-end electronics equipped with high-speed digitizers are being used and proposed for future nuclear detectors. Recent literature reveals that deep learning models, especially one-dimensional convolutional neural networks, are promising when dealing with digital signals from nuclear detectors. Simulations and experiments demonstrate the satisfactory accuracy and additional benefits of neural networks in this area. However, specific hardware accelerating such models for online operations still needs to be studied. In this work, we introduce PulseDL-II, a system-on-chip (SoC) specially designed for applications of event feature (time, energy, etc.) extraction from pulses with deep learning. Based on the previous version, PulseDL-II incorporates a RISC CPU into the system structure for better functional flexibility and integrity. The neural network accelerator in the SoC adopts a three-level (arithmetic unit, processing element, neural network) hierarchical architecture and facilitates parameter optimization of the digital design. Furthermore, we devise a quantization scheme and associated implementation methods (rescale & bit-shift) for full compatibility with deep learning frameworks (e.g., TensorFlow) within a selected subset of layer types. With the current scheme, the quantization-aware training of neural networks is supported, and network models are automatically transformed into software of RISC CPU by dedicated scripts, with nearly no loss of accuracy. We validate PulseDL-II on field programmable gate arrays (FPGA). Finally, system validation is done with an experimental setup made up of a direct digital synthesis (DDS) signal generator and an FPGA development board with analog-to-digital converters (ADC). The proposed system achieved 60 ps time resolution and 0.40% energy resolution with online neural network inference at signal to noise ratio (SNR) of 47.4 dB.  ( 3 min )
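    The rescale & bit-shift idea mentioned above can be sketched generically (an assumption-level illustration of integer requantization, not PulseDL-II's exact scheme): a floating-point rescale factor is approximated as $m \cdot 2^{-s}$ with an integer multiply followed by a right shift, so inference needs no floating-point unit.

        import numpy as np

        def quantize(x, scale, bits=8):
            # Map floats to signed integers with a fixed scale.
            qmax = 2 ** (bits - 1) - 1
            return np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)

        def rescale_bitshift(acc, multiplier, shift):
            # Integer-only approximation of acc * (multiplier / 2**shift).
            return (acc * multiplier) >> shift

        x = np.array([0.12, -0.50, 0.33])
        q = quantize(x, scale=0.01)                 # quantized activations
        # 77 / 2**8 ~= 0.30 approximates a float rescale factor of 0.3.
        print(q, rescale_bitshift(q, multiplier=77, shift=8))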
    Regret Analysis of Dyadic Search. (arXiv:2209.00885v1 [cs.LG])
    We analyze the cumulative regret of the Dyadic Search algorithm of Bachoc et al. [2022].  ( 2 min )
    Three Learning Stages and Accuracy-Efficiency Tradeoff of Restricted Boltzmann Machines. (arXiv:2209.00873v1 [cs.LG])
    Restricted Boltzmann Machines (RBMs) offer a versatile architecture for unsupervised machine learning that can in principle approximate any target probability distribution with arbitrary accuracy. However, the RBM model is usually not directly accessible due to its computational complexity, and Markov-chain sampling is invoked to analyze the learned probability distribution. For training and eventual applications, it is thus desirable to have a sampler that is both accurate and efficient. We highlight that these two goals generally compete with each other and cannot be achieved simultaneously. More specifically, we identify and quantitatively characterize three regimes of RBM learning: independent learning, where the accuracy improves without losing efficiency; correlation learning, where higher accuracy entails lower efficiency; and degradation, where both accuracy and efficiency no longer improve or even deteriorate. These findings are based on numerical experiments and heuristic arguments.  ( 2 min )
    Diffusion-based Molecule Generation with Informative Prior Bridges. (arXiv:2209.00865v1 [cs.LG])
    AI-based molecule generation provides a promising approach to a large area of biomedical sciences and engineering, such as antibody design, hydrolase engineering, or vaccine development. Because molecules are governed by physical laws, a key challenge is to incorporate prior information into the training procedure to generate high-quality and realistic molecules. We propose a simple and novel approach to steer the training of diffusion-based generative models with physical and statistical prior information. This is achieved by constructing physically informed diffusion bridges, stochastic processes that are guaranteed to yield a given observation at a fixed terminal time. We develop a Lyapunov-function-based method to construct and determine bridges, and propose a number of informative prior bridges for both high-quality molecule generation and uniformity-promoted 3D point cloud generation. With comprehensive experiments, we show that our method provides a powerful approach to the 3D generation task, yielding molecule structures with better quality and stability scores and more uniformly distributed point clouds of high quality.  ( 2 min )
    Diffusion Models: A Comprehensive Survey of Methods and Applications. (arXiv:2209.00796v1 [cs.LG])
    Diffusion models are a class of deep generative models that have shown impressive results on various tasks, with solid theoretical foundations. Although diffusion models have achieved more impressive quality and diversity of sample synthesis than other state-of-the-art models, they still suffer from a costly sampling procedure and sub-optimal likelihood estimation. Recent studies have shown great enthusiasm for improving the performance of diffusion models. In this article, we present a first comprehensive review of existing variants of diffusion models. Specifically, we provide a first taxonomy of diffusion models and categorize the variants into three types, namely sampling-acceleration enhancement, likelihood-maximization enhancement and data-generalization enhancement. We also introduce in detail five other generative models (i.e., variational autoencoders, generative adversarial networks, normalizing flows, autoregressive models, and energy-based models), and clarify the connections between diffusion models and these generative models. Then we make a thorough investigation into the applications of diffusion models, including computer vision, natural language processing, waveform signal processing, multi-modal modeling, molecular graph generation, time series modeling, and adversarial purification. Furthermore, we propose new perspectives pertaining to the development of this class of generative models.  ( 2 min )
    Human Activity Recognition on Microcontrollers with Quantized and Adaptive Deep Neural Networks. (arXiv:2209.00839v1 [cs.LG])
    Human Activity Recognition (HAR) based on inertial data is an increasingly widespread task on embedded devices, from smartphones to ultra-low-power sensors. Due to the high computational complexity of deep learning models, most embedded HAR systems are based on simple and not-so-accurate classic machine learning algorithms. This work bridges the gap between on-device HAR and deep learning, proposing a set of efficient one-dimensional Convolutional Neural Networks (CNNs) deployable on general-purpose microcontrollers (MCUs). Our CNNs are obtained by combining hyper-parameter optimization with sub-byte and mixed-precision quantization, to find good trade-offs between classification results and memory occupation. Moreover, we also leverage adaptive inference as an orthogonal optimization to tune the inference complexity at runtime based on the processed input, hence producing a more flexible HAR system. With experiments on four datasets, and targeting an ultra-low-power RISC-V MCU, we show that: (i) we are able to obtain a rich set of Pareto-optimal CNNs for HAR, spanning more than one order of magnitude in terms of memory, latency and energy consumption; (ii) thanks to adaptive inference, we can derive more than 20 runtime operating modes starting from a single CNN, differing by up to 10% in classification scores and by more than 3x in inference complexity, with limited memory overhead; (iii) on three of the four benchmarks, we outperform all previous deep learning methods while reducing memory occupation by more than 100x (the few methods that obtain better performance, both shallow and deep, are not compatible with MCU deployment); and (iv) all our CNNs are compatible with real-time on-device HAR, with an inference latency below 16 ms. Their memory occupation ranges from 0.05 to 23.17 kB, and their energy consumption from 0.005 to 61.59 uJ, allowing years of continuous operation on a small battery.  ( 3 min )
    Rethinking Efficiency and Redundancy in Training Large-scale Graphs. (arXiv:2209.00800v1 [cs.LG])
    Large-scale graphs are ubiquitous in real-world scenarios and can be trained by Graph Neural Networks (GNNs) to generate representations for downstream tasks. Given the abundant information and complex topology of a large-scale graph, we argue that redundancy exists in such graphs and degrades training efficiency. Unfortunately, model scalability severely restricts the efficiency of training large-scale graphs via vanilla GNNs. Despite recent advances in sampling-based training methods, sampling-based GNNs generally overlook the redundancy issue, and it still takes an intolerable amount of time to train these models on large-scale graphs. Thereby, we propose to drop redundancy and improve the efficiency of training large-scale graphs with GNNs by rethinking the inherent characteristics of a graph. In this paper, we propose a once-for-all method, termed DropReef, to drop the redundancy in large-scale graphs. Specifically, we first conduct preliminary experiments to explore potential redundancy in large-scale graphs. Next, we present a metric to quantify the neighbor heterophily of all nodes in a graph. Based on both experimental and theoretical analysis, we reveal the redundancy in a large-scale graph, i.e., nodes with high neighbor heterophily and a large number of neighbors. Then, we propose DropReef to detect and drop the redundancy in large-scale graphs once and for all, helping reduce the training time while ensuring no sacrifice in model accuracy. To demonstrate the effectiveness of DropReef, we apply it to recent state-of-the-art sampling-based GNNs for training large-scale graphs, owing to the high precision of such models. With DropReef, the training efficiency of these models can be greatly improved. DropReef is highly compatible and is performed offline, benefiting current and future state-of-the-art sampling-based GNNs to a significant extent.  ( 3 min )
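    One common formulation of neighbor heterophily, sketched below, is the fraction of a node's neighbors whose label differs from its own; this is an illustrative stand-in, not necessarily DropReef's exact metric.

        import networkx as nx

        def neighbor_heterophily(G, labels):
            # Fraction of each node's neighbors carrying a different label.
            scores = {}
            for v in G.nodes:
                nbrs = list(G.neighbors(v))
                scores[v] = (sum(labels[u] != labels[v] for u in nbrs) / len(nbrs)
                             if nbrs else 0.0)
            return scores

        G = nx.karate_club_graph()
        labels = {v: G.nodes[v]["club"] for v in G.nodes}
        print(sorted(neighbor_heterophily(G, labels).items())[:5])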
    An Explainer for Temporal Graph Neural Networks. (arXiv:2209.00807v1 [cs.LG])
    Temporal graph neural networks (TGNNs) have been widely used for modeling time-evolving graph-related tasks due to their ability to capture both graph topology dependency and non-linear temporal dynamics. Explaining TGNNs is of vital importance for a transparent and trustworthy model. However, the complex topology structure and temporal dependency make explaining TGNN models very challenging. In this paper, we propose a novel explainer framework for TGNN models. Given a time series on a graph to be explained, the framework can identify dominant explanations in the form of a probabilistic graphical model in a time period. Case studies in the transportation domain demonstrate that the proposed approach can discover dynamic dependency structures in a road network for a time period.  ( 2 min )
    TarGF: Learning Target Gradient Field for Object Rearrangement. (arXiv:2209.00853v1 [cs.LG])
    Object rearrangement is the task of moving objects from an initial state to a goal state. Here, we focus on a more practical setting in object rearrangement, i.e., rearranging objects from shuffled layouts to a normative target distribution without explicit goal specification. However, this remains challenging for AI agents, as it is hard to describe the target distribution (goal specification) for reward engineering or to collect expert trajectories as demonstrations. Hence, it is infeasible to directly employ reinforcement learning or imitation learning algorithms to address the task. This paper aims to search for a policy only with a set of examples from a target distribution instead of a handcrafted reward function. We employ the score-matching objective to train a Target Gradient Field (TarGF), indicating a direction on each object that increases the likelihood of the target distribution. For object rearrangement, the TarGF can be used in two ways: 1) for model-based planning, we can cast the target gradient into a reference control and output actions with a distributed path planner; 2) for model-free reinforcement learning, the TarGF is not only used for estimating the likelihood-change as a reward but also provides suggested actions in residual policy learning. Experimental results in ball rearrangement and room rearrangement demonstrate that our method significantly outperforms state-of-the-art methods in the quality of the terminal state, the efficiency of the control process, and scalability. The code and demo videos are on our project website.  ( 3 min )
    Optimal Diagonal Preconditioning: Theory and Practice. (arXiv:2209.00809v1 [math.OC])
    Preconditioning has been a staple technique in optimization and machine learning. It often reduces the condition number of the matrix it is applied to, thereby speeding up convergence of optimization algorithms. Although there are many popular preconditioning techniques in practice, most lack theoretical guarantees for reductions in condition number. In this paper, we study the problem of optimal diagonal preconditioning to achieve maximal reduction in the condition number of any full-rank matrix by scaling its rows or columns separately or simultaneously. We first reformulate the problem as a quasi-convex problem and provide a baseline bisection algorithm that is easy to implement in practice, where each iteration consists of an SDP feasibility problem. Then we propose a polynomial time potential reduction algorithm with $O(\log(\frac{1}{\epsilon}))$ iteration complexity, where each iteration consists of a Newton update based on the Nesterov-Todd direction. Our algorithm is based on a formulation of the problem which is a generalized version of the von Neumann optimal growth problem. Next, we specialize to one-sided optimal diagonal preconditioning problems, and demonstrate that they can be formulated as standard dual SDP problems, to which we apply efficient customized solvers and study the empirical performance of our optimal diagonal preconditioners. Our extensive experiments on large matrices demonstrate the practical appeal of optimal diagonal preconditioners at reducing condition numbers compared to heuristics-based preconditioners.  ( 3 min )
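    For intuition, the sketch below shows the effect of the simple Jacobi-style heuristic (scale by diag(A)^{-1/2} on both sides) that optimal diagonal preconditioners are designed to improve upon; the matrix construction is a toy assumption, not the paper's SDP-based method.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 200
        M = rng.standard_normal((n, n))
        A = M @ M.T + 1e-3 * np.eye(n)        # SPD base matrix
        d = np.exp(rng.uniform(0, 6, n))      # badly scaled rows/columns
        A = d[:, None] * A * d[None, :]       # symmetric diagonal distortion

        D = 1.0 / np.sqrt(np.diag(A))         # Jacobi scaling D = diag(A)^{-1/2}
        A_pre = D[:, None] * A * D[None, :]   # two-sided preconditioned matrix

        print("condition number before:", np.linalg.cond(A))
        print("condition number after :", np.linalg.cond(A_pre))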
    Domain Adaptation from Scratch. (arXiv:2209.00830v1 [cs.CL])
    Natural language processing (NLP) algorithms are rapidly improving but often struggle when applied to out-of-distribution examples. A prominent approach to mitigate the domain gap is domain adaptation, where a model trained on a source domain is adapted to a new target domain. We present a new learning setup, "domain adaptation from scratch", which we believe to be crucial for extending the reach of NLP to sensitive domains in a privacy-preserving manner. In this setup, we aim to efficiently annotate data from a set of source domains such that the trained model performs well on a sensitive target domain from which data is unavailable for annotation. Our study compares several approaches for this challenging setup, ranging from data selection and domain adaptation algorithms to active learning paradigms, on two NLP tasks: sentiment analysis and Named Entity Recognition. Our results suggest that using the abovementioned approaches eases the domain gap, and combining them further improves the results.  ( 2 min )
    TypoSwype: An Imaging Approach to Detect Typo-Squatting. (arXiv:2209.00783v1 [cs.CR])
    Typo-squatting domains are a common cyber-attack technique. They use domain names that exploit possible typographical errors of commonly visited domains to carry out malicious activities such as phishing, malware installation, etc. Current approaches typically revolve around string comparison algorithms like the Damerau-Levenshtein Distance (DLD) algorithm. Such techniques do not take keyboard distance into account, even though researchers have found it to correlate strongly with typical typographical errors. In this paper, we present the TypoSwype framework, which converts strings to images that innately take keyboard location into account. We also show how state-of-the-art image recognition techniques involving Convolutional Neural Networks, trained via either Triplet Loss or NT-Xent Loss, can be applied to learn a mapping to a lower-dimensional space where distances correspond to image similarity and, equivalently, textual similarity. Finally, we demonstrate our method's ability to improve typo-squatting detection over the widely used DLD algorithm, while maintaining classification accuracy as to which domain the input was attempting to typo-squat.  ( 2 min )
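    A minimal sketch of the string-to-image idea follows, assuming simplified QWERTY key coordinates and a single-channel canvas; TypoSwype's actual rendering may differ.

        import numpy as np
        from PIL import Image, ImageDraw

        ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
        KEY_XY = {c: (x + 0.5 * y, float(y))
                  for y, row in enumerate(ROWS) for x, c in enumerate(row)}

        def swipe_image(word, size=64, scale=6):
            # Draw the "swipe" path of a word across the keyboard layout,
            # so keyboard proximity is encoded visually.
            img = Image.new("L", (size, size), 0)
            pts = [(KEY_XY[c][0] * scale + 4, KEY_XY[c][1] * 2 * scale + 4)
                   for c in word.lower() if c in KEY_XY]
            if len(pts) > 1:
                ImageDraw.Draw(img).line(pts, fill=255, width=2)
            return np.asarray(img)

        print(swipe_image("google").shape)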
    Exact Decomposition of Quantum Channels for Non-IID Quantum Federated Learning. (arXiv:2209.00768v1 [quant-ph])
    Federated learning refers to the task of performing machine learning with decentralized data from multiple clients while protecting data security and privacy. Work has been done to incorporate quantum advantages in such scenarios. However, when the clients' data are not independent and identically distributed (IID), the performance of conventional federated algorithms deteriorates. In this work, we explore this phenomenon in the quantum regime with both theoretical and numerical analysis. We further prove that a global quantum channel can be exactly decomposed into channels trained by each client with the help of local density estimators. It leads to a general framework for quantum federated learning on non-IID data with one-shot communication complexity. We demonstrate it on classification tasks with numerical simulations.  ( 2 min )
    BinImg2Vec: Augmenting Malware Binary Image Classification with Data2Vec. (arXiv:2209.00782v1 [cs.CR])
    Rapid digitalisation spurred by the Covid-19 pandemic has resulted in more cyber crime. Malware-as-a-service is now a booming business for cyber criminals. With the surge in malware activities, it is vital for cyber defenders to understand more about the malware samples they have at hand, as such information can greatly influence their next course of action during a breach. Recently, researchers have shown how malware family classification can be done by first converting malware binaries into grayscale images and then passing them through neural networks for classification. However, most work focuses on studying the impact of different neural network architectures on classification performance. In the last year, researchers have shown that augmenting supervised learning with self-supervised learning can improve performance. Even more recently, Data2Vec was proposed as a modality-agnostic self-supervised framework to train neural networks. In this paper, we present BinImg2Vec, a framework for training malware binary image classifiers that incorporates both self-supervised and supervised learning to produce a model that consistently outperforms one trained only via supervised learning. We achieve a 4% improvement in classification performance and a 0.5% reduction in performance variance over multiple runs. We also show how our framework produces embeddings that can be well clustered, facilitating model explainability.  ( 3 min )
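    The grayscale conversion the paper builds on is straightforward: raw bytes become pixel intensities. The fixed width below is a common convention in this literature, not necessarily the paper's exact preprocessing.

        import numpy as np
        from PIL import Image

        def binary_to_image(path, width=256):
            # Interpret the file's raw bytes as rows of 8-bit pixels.
            data = np.fromfile(path, dtype=np.uint8)
            rows = len(data) // width
            img = data[: rows * width].reshape(rows, width)  # drop the ragged tail
            return Image.fromarray(img, mode="L")

        # binary_to_image("sample.bin").save("sample.png")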
    Structure-Preserving Graph Representation Learning. (arXiv:2209.00793v1 [cs.LG])
    Though graph representation learning (GRL) has made significant progress, it is still a challenge to extract and embed the rich topological structure and feature information adequately. Most existing methods focus on local structure and fail to fully incorporate the global topological structure. To this end, we propose a novel Structure-Preserving Graph Representation Learning (SPGRL) method to fully capture the structure information of graphs. Specifically, to reduce the uncertainty and misinformation of the original graph, we construct a feature graph as a complementary view via the k-Nearest Neighbor method. The feature graph can be used for node-level contrast to capture the local relations. Besides, we retain the global topological structure information by maximizing the mutual information (MI) between the whole graph and feature embeddings, which is theoretically reduced to exchanging the feature embeddings of the feature and the original graphs to reconstruct themselves. Extensive experiments show that our method has quite superior performance on the semi-supervised node classification task and excellent robustness under noise perturbation of graph structure or node features.  ( 2 min )
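    A sketch of the complementary feature-graph construction via k-nearest neighbors (here with scikit-learn); the value of k and the cosine metric are assumptions, not SPGRL's exact settings.

        import numpy as np
        from sklearn.neighbors import kneighbors_graph

        X = np.random.rand(100, 16)        # node feature matrix
        A_feat = kneighbors_graph(X, n_neighbors=5, metric="cosine",
                                  include_self=False)
        A_feat = A_feat.maximum(A_feat.T)  # symmetrize into an undirected view
        print(A_feat.shape, A_feat.nnz)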
    Index Tracking via Learning to Predict Market Sensitivities. (arXiv:2209.00780v1 [q-fin.PM])
    Index funds are preferred by a significant number of equity investors nowadays, and market sensitivities are instrumental in managing them. Index funds might replicate the index identically, which is, however, cost-ineffective and impractical. Moreover, to utilize market sensitivities to replicate the index partially, they must be predicted or estimated accurately. Accordingly, we first examine deep learning models for predicting market sensitivities. We also present pragmatic applications of data-processing methods to aid training and to generate target data for the prediction. Then, we propose a partial-index-tracking optimization model that constrains the net predicted market sensitivities of the portfolio and the index to be the same. The efficacy of these processes is corroborated on the Korea Stock Price Index 200. Our experiments show a significant reduction in prediction errors compared with historical estimations, and competitive tracking errors when replicating the index using fewer than half of the entire constituents. Therefore, we show that applying deep learning to predict market sensitivities is promising and that our portfolio construction methods are practically effective. Additionally, to our knowledge, this is the first study that addresses market sensitivities with a focus on deep learning.  ( 2 min )
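    For concreteness, a market-sensitivity (beta) target can be estimated by a rolling regression of stock returns on index returns, as sketched below; the window length and the synthetic data are illustrative, not the paper's data-processing pipeline.

        import numpy as np
        import pandas as pd

        rng = np.random.default_rng(0)
        index_ret = pd.Series(rng.normal(0, 0.01, 500))
        stock_ret = 1.2 * index_ret + pd.Series(rng.normal(0, 0.005, 500))

        window = 60                        # rolling OLS beta = cov / var
        beta = (stock_ret.rolling(window).cov(index_ret)
                / index_ret.rolling(window).var())
        print(beta.dropna().tail())        # hovers around the true 1.2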
    Deep reinforcement learning for quantum multiparameter estimation. (arXiv:2209.00671v1 [quant-ph])
    Estimation of physical quantities is at the core of most scientific research, and the use of quantum devices promises to enhance its performance. In real scenarios, it is fundamental to consider that resources are limited, and Bayesian adaptive estimation represents a powerful approach to efficiently allocate all available resources during the estimation process. However, this framework relies on precise knowledge of the system model, retrieved with a fine calibration that is often computationally and experimentally demanding. Here, we introduce a model-free, deep learning-based approach to efficiently implement realistic Bayesian quantum metrology tasks, addressing all the relevant challenges without relying on any a priori knowledge of the system. To overcome this need, a neural network is trained directly on experimental data to learn the multiparameter Bayesian update. Then, the system is set at its optimal working point through feedback provided by a reinforcement learning algorithm trained to reconstruct and enhance the experiment heuristics of the investigated quantum sensor. Notably, we prove experimentally the achievement of higher estimation performance than standard methods, demonstrating the strength of the combination of these two black-box algorithms on an integrated photonic circuit. This work represents an important step towards fully artificial-intelligence-based quantum metrology.  ( 2 min )
    Self-supervised Representation Learning on Electronic Health Records with Graph Kernel Infomax. (arXiv:2209.00655v1 [cs.LG])
    Learning Electronic Health Record (EHR) representations is a preeminent yet under-explored research topic. It benefits various clinical decision support applications, e.g., medication outcome prediction or patient similarity search. Current approaches focus on task-specific label supervision on vectorized sequential EHR, which is not applicable to large-scale unsupervised scenarios. Recently, contrastive learning has shown great success on self-supervised representation learning problems. However, complex temporality often degrades the performance. We propose Graph Kernel Infomax, a self-supervised graph kernel learning approach on the graphical representation of EHR, to overcome these problems. Unlike the state-of-the-art, we do not change the graph structure to construct augmented views. Instead, we use Kernel Subspace Augmentation to embed nodes into two geometrically different manifold views. The entire framework is trained by contrasting node and graph representations on those two manifold views through commonly used contrastive objectives. Empirically, using publicly available benchmark EHR datasets, our approach yields performance on clinical downstream tasks that exceeds the state-of-the-art. Theoretically, the variation in distance metrics naturally creates different views as data augmentation without changing graph structures.  ( 2 min )
    MIME: Minority Inclusion for Majority Group Enhancement of AI Performance. (arXiv:2209.00746v1 [cs.LG])
    Several papers have rightly included minority groups in artificial intelligence (AI) training data to improve test inference for minority groups and/or society-at-large. A society-at-large consists of both minority and majority stakeholders. A common misconception is that minority inclusion does not increase performance for majority groups alone. In this paper, we make the surprising finding that including minority samples can reduce test error for the majority group. In other words, minority group inclusion leads to majority group enhancements (MIME) in performance. A theoretical existence proof of the MIME effect is presented and found to be consistent with experimental results on six different datasets. Project webpage: https://visual.ee.ucla.edu/mime.htm/  ( 2 min )
    Optimizing the Performative Risk under Weak Convexity Assumptions. (arXiv:2209.00771v1 [cs.LG])
    In performative prediction, a predictive model impacts the distribution that generates future data, a phenomenon that is ignored in classical supervised learning. In this closed-loop setting, the natural measure of performance, denoted the performative risk, captures the expected loss incurred by a predictive model after deployment. The core difficulty of minimizing the performative risk is that the data distribution itself depends on the model parameters. This dependence is governed by the environment and not under the control of the learner. As a consequence, even the choice of a convex loss function can result in a highly non-convex performative risk minimization problem. Prior work has identified a pair of general conditions on the loss and the mapping from model parameters to distributions that implies convexity of the performative risk. In this paper, we relax these assumptions and focus on obtaining weaker notions of convexity, without sacrificing the amenability of the performative risk minimization problem to iterative optimization methods.  ( 2 min )
    Towards Optimization and Model Selection for Domain Generalization: A Mixup-guided Solution. (arXiv:2209.00652v1 [cs.LG])
    Distribution shifts between training and test data typically undermine the performance of deep learning models. In recent years, much work has focused on domain generalization (DG), where distribution shift exists and target data are unseen. Despite the progress in algorithm design, two foundational factors have long been ignored: 1) the optimization of regularization-based objectives (e.g., distribution alignment), and 2) model selection for DG, since no knowledge about the target domain can be utilized. In this paper, we propose Mixup-guided optimization and selection techniques for domain generalization. For optimization, we utilize an adapted Mixup to generate an out-of-distribution dataset that can guide the preference direction and optimize with Pareto optimization. For model selection, we generate a validation dataset with a closer distance to the target distribution, so that it can better represent the target data. We also present some theoretical insights behind our proposals. Comprehensive experiments on one visual classification benchmark and three time-series benchmarks demonstrate that our model optimization and selection techniques can largely improve the performance of existing domain generalization algorithms and even achieve new state-of-the-art results.  ( 2 min )
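    The building block is vanilla Mixup, sketched below; the paper's adapted variant for generating out-of-distribution guidance and validation data differs in its details.

        import numpy as np

        def mixup(x1, y1, x2, y2, alpha=0.2):
            # Convex combination of two examples; labels assumed one-hot or soft.
            lam = np.random.beta(alpha, alpha)
            return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2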
    Zero-Shot Multi-Modal Artist-Controlled Retrieval and Exploration of 3D Object Sets. (arXiv:2209.00682v1 [cs.CV])
    When creating 3D content, highly specialized skills are generally needed to design and generate models of objects and other assets by hand. We address this problem through high-quality 3D asset retrieval from multi-modal inputs, including 2D sketches, images and text. We use CLIP as it provides a bridge to higher-level latent features. We use these features to perform a multi-modality fusion to address the lack of artistic control that affects common data-driven approaches. Our approach allows for multi-modal conditional feature-driven retrieval through a 3D asset database, by utilizing a combination of input latent embeddings. We explore the effects of different combinations of feature embeddings across different input types and weighting methods.  ( 2 min )
    Making Intelligence: Ethics, IQ, and ML Benchmarks. (arXiv:2209.00692v1 [cs.LG])
    The ML community recognizes the importance of anticipating and mitigating the potential negative impacts of benchmark research. In this position paper, we argue that more attention needs to be paid to areas of ethical risk that lie at the technical and scientific core of ML benchmarks. We identify overlooked structural similarities between human IQ and ML benchmarks. Human intelligence and ML benchmarks share similarities in setting standards for describing, evaluating and comparing performance on tasks relevant to intelligence. This enables us to unlock lessons from feminist philosophy of science scholarship that need to be considered by the ML benchmark community. Finally, we outline practical recommendations for benchmark research ethics and ethics review.  ( 2 min )
    Exploring traditional machine learning for identification of pathological auscultations. (arXiv:2209.00672v1 [cs.LG])
    Today, data collection has improved in various areas, and the medical domain is no exception. Auscultation, an important diagnostic technique for physicians, lends itself well to applications of machine learning thanks to the progress and availability of digital stethoscopes. Given the large number of auscultations performed, the availability of data opens up an opportunity for more effective analysis of sounds, where prognostic accuracy even among experts remains low. In this study, digital 6-channel auscultations of 45 patients were used in various machine learning scenarios, with the aim of distinguishing between normal and anomalous pulmonary sounds. Audio features (such as the fundamental frequencies F0-4, loudness, HNR, DFA, as well as descriptive statistics of log energy, RMS, and MFCC) were extracted using the Python library Surfboard. Windowing, feature aggregation, and concatenation strategies were used to prepare data for tree-based ensemble models in unsupervised (fair-cut forest) and supervised (random forest) machine learning settings. The evaluation was carried out using 9-fold stratified cross-validation repeated 30 times. Decision fusion by averaging the outputs for a subject was tested and found to be useful. Supervised models showed a consistent advantage over unsupervised ones, achieving a mean AUC ROC of 0.691 (accuracy 71.11%, Kappa 0.416, F1-score 0.771) in side-based detection and a mean AUC ROC of 0.721 (accuracy 68.89%, Kappa 0.371, F1-score 0.650) in patient-based detection.  ( 3 min )
    Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms. (arXiv:2209.00735v1 [cs.LG])
    Neural Networks (NNs) struggle to efficiently learn certain problems, such as parity problems, even when there are simple learning algorithms for those problems. Can NNs discover learning algorithms on their own? We exhibit an NN architecture that, in polynomial time, learns as well as any efficient learning algorithm describable by a constant-sized learning algorithm. For example, on parity problems, the NN learns as well as row reduction, an efficient algorithm that can be succinctly described. Our architecture combines both recurrent weight-sharing between layers and convolutional weight-sharing to reduce the number of parameters down to a constant, even though the network itself may have trillions of nodes. While in practice the constants in our analysis are too large to be directly meaningful, our work suggests that the synergy of Recurrent and Convolutional NNs (RCNNs) may be more powerful than either alone.  ( 2 min )
    Effective Class-Imbalance learning based on SMOTE and Convolutional Neural Networks. (arXiv:2209.00653v1 [cs.LG])
    Imbalanced Data (ID) is a problem that prevents Machine Learning (ML) models from achieving satisfactory results. ID occurs when the number of samples belonging to one class outnumbers that of the other by a wide margin, making the learning process of such models biased towards the majority class. In recent years, several solutions have been put forward to address this issue, opting either to synthetically generate new data for the minority class or to reduce the number of majority-class samples to balance the data. Hence, in this paper, we investigate the effectiveness of methods based on Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs), combined with a variety of well-known imbalanced-data solutions, namely oversampling and undersampling. To evaluate our methods, we used the KEEL, breast cancer, and Z-Alizadeh Sani datasets. In order to achieve reliable results, we conducted our experiments 100 times with randomly shuffled data distributions. The classification results demonstrate that the mixed Synthetic Minority Oversampling Technique (SMOTE)-Normalization-CNN approach outperforms the other methodologies, achieving 99.08% accuracy on the 24 imbalanced datasets. Therefore, the proposed mixed model can be applied to imbalanced binary classification problems on other real datasets.  ( 2 min )
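    A sketch of the SMOTE-plus-normalization stage using the imbalanced-learn API; a logistic regression stands in for the paper's CNN classifier, and the synthetic dataset is a toy assumption.

        from imblearn.over_sampling import SMOTE
        from imblearn.pipeline import Pipeline
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.preprocessing import StandardScaler

        X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
        pipe = Pipeline([
            ("smote", SMOTE(random_state=0)),   # oversample the minority class
            ("scale", StandardScaler()),        # normalization
            ("clf", LogisticRegression(max_iter=1000)),
        ])
        pipe.fit(X, y)                          # SMOTE is applied only during fitting
        print(pipe.score(X, y))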
    Temporal Conditional VAE for Distributional Drift Adaptation in Multivariate Time Series. (arXiv:2209.00654v1 [cs.LG])
    Due to its nonstationary nature, the distribution of real-world multivariate time series (MTS) changes over time, which is known as distribution drift. Most existing MTS forecasting models suffer greatly from distribution drift and degrade in forecasting performance over time. Existing methods address distribution drift by adapting to the most recently arrived data or by self-correcting per the meta knowledge derived from future data. Despite their great success in MTS forecasting, these methods hardly capture the intrinsic distribution changes, especially from a distributional perspective. Accordingly, we propose a novel framework, the temporal conditional variational autoencoder (TCVAE), to model the dynamic distributional dependencies over time between historical observations and future data in MTS and to infer the dependencies as a temporal conditional distribution over latent variables. Specifically, a novel temporal Hawkes attention mechanism represents temporal factors, which are subsequently fed into feed-forward networks to estimate the prior Gaussian distribution of the latent variables. The representation of temporal factors further dynamically adjusts the structures of the Transformer-based encoder and decoder to distribution changes by leveraging a gated attention mechanism. Moreover, we introduce a conditional continuous normalizing flow to transform the prior Gaussian into a complex, form-free distribution, facilitating flexible inference of the temporal conditional distribution. Extensive experiments conducted on six real-world MTS datasets demonstrate the TCVAE's superior robustness and effectiveness over state-of-the-art MTS forecasting baselines. We further illustrate the TCVAE's applicability through multifaceted case studies and visualization in real-world scenarios.  ( 3 min )
  • Open

    Refining neural network predictions using background knowledge. (arXiv:2206.04976v2 [cs.AI] UPDATED)
    Recent work has shown that logical background knowledge can be used in learning systems to compensate for a lack of labeled training data. Many methods work by creating a loss function that encodes this knowledge. However, the logic is often discarded after training, even if it is still useful at test time. Instead, we ensure neural network predictions satisfy the knowledge by refining the predictions with an extra computation step. We introduce differentiable refinement functions that find a corrected prediction close to the original prediction. We study how to effectively and efficiently compute these refinement functions. Using a new algorithm called Iterative Local Refinement (ILR), we combine refinement functions to find refined predictions for logical formulas of any complexity. ILR finds refinements on complex SAT formulas in significantly fewer iterations and frequently finds solutions where gradient descent cannot. Finally, ILR produces competitive results on the MNIST addition task.  ( 2 min )
    Cost-based feature selection for network model choice. (arXiv:2101.07766v3 [stat.ME] UPDATED)
    Selecting a small set of informative features from a large number of possibly noisy candidates is a challenging problem with many applications in machine learning and approximate Bayesian computation. In practice, the cost of computing informative features also needs to be considered. This is particularly important for networks because the computational costs of individual features can span several orders of magnitude. We addressed this issue for the network model selection problem using two approaches. First, we adapted nine feature selection methods to account for the cost of features. We show for two classes of network models that the cost can be reduced by two orders of magnitude without considerably affecting classification accuracy (proportion of correctly identified models). Second, we selected features using pilot simulations with smaller networks. This approach reduced the computational cost by a factor of 50 without affecting classification accuracy. To demonstrate the utility of our approach, we applied it to three different yeast protein interaction networks and identified the best-fitting duplication divergence model.  ( 2 min )
    Clustering and Structural Robustness in Causal Diagrams. (arXiv:2111.04513v2 [stat.ML] UPDATED)
    Graphs are commonly used to represent and visualize causal relations. For a small number of variables, this approach provides a succinct and clear view of the scenario at hand. As the number of variables under study increases, the graphical approach may become impractical, and the clarity of the representation is lost. Clustering of variables is a natural way to reduce the size of the causal diagram, but it may erroneously change the essential properties of the causal relations if implemented arbitrarily. We define a specific type of cluster, called transit cluster, that is guaranteed to preserve the identifiability properties of causal effects under certain conditions. We provide a sound and complete algorithm for finding all transit clusters in a given graph and demonstrate how clustering can simplify the identification of causal effects. We also study the inverse problem, where one starts with a clustered graph and looks for extended graphs where the identifiability properties of causal effects remain unchanged. We show that this kind of structural robustness is closely related to transit clusters.  ( 2 min )
    Remember and Forget Experience Replay for Multi-Agent Reinforcement Learning. (arXiv:2203.13319v3 [cs.LG] UPDATED)
    We present the extension of the Remember and Forget for Experience Replay (ReF-ER) algorithm to Multi-Agent Reinforcement Learning (MARL). ReF-ER was shown to outperform state-of-the-art algorithms for continuous control in problems ranging from the OpenAI Gym to complex fluid flows. In MARL, the dependencies between the agents are included in the state-value estimator, and the environment dynamics are modeled via the importance weights used by ReF-ER. In collaborative environments, we find the best performance when the value is estimated using individual rewards and the effects of other agents' actions on the transition map are ignored. We benchmark the performance of ReF-ER MARL on the Stanford Intelligent Systems Laboratory (SISL) environments. We find that employing a single feed-forward neural network for the policy and the value function in ReF-ER MARL outperforms state-of-the-art algorithms that rely on complex neural network architectures.  ( 2 min )
    Scalable Model-based Policy Optimization for Decentralized Networked Systems. (arXiv:2207.06559v2 [cs.LG] UPDATED)
    Reinforcement learning algorithms require a large amount of samples; this often limits their real-world applications even on simple tasks. This challenge is more pronounced in multi-agent tasks, as each step of operation is more costly, requiring communication or the shifting of resources. This work aims to improve the data efficiency of multi-agent control by model-based learning. We consider networked systems where agents are cooperative and communicate only locally with their neighbors, and we propose the decentralized model-based policy optimization framework (DMPO). In our method, each agent learns a dynamics model to predict future states and broadcasts its predictions by communication, and then the policies are trained under the model rollouts. To alleviate the bias of model-generated data, we restrain the model usage to generating myopic rollouts, thus reducing the compounding error of model generation. To preserve the independence of the policy update, we introduce an extended value function and theoretically prove that the resulting policy gradient is a close approximation to the true policy gradient. We evaluate our algorithm on several benchmarks for intelligent transportation systems, namely connected autonomous vehicle control tasks (Flow and CACC) and adaptive traffic signal control (ATSC). Empirical results show that our method achieves superior data efficiency and matches the performance of model-free methods using true models.  ( 3 min )
    Improving Sequential Query Recommendation with Immediate User Feedback. (arXiv:2205.06297v2 [cs.IR] UPDATED)
    We propose an algorithm for next-query recommendation in interactive data exploration settings, such as knowledge discovery for information gathering. State-of-the-art query recommendation algorithms are based on sequence-to-sequence learning approaches that exploit historical interaction data. Due to the supervision involved in the learning process, such approaches fail to adapt to immediate user feedback. We propose to augment transformer-based causal language models for query recommendation to adapt to immediate user feedback using a multi-armed bandit (MAB) framework. We conduct a large-scale experimental study using log files from a popular online literature discovery service and demonstrate that our algorithm substantially improves the per-round regret with respect to state-of-the-art transformer-based query recommendation models, which do not make use of immediate user feedback. Our data model and source code are available at https://github.com/shampp/exp3_ss  ( 2 min )
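    The repository name suggests an EXP3-style bandit; below is a minimal, self-contained EXP3 update illustrating how arm probabilities adapt to immediate feedback. The toy reward function is an assumption, not the paper's recommendation system.

        import numpy as np

        def exp3(n_arms, rounds, reward_fn, gamma=0.1):
            w = np.ones(n_arms)
            for _ in range(rounds):
                p = (1 - gamma) * w / w.sum() + gamma / n_arms
                arm = np.random.choice(n_arms, p=p)
                r = reward_fn(arm)                       # reward assumed in [0, 1]
                w[arm] *= np.exp(gamma * r / (p[arm] * n_arms))
            return w / w.sum()

        print(exp3(5, 2000, lambda a: float(a == 3)).round(3))  # arm 3 is best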
    Privacy-preserving Data Sharing on Vertically Partitioned Data. (arXiv:2010.09293v2 [cs.LG] UPDATED)
    In this work, we introduce a differentially private method for generating synthetic data from vertically partitioned data, i.e., where data of the same individuals is distributed across multiple data holders or parties. We present a differentially private stochastic gradient descent (DP-SGD) algorithm to train a mixture model over such partitioned data using variational inference. We modify a secure multiparty computation (MPC) framework to combine MPC with differential privacy (DP), in order to use differentially private MPC effectively to learn a probabilistic generative model under DP on such vertically partitioned data. Assuming the mixture components contain no dependencies across different parties, the objective function can be factorized into a sum of products of the contributions calculated by the parties. Finally, MPC is used to compute the aggregate between the different contributions. Moreover, we rigorously define the privacy guarantees with respect to the different players in the system. To demonstrate the accuracy of our method, we run our algorithm on the Adult dataset from the UCI machine learning repository, where we obtain comparable results to the non-partitioned case.  ( 2 min )
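    The privacy core is the standard DP-SGD step, sketched below: clip each per-example gradient to norm C and add Gaussian noise scaled by a noise multiplier sigma. The MPC and variational-inference machinery of the paper is omitted here.

        import numpy as np

        def dp_sgd_step(per_example_grads, C=1.0, sigma=1.0):
            # Clip each per-example gradient to L2 norm at most C.
            clipped = [g * min(1.0, C / max(np.linalg.norm(g), 1e-12))
                       for g in per_example_grads]
            g_sum = np.sum(clipped, axis=0)
            # Gaussian noise calibrated to the clipping norm and noise multiplier.
            noise = np.random.normal(0.0, sigma * C, size=g_sum.shape)
            return (g_sum + noise) / len(per_example_grads)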
    Optimal bump functions for shallow ReLU networks: Weight decay, depth separation and the curse of dimensionality. (arXiv:2209.01173v1 [stat.ML])
    In this note, we study how neural networks with a single hidden layer and ReLU activation interpolate data drawn from a radially symmetric distribution with target labels 1 at the origin and 0 outside the unit ball, if no labels are known inside the unit ball. With weight decay regularization and in the infinite-neuron, infinite-data limit, we prove that a unique radially symmetric minimizer exists, whose weight decay regularizer and Lipschitz constant grow as $d$ and $\sqrt{d}$ respectively. We furthermore show that the weight decay regularizer grows exponentially in $d$ if the label $1$ is imposed on a ball of radius $\varepsilon$ rather than just at the origin. By comparison, a neural network with two hidden layers can approximate the target function without encountering the curse of dimensionality.  ( 2 min )
    Identifying Transients in the Dark Energy Survey using Convolutional Neural Networks. (arXiv:2203.09908v2 [astro-ph.IM] UPDATED)
    The ability to discover new transients via image differencing without direct human intervention is an important task in observational astronomy. For this kind of image classification problem, machine learning techniques such as Convolutional Neural Networks (CNNs) have shown remarkable success. In this work, we present the results of automated transient identification on images with CNNs for an extant dataset from the Dark Energy Survey Supernova program (DES-SN), whose main focus was on using Type Ia supernovae for cosmology. By performing an architecture search over CNNs, we identify networks that efficiently select non-artifacts (e.g. supernovae, variable stars, AGN, etc.) from artifacts (image defects, mis-subtractions, etc.), achieving the efficiency of previous work performed with random forests, without the need to expend any effort on feature identification. The CNNs also help us identify a subset of mislabeled images. After relabeling the images in this subset, the resulting classification with CNNs is significantly better than previous results.  ( 2 min )
    An Interpretable and Efficient Infinite-Order Vector Autoregressive Model for High-Dimensional Time Series. (arXiv:2209.01172v1 [stat.ME])
    As a special infinite-order vector autoregressive (VAR) model, the vector autoregressive moving average (VARMA) model can capture much richer temporal patterns than the widely used finite-order VAR model. However, its practicality has long been hindered by its non-identifiability, computational intractability, and relative difficulty of interpretation. This paper introduces a novel infinite-order VAR model that not only avoids the drawbacks of the VARMA model but inherits its favorable temporal patterns. As another attractive feature, the temporal and cross-sectional dependence structures of this model can be interpreted separately, since they are characterized by different sets of parameters. For high-dimensional time series, this separation motivates us to impose sparsity on the parameters determining the cross-sectional dependence. As a result, greater statistical efficiency and interpretability can be achieved without sacrificing any temporal information. We introduce an $\ell_1$-regularized estimator for the proposed model and derive the corresponding non-asymptotic error bounds. An efficient block coordinate descent algorithm and a consistent model order selection method are developed. The merit of the proposed approach is supported by simulation studies and a real-world macroeconomic data analysis.  ( 2 min )
    Poincare: Recommending Publication Venues via Treatment Effect Estimation. (arXiv:2010.09157v2 [cs.DL] UPDATED)
    Choosing a publication venue for an academic paper is a crucial step in the research process. However, in many cases, decisions are based solely on the experience of researchers, which often leads to suboptimal results. Although venue recommender systems for academic papers exist, they recommend venues where the paper is expected to be published. In this study, we aim to recommend publication venues from a different perspective. We estimate the number of citations a paper will receive if the paper is published in each venue and recommend the venue where the paper has the most potential impact. However, there are two challenges to this task. First, a paper is published in only one venue, and thus we cannot observe the number of citations the paper would receive if it were published in another venue. Second, the contents of a paper and the publication venue are not statistically independent; that is, there exist selection biases in choosing publication venues. In this paper, we formulate the venue recommendation problem as a treatment effect estimation problem. We use a bias correction method to estimate the potential impact of choosing a publication venue effectively and to recommend venues based on the potential impact of papers in each venue. We highlight the effectiveness of our method using paper data from computer science conferences.  ( 3 min )
    Tree density estimation. (arXiv:2111.11971v4 [math.ST] UPDATED)
    We study the problem of estimating the density $f(\boldsymbol x)$ of a random vector ${\boldsymbol X}$ in $\mathbb R^d$. For a spanning tree $T$ defined on the vertex set $\{1,\dots ,d\}$, the tree density $f_{T}$ is a product of bivariate conditional densities. An optimal spanning tree minimizes the Kullback-Leibler divergence between $f$ and $f_{T}$. From i.i.d. data we identify an optimal tree $T^*$ and efficiently construct a tree density estimate $f_n$ such that, without any regularity conditions on the density $f$, one has $\lim_{n\to \infty} \int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|d\boldsymbol x=0$ a.s. For Lipschitz $f$ with bounded support, $\mathbb E \left\{ \int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|d\boldsymbol x\right\}=O\big(n^{-1/4}\big)$, a dimension-free rate.  ( 2 min )
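    The KL-optimal tree here coincides with the classical Chow-Liu construction: a maximum spanning tree over pairwise mutual information. A sketch for discrete toy data follows; the paper itself treats general densities.

        import numpy as np
        import networkx as nx
        from sklearn.metrics import mutual_info_score

        X = np.random.randint(0, 3, size=(1000, 5))   # toy discrete data
        G = nx.Graph()
        for i in range(X.shape[1]):
            for j in range(i + 1, X.shape[1]):
                G.add_edge(i, j, weight=mutual_info_score(X[:, i], X[:, j]))

        T = nx.maximum_spanning_tree(G)               # Chow-Liu tree
        print(sorted(T.edges))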
    Algorithms for Discrepancy, Matchings, and Approximations: Fast, Simple, and Practical. (arXiv:2209.01147v1 [cs.DS])
    We study one of the key tools in data approximation and optimization: low-discrepancy colorings. Formally, given a finite set system $(X,\mathcal S)$, the discrepancy of a two-coloring $\chi:X\to\{-1,1\}$ is defined as $\max_{S \in \mathcal S}|{\chi(S)}|$, where $\chi(S)=\sum\limits_{x \in S}\chi(x)$. We propose a randomized algorithm which, for any $d>0$ and $(X,\mathcal S)$ with dual shatter function $\pi^*(k)=O(k^d)$, returns a coloring with expected discrepancy $O\left({\sqrt{|X|^{1-1/d}\log|\mathcal S|}}\right)$ (this bound is tight) in time $\tilde O\left({|\mathcal S|\cdot|X|^{1/d}+|X|^{2+1/d}}\right)$, improving upon the previous-best time of $O\left(|\mathcal S|\cdot|X|^3\right)$ by at least a factor of $|X|^{2-1/d}$ when $|\mathcal S|\geq|X|$. This setup includes many geometric classes, families of bounded dual VC-dimension, and others. As an immediate consequence, we obtain an improved algorithm to construct $\varepsilon$-approximations of sub-quadratic size. Our method uses primal-dual reweighting with an improved analysis of randomly updated weights and exploits the structural properties of the set system via matchings with low crossing number -- a fundamental structure in computational geometry. In particular, we get the same $|X|^{2-1/d}$ factor speed-up on the construction time of matchings with crossing number $O\left({|X|^{1-1/d}}\right)$, which is the first improvement since the 1980s. The proposed algorithms are very simple, which makes it possible, for the first time, to compute colorings with near-optimal discrepancies and near-optimal sized approximations for abstract and geometric set systems in dimensions higher than $2$.  ( 3 min )
    A taxonomy of surprise definitions. (arXiv:2209.01034v1 [q-bio.NC])
    Surprising events trigger measurable brain activity and influence human behavior by affecting learning, memory, and decision-making. Currently there is, however, no consensus on the definition of surprise. Here we identify 18 mathematical definitions of surprise in a unifying framework. We first propose a technical classification of these definitions into three groups based on their dependence on an agent's belief, show how they relate to each other, and prove under what conditions they are indistinguishable. Going beyond this technical analysis, we propose a taxonomy of surprise definitions and classify them into four conceptual categories based on the quantity they measure: (i) 'prediction surprise' measures a mismatch between a prediction and an observation; (ii) 'change-point detection surprise' measures the probability of a change in the environment; (iii) 'confidence-corrected surprise' explicitly accounts for the effect of confidence; and (iv) 'information gain surprise' measures the belief-update upon a new observation. The taxonomy poses the foundation for principled studies of the functional roles and physiological signatures of surprise in the brain.  ( 2 min )
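    Two of the four categories can be computed in a few lines for a Beta-Bernoulli observer: Shannon surprise -log p(x) as a prediction surprise, and Bayesian surprise KL(posterior || prior) as an information-gain surprise. The prior parameters below are arbitrary toy choices, not taken from the paper.

        import numpy as np
        from scipy.special import betaln, digamma

        def kl_beta(a1, b1, a2, b2):
            # Closed-form KL( Beta(a1, b1) || Beta(a2, b2) ).
            return (betaln(a2, b2) - betaln(a1, b1)
                    + (a1 - a2) * digamma(a1) + (b1 - b2) * digamma(b1)
                    + (a2 - a1 + b2 - b1) * digamma(a1 + b1))

        a, b, x = 2.0, 2.0, 1                       # Beta(2,2) prior, observe x = 1
        shannon = -np.log(a / (a + b))              # prediction surprise: -log p(x)
        bayesian = kl_beta(a + x, b + 1 - x, a, b)  # belief update upon observing x
        print(f"Shannon: {shannon:.3f}  Bayesian: {bayesian:.3f}")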
    Normalization effects on deep neural networks. (arXiv:2209.01018v1 [cs.LG])
    We study the effect of normalization on the layers of deep neural networks of feed-forward type. A given layer $i$ with $N_{i}$ hidden units is allowed to be normalized by $1/N_{i}^{\gamma_{i}}$ with $\gamma_{i}\in[1/2,1]$ and we study the effect of the choice of the $\gamma_{i}$ on the statistical behavior of the neural network's output (such as variance) as well as on the test accuracy on the MNIST data set. We find that in terms of variance of the neural network's output and test accuracy the best choice is to choose the $\gamma_{i}$'s to be equal to one, which is the mean-field scaling. We also find that this is particularly true for the outer layer, in that the neural network's behavior is more sensitive in the scaling of the outer layer as opposed to the scaling of the inner layers. The mechanism for the mathematical analysis is an asymptotic expansion for the neural network's output. An important practical consequence of the analysis is that it provides a systematic and mathematically informed way to choose the learning rate hyperparameters. Such a choice guarantees that the neural network behaves in a statistically robust way as the $N_i$ grow to infinity.  ( 2 min )
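    A sketch of the scaling under study: the aggregated contribution of a layer with N hidden units is multiplied by 1/N**gamma, with gamma = 1 recovering the mean-field scaling the paper finds best. The tanh nonlinearity and layer sizes are placeholder choices.

        import torch

        class ScaledBlock(torch.nn.Module):
            # A hidden layer of n_units whose aggregated output carries 1/n**gamma.
            def __init__(self, d_in, n_units, d_out, gamma=1.0):
                super().__init__()
                self.inner = torch.nn.Linear(d_in, n_units)
                self.outer = torch.nn.Linear(n_units, d_out, bias=False)
                self.scale = n_units ** (-gamma)

            def forward(self, x):
                return self.scale * self.outer(torch.tanh(self.inner(x)))

        print(ScaledBlock(10, 1024, 1, gamma=1.0)(torch.randn(4, 10)).shape)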
    Inference and dynamic decision-making for deteriorating systems with probabilistic dependencies through Bayesian networks and deep reinforcement learning. (arXiv:2209.01092v1 [cs.AI])
    In the context of modern environmental and societal concerns, there is an increasing demand for methods able to identify management strategies for civil engineering systems, minimizing structural failure risks while optimally planning inspection and maintenance (I&M) processes. Most available methods simplify the I&M decision problem to the component level due to the computational complexity associated with global optimization methodologies under joint system-level state descriptions. In this paper, we propose an efficient algorithmic framework for inference and decision-making under uncertainty for engineering systems exposed to deteriorating environments, providing optimal management strategies directly at the system level. In our approach, the decision problem is formulated as a factored partially observable Markov decision process, whose dynamics are encoded in Bayesian network conditional structures. The methodology can handle environments under equal or general, unequal deterioration correlations among components, through Gaussian hierarchical structures and dynamic Bayesian networks. In terms of policy optimization, we adopt a deep decentralized multi-agent actor-critic (DDMAC) reinforcement learning approach, in which the policies are approximated by actor neural networks guided by a critic network. By including deterioration dependence in the simulated environment, and by formulating the cost model at the system level, DDMAC policies intrinsically consider the underlying system-effects. This is demonstrated through numerical experiments conducted for both a 9-out-of-10 system and a steel frame under fatigue deterioration. Results demonstrate that DDMAC policies offer substantial benefits when compared to state-of-the-art heuristic approaches. The inherent consideration of system-effects by DDMAC strategies is also interpreted based on the learned policies.  ( 3 min )
    Exploiting Pretrained Biochemical Language Models for Targeted Drug Design. (arXiv:2209.00981v1 [cs.LG])
    Motivation: The development of novel compounds targeting proteins of interest is one of the most important tasks in the pharmaceutical industry. Deep generative models have been applied to targeted molecular design and have shown promising results. Recently, target-specific molecule generation has been viewed as a translation between the protein language and the chemical language. However, such a model is limited by the availability of interacting protein-ligand pairs. On the other hand, large amounts of unlabeled protein sequences and chemical compounds are available and have been used to train language models that learn useful representations. In this study, we propose exploiting pretrained biochemical language models to initialize (i.e. warm start) targeted molecule generation models. We investigate two warm-start strategies: (i) a one-stage strategy, where the initialized model is trained on targeted molecule generation, and (ii) a two-stage strategy containing pre-finetuning on molecular generation followed by target-specific training. We also compare two decoding strategies for generating compounds: beam search and sampling. Results: The results show that the warm-started models perform better than a baseline model trained from scratch. The two proposed warm-start strategies achieve similar results with respect to widely used metrics from benchmarks. However, docking evaluation of the generated compounds for a number of novel proteins suggests that the one-stage strategy generalizes better than the two-stage strategy. Additionally, we observe that beam search outperforms sampling in both docking evaluation and benchmark metrics for assessing compound quality. Availability and implementation: The source code is available at https://github.com/boun-tabi/biochemical-lms-for-drug-design and the materials are archived in Zenodo at https://doi.org/10.5281/zenodo.6832145  ( 3 min )
    Log-Gaussian processes for AI-assisted TAS experiments. (arXiv:2209.00980v1 [physics.data-an])
    To understand the origins of materials properties, neutron scattering experiments at three-axes spectrometers (TAS) investigate magnetic and lattice excitations in a sample by measuring intensity distributions in its momentum (Q) and energy (E) space. The high demand for and limited availability of beam time for TAS experiments, however, raise the natural question of whether we can improve their efficiency or make better use of the experimenter's time. In fact, using TAS, there are a number of scientific questions that require searching for signals of interest in a particular region of Q-E space, but when done manually, this is time-consuming and inefficient, since the measurement points may be placed in uninformative regions such as the background. Active learning is a promising general machine learning approach that allows informative regions of signal to be detected iteratively and autonomously, i.e., without human interference, thus avoiding unnecessary measurements and speeding up the experiment. In addition, the autonomous mode allows experimenters to focus on other relevant tasks in the meantime. The approach that we describe in this article exploits log-Gaussian processes which, due to the log transformation, have the largest approximation uncertainties in regions of signal. Maximizing uncertainty as an acquisition function hence directly yields locations for informative measurements. We demonstrate the benefits of our approach on outcomes of a real neutron experiment at the thermal TAS EIGER (PSI) as well as on results of a benchmark in a synthetic setting including numerous different excitations.  ( 3 min )
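    The acquisition idea fits in a few lines: fit a GP to log-transformed counts and measure next where the predictive standard deviation is largest. The kernel, noise level, and 1-D toy signal below are assumptions, not the actual EIGER configuration.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF

        rng = np.random.default_rng(0)
        X = rng.uniform(0, 10, (15, 1))                          # measured points (1-D toy)
        counts = 1.0 + 100.0 * np.exp(-(X.ravel() - 6.0) ** 2)   # a peak on flat background
        gp = GaussianProcessRegressor(kernel=RBF(1.0), alpha=1e-2)
        gp.fit(X, np.log(counts))                                # log transform is the key step

        grid = np.linspace(0, 10, 200).reshape(-1, 1)
        _, std = gp.predict(grid, return_std=True)
        print("next measurement at", grid[np.argmax(std)])       # uncertainty peaks near signal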
    Macroeconomic Predictions using Payments Data and Machine Learning. (arXiv:2209.00948v1 [econ.GN])
    Predicting the economy's short-term dynamics -- a vital input to economic agents' decision-making process -- often uses lagged indicators in linear models. This is typically sufficient during normal times but could prove inadequate during crisis periods. This paper aims to demonstrate that non-traditional and timely data such as retail and wholesale payments, with the aid of nonlinear machine learning approaches, can provide policymakers with sophisticated models to accurately estimate key macroeconomic indicators in near real-time. Moreover, we provide a set of econometric tools to mitigate overfitting and interpretability challenges in machine learning models to improve their effectiveness for policy use. Our models with payments data, nonlinear methods, and tailored cross-validation approaches help improve macroeconomic nowcasting accuracy up to 40% -- with higher gains during the COVID-19 period. We observe that the contribution of payments data for economic predictions is small and linear during low and normal growth periods. However, the payments data contribution is large, asymmetrical, and nonlinear during strong negative or positive growth periods.  ( 2 min )
    A Discussion of Discrimination and Fairness in Insurance Pricing. (arXiv:2209.00858v1 [cs.LG])
    Indirect discrimination is an issue of major concern in algorithmic models. This is particularly the case in insurance pricing where protected policyholder characteristics are not allowed to be used for insurance pricing. Simply disregarding protected policyholder information is not an appropriate solution because this still allows for the possibility of inferring the protected characteristics from the non-protected ones. This leads to so-called proxy or indirect discrimination. Though proxy discrimination is qualitatively different from the group fairness concepts in machine learning, these group fairness concepts are proposed to 'smooth out' the impact of protected characteristics in the calculation of insurance prices. The purpose of this note is to share some thoughts about group fairness concepts in the light of insurance pricing and to discuss their implications. We present a statistical model that is free of proxy discrimination, thus, unproblematic from an insurance pricing point of view. However, we find that the canonical price in this statistical model does not satisfy any of the three most popular group fairness axioms. This seems puzzling and we welcome feedback on our example and on the usefulness of these group fairness axioms for non-discriminatory insurance pricing.  ( 2 min )
    Exact Decomposition of Quantum Channels for Non-IID Quantum Federated Learning. (arXiv:2209.00768v1 [quant-ph])
    Federated learning refers to the task of performing machine learning with decentralized data from multiple clients while protecting data security and privacy. Works have been done to incorporate quantum advantage in such scenarios. However, when the clients' data are not independent and identically distributed (IID), the performance of conventional federated algorithms deteriorates. In this work, we explore this phenomenon in the quantum regime with both theoretical and numerical analysis. We further prove that a global quantum channel can be exactly decomposed into channels trained by each client with the help of local density estimators. It leads to a general framework for quantum federated learning on non-IID data with one-shot communication complexity. We demonstrate it on classification tasks with numerical simulations.  ( 2 min )
    Optimizing the Performative Risk under Weak Convexity Assumptions. (arXiv:2209.00771v1 [cs.LG])
    In performative prediction, a predictive model impacts the distribution that generates future data, a phenomenon ignored in classical supervised learning. In this closed-loop setting, the natural measure of performance, denoted the performative risk, captures the expected loss incurred by a predictive model after deployment. The core difficulty of minimizing the performative risk is that the data distribution itself depends on the model parameters. This dependence is governed by the environment and is not under the control of the learner. As a consequence, even the choice of a convex loss function can result in a highly non-convex performative risk minimization problem. Prior work has identified a pair of general conditions on the loss and the mapping from model parameters to distributions that implies convexity of the performative risk. In this paper, we relax these assumptions and focus on obtaining weaker notions of convexity, without sacrificing the amenability of the performative risk minimization problem to iterative optimization methods.  ( 2 min )
    Optimal Diagonal Preconditioning: Theory and Practice. (arXiv:2209.00809v1 [math.OC])
    Preconditioning has been a staple technique in optimization and machine learning. It often reduces the condition number of the matrix it is applied to, thereby speeding up convergence of optimization algorithms. Although there are many popular preconditioning techniques in practice, most lack theoretical guarantees for reductions in condition number. In this paper, we study the problem of optimal diagonal preconditioning to achieve maximal reduction in the condition number of any full-rank matrix by scaling its rows or columns separately or simultaneously. We first reformulate the problem as a quasi-convex problem and provide a baseline bisection algorithm that is easy to implement in practice, where each iteration consists of an SDP feasibility problem. Then we propose a polynomial time potential reduction algorithm with $O(\log(\frac{1}{\epsilon}))$ iteration complexity, where each iteration consists of a Newton update based on the Nesterov-Todd direction. Our algorithm is based on a formulation of the problem which is a generalized version of the Von Neumann optimal growth problem. Next, we specialize to one-sided optimal diagonal preconditioning problems, and demonstrate that they can be formulated as standard dual SDP problems, to which we apply efficient customized solvers and study the empirical performance of our optimal diagonal preconditioners. Our extensive experiments on large matrices demonstrate the practical appeal of optimal diagonal preconditioners at reducing condition numbers compared to heuristics-based preconditioners.  ( 3 min )
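    A minimal NumPy sketch to illustrate the quantity the paper optimizes (not the paper's SDP-based method): simple column equilibration, a common heuristic baseline, and its effect on the condition number.

        # Illustrative sketch: Jacobi-style column scaling as a heuristic
        # diagonal preconditioner; the paper's algorithms find the optimal
        # such scaling, this only shows why scaling helps at all.
        import numpy as np

        rng = np.random.default_rng(0)
        A = rng.standard_normal((200, 200)) * rng.uniform(0.01, 100.0, size=200)

        D = np.diag(1.0 / np.linalg.norm(A, axis=0))  # scale each column to unit norm
        print("cond before:", np.linalg.cond(A))
        print("cond after: ", np.linalg.cond(A @ D))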
    Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms. (arXiv:2209.00735v1 [cs.LG])
    Neural Networks (NNs) struggle to efficiently learn certain problems, such as parity problems, even when there are simple learning algorithms for those problems. Can NNs discover learning algorithms on their own? We exhibit an NN architecture that, in polynomial time, learns as well as any efficient learning algorithm describable by a constant-sized learning algorithm. For example, on parity problems, the NN learns as well as row reduction, an efficient algorithm that can be succinctly described. Our architecture combines both recurrent weight-sharing between layers and convolutional weight-sharing to reduce the number of parameters to a constant, even though the network itself may have trillions of nodes. While in practice the constants in our analysis are too large to be directly meaningful, our work suggests that the synergy of Recurrent and Convolutional NNs (RCNNs) may be more powerful than either alone.  ( 2 min )

  • Open

    "The Unsurprising Effectiveness of Pre-Trained Vision Models for Control", Parisi et al 2022 {FB} (CLIP)
    submitted by /u/gwern [link] [comments]  ( 87 min )
    DQN does not work, but Distributional DQN (C51) does
    I am struggling to understand why the DQN algorithm fails in my environment (after a short rise at the beginning, the reward falls and remains low for more than 1e6 steps) while its distributional version (C51) achieves excellent results. I'm using RLlib implementations, so it's not a bug in my code. I would understand why C51 might learn faster, but what could prevent DQN from learning anything at all? One explanation I gave myself is that in my environment the value of a given state can be quite noisy, and DQN learns only the expected value of the outcome, while C51 learns the distribution of possible outcomes. However, C51 still uses the expected value to choose the argmax action, so wouldn't the result be the same? Another possible explanation is that in my environment good performance corresponds to a region of small negative rewards, so gradient updates are much larger for low-reward batches, while this doesn't happen in C51. However, gradient clipping doesn't seem to be very effective. Do you know any other possible explanation? submitted by /u/fedetask [link] [comments]  ( 88 min )
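    For anyone puzzling over the same question, a small illustrative sketch (assumed atom layout, not RLlib internals) of how C51 still acts on the mean of its learned distribution, so the difference with DQN is in the learning signal, not the action rule:

        # C51 represents Q(s, a) as a categorical distribution over fixed atoms
        # and acts greedily on its expectation, exactly like DQN does on Q.
        import numpy as np

        v_min, v_max, n_atoms = -10.0, 10.0, 51
        atoms = np.linspace(v_min, v_max, n_atoms)             # fixed support z_i
        probs = np.random.dirichlet(np.ones(n_atoms), size=4)  # one distribution per action

        q_values = probs @ atoms           # expected return per action
        action = int(np.argmax(q_values))  # same greedy rule as DQN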
    "Semantic Exploration from Language Abstractions and Pretrained Representations", Tam et al 2022 (plugging BERT/CLIP LMs into Impala/R2D2's NGU/RND exploration methods)
    submitted by /u/gwern [link] [comments]  ( 87 min )
    "LID: Pre-Trained Language Models for Interactive Decision-Making", Li et al 2022
    submitted by /u/gwern [link] [comments]  ( 87 min )
    "Housekeep: Tidying Virtual Households using Commonsense Reasoning", Kant et al 2022
    submitted by /u/gwern [link] [comments]  ( 87 min )
    "Awesome-LLM-Robotics": A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, codes, and related websites
    submitted by /u/gwern [link] [comments]  ( 87 min )
    Help with Policy Gradient agent that learns actions resulting in negative rewards
    I have a toy RL project implementing the REINFORCE algorithm with a policy gradient agent that seems to be consistently learning to choose actions that reliably generate negative rewards. Are there any likely reasons this could be so? Additional details below. The rules of the environment are of my own design. Since that makes it obviously hard to diagnose, I'll take whatever high- or low-granularity suggestions I can get. --------- My environment is a 25x25 grid upon which a bot can move and attempt to acquire a target. Bots can choose the following 17 actions: Do nothing; Move 1 space (in any of the 8 directions surrounding it); Attempt to acquire a target 1 space away (in any of the 8 directions surrounding it). The reward system is such that: +1 for moving closer to any targe…  ( 92 min )
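    A hedged sketch of the two most common failure points in this situation (hypothetical names, not the poster's code): a sign error in the loss, which makes the agent actively prefer below-average actions, and a missing baseline, which hurts when all returns share a sign:

        import torch

        def reinforce_loss(log_probs, returns):
            """log_probs: list of log pi(a_t|s_t) tensors; returns: per-step returns."""
            returns = torch.as_tensor(returns, dtype=torch.float32)
            advantages = returns - returns.mean()  # simple constant baseline
            # Minimizing this INCREASES log-prob of above-average actions;
            # dropping the leading minus sign reliably learns the worst actions.
            return -(torch.stack(log_probs) * advantages).sum()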
    "Improved Policy Optimization for Online Imitation Learning", Lavington et al 2022
    submitted by /u/gwern [link] [comments]  ( 87 min )
    "Learning with Differentiable Algorithms", Petersen 2022 (sorting, top-k/ranking, rendering, logic gates, distances)
    submitted by /u/gwern [link] [comments]  ( 87 min )
    Help with literature recommendation. Variable input state in a multi-agent reinforcement learning environment?
    Hi community. I am very new to the field and I find myself in need of some guidance to begin a project applied to UAVs. My context is the following. I am experimenting with decision making of autonomous multi-agent UAVs in a simulated environment. My problem arises when attempting to represent the n other agents seen by a particular one in order to make a decision. As far as I have seen, the input state vector is of fixed length, so how could one represent an unknown number of agents (each with its own vector representing its relative position in space)? I am aware of padding, but I was looking for a more elegant and, most importantly, scalable solution. Thanks in advance, and please forgive my ignorance on the topic. submitted by /u/Bensimon_Joules [link] [comments]  ( 88 min )
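    One scalable option worth searching for in the literature is a permutation-invariant (Deep Sets / mean-pooling) encoder; a minimal sketch with hypothetical shapes:

        import torch
        import torch.nn as nn

        class NeighbourEncoder(nn.Module):
            """Encode each observed agent independently, then pool: the output
            size is fixed no matter how many agents are visible."""
            def __init__(self, obs_dim=3, hidden=64):
                super().__init__()
                self.phi = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                         nn.Linear(hidden, hidden))

            def forward(self, neighbours):  # (n, obs_dim), n varies per step
                return self.phi(neighbours).mean(dim=0)  # fixed-size summary

        enc = NeighbourEncoder()
        print(enc(torch.randn(7, 3)).shape)  # torch.Size([64]) for any n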
    Question about modifying gym environment
    Hi everyone thank you so much in advance. Relative newbie here to open ai gym. I was wondering how I can change an environment like the taxi environment such as adding in more destination points, passengers, larger grid size, etc? Do you have any pointers or recommendations? I greatly appreciate it! submitted by /u/No_Opportunity575 [link] [comments]  ( 102 min )
    stable_baselines3 problems
    Hello all, I am building a custom reinforcement learning trading environment using gym.Env for my master's dissertation. I finished developing it and passed it through the stable_baselines3 check_env function. The environment did not raise any issues. Now I am trying to implement a quick RL algorithm to check if everything is set up correctly. I would really appreciate it if anyone could give me some clarity on what I am doing wrong. Here are my action space and observation space: self._data = df.copy() self.action_space = gym.spaces.Box(low=np.array([0, 0]), high=np.array([3, 1], dtype=np.float32)) # Store the columns to be shown to the agent self._columns = list(self._data.columns)[1:21] self._low = min(list(self._data[self._columns].min())) self._high = max(list(self._data[self._columns].max…  ( 99 min )
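    One hedged observation on the snippet above: the dtype is passed inside np.array for high only, rather than to the Box itself. A minimal sketch of an explicit, consistent definition (bounds assumed from the post):

        import gym
        import numpy as np

        # dtype belongs to the Box; both bound arrays should share it so the
        # agent samples consistent float32 actions.
        action_space = gym.spaces.Box(
            low=np.array([0.0, 0.0], dtype=np.float32),
            high=np.array([3.0, 1.0], dtype=np.float32),
            dtype=np.float32,
        )
        print(action_space.sample())  # e.g. [2.1, 0.37]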
  • Open

    [D] Model fit of training data
    I am trying to learn machine learning. I have one hang-up that I cannot find the answer to: how does a model fit inputs from a training data set? For example, I am working with the MNIST training set. When I fit the data using a particular model, how does it do it? All I am finding are articles with the syntax for code to do this. I am trying to understand what is happening one layer deeper. Does this make sense? I am not even sure what keywords I should use to search for this on Google. submitted by /u/NetworkPoker [link] [comments]  ( 90 min )
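    The keywords to search for are "gradient descent" and "backpropagation". A minimal sketch of what happens one layer deeper when you call fit on something like MNIST (random stand-in data here): predict, measure the error, and nudge the parameters against the error's gradient, repeatedly.

        import torch

        X = torch.randn(256, 784)         # stand-ins for flattened MNIST images
        y = torch.randint(0, 10, (256,))  # stand-in digit labels
        model = torch.nn.Linear(784, 10)
        opt = torch.optim.SGD(model.parameters(), lr=0.1)

        for epoch in range(5):
            loss = torch.nn.functional.cross_entropy(model(X), y)  # how wrong are we?
            opt.zero_grad()
            loss.backward()  # gradient of the loss wrt every parameter
            opt.step()       # move each parameter a small step downhill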
    [P] Peacasso - A Web UI for Stable Diffusion Models
    Peacasso - a UI for interacting with stable diffusion models. As text-to-image models become smaller, ship with openly available weights, and generate competitive images (e.g. stable diffusion models), there have been efforts to build interfaces for interacting with them. Code and instructions can be found on GitHub - https://github.com/victordibia/peacasso. There are many excellent UIs that have been built for latent diffusion models recently; some of the things Peacasso brings to the table include: Easy installation. Instead of cobbling together command-line scripts, Peacasso provides a pip install flow and a UI that supports a set of carefully selected default operations. UI with good defaults. The current implementation provides a UI for basic operations - text- and image-based prompting, remixing generated images as prompts, model parameter selection. Also the little things, like light and dark mode. Python API. While the UI is the focus here, there is an underlying Python API which will bake in experimentation features (e.g. saving intermediate images in the sampling loop, exploring model explanations; see roadmap). submitted by /u/vykthur [link] [comments]  ( 90 min )
    [Discussion] Twice Differentiable Neural Networks
    Hi, I have a question regarding taking the gradient twice, once wrt the input and once wrt the parameters, and what one might have to pay attention to when optimizing the whole thing. Context: I have a neural network f(x, \theta) with a target y, and its output and gradient wrt x occur in the loss, which is calculated as L{ f(x, \theta), \nabla_x f(x, \theta), y }. The use case is differential equations. This means that to obtain the loss, I have to take the gradient of the prediction f(x, \theta) once wrt the input x. Thereafter I take the gradient wrt the parameters to do gradient descent as usual. I use twice-differentiable activation functions (tanh, GELU, sigmoid, etc.). Problem: The model works for low dimensions but scales poorly to higher-dimensional data. Question: Does anybody have recommendations, resources or links with regard to twice-differentiable neural networks and how stochastic gradient descent is done with them, respectively what one has to pay attention to while optimizing them (learning rates, momenta, weight decay, etc.)? A resource with a similar setup is the Hamiltonian Neural Network paper by Greydanus et al (https://arxiv.org/pdf/1906.01563.pdf), which also computes the gradients of the parameters after taking the gradient wrt the input. Thanks in advance, dear fellow gradient descenter :) submitted by /u/ludixiv [link] [comments]  ( 92 min )
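    A minimal PyTorch sketch of the double-differentiation setup described above: the input-gradient is taken with create_graph=True so it stays in the autograd graph, and the subsequent backward pass differentiates through it wrt the parameters.

        import torch

        net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                                  torch.nn.Linear(32, 1))
        x = torch.randn(16, 2, requires_grad=True)
        y = torch.randn(16, 1)

        f = net(x)
        df_dx = torch.autograd.grad(f.sum(), x, create_graph=True)[0]  # keep graph
        loss = ((f - y) ** 2).mean() + (df_dx ** 2).mean()  # loss uses f and grad_x f
        loss.backward()  # parameter gradients now flow through df_dx as well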
    [D] Intuitive understanding of generative models
    Intuitively, can generative models be understood as learning to "interpolate" between the training data? For example, in the case of score-based models, my understanding is that the minimizer of the loss is any distribution where the score is 0 for each of the noise-perturbed training data - i.e., a distribution with (at least) *n* modes (*n* is training data size). Then, we do a gradient ascent on the distribution and add some Langevin noise so that the final sample (for example, an image) is not just images from the training data. What is the relation of the objects as we see them in this image to the objects in images from the training data? submitted by /u/a1_jakesauce_ [link] [comments]  ( 89 min )
    [D] Stable diffusion and similar platforms are the best compression models the world currently has!
    I think it is important to understand the capabilities of these technologies beyond just making art. What these text-to-image models are capable of is massive compression of visual data that is orders of magnitude better than what we currently have. Right now, if I wanted to share an image I created with stable diffusion, I would not have to share the image with you. I could just give you the metadata for the settings used to generate said image and save a tremendous amount of bandwidth and hard drive space. Now you might be thinking, that's great and all, but what's my point? Well, imagine you are running a massive website that hosts generated images. All of those images take up a lot of space on server hard drives which you have to pay to host. Also there is massive amount of ban…  ( 112 min )
    [P] - VkFFT now supports Rader's algorithm - A100 and MI250 benchmarks: Part 2
    Hello, I am the creator of VkFFT - the GPU Fast Fourier Transform library for Vulkan/CUDA/HIP/OpenCL and Level Zero. Two weeks ago I made a post about the Rader's algorithm implementation in VkFFT, which improved the performance of VkFFT for sequences not decomposable as products of small primes. The previous version of VkFFT was doing direct multiplication convolutions of length N-1 to create an FFT kernel of an arbitrary prime length to be used in a regular Stockham FFT algorithm. Direct multiplication convolutions scale as O(N^2) and do not work well for primes above 100. This update brings support for the convolution-theorem version of Rader's algorithm, which no other GPU FFT library currently has. The new version does the Rader algorithm by inlining an FFT convolution in the FFT code - with FF…  ( 113 min )
    [P] Serving ML models at scale using mlflow
    As you know, mlflow is widely used today in the machine learning community to manage experimentation and serve models. In this series, published on Medium, I address the problem of scalability that I faced in my company while deploying multiple models in production using mlflow. In this series, I wrote about: Deploying an mlflow tracking instance for experimentation; Serving ML models as API endpoints on Kubernetes; Understanding how k8s handles load through load testing. The last article explains how you can make the deployment scalable and anticipate the computation power needed to handle multiple simultaneous requests in a real-world context. submitted by /u/Spirited-Singer-6150 [link] [comments]  ( 89 min )
    [R] Model compression reference request
    I am looking to get into the model compression field, especially in the post-train/fine-tuning phases (i.e., quantization, pruning). What are some canonical papers in the field? Is there any must-read book/survey paper? What are the most glaring problems and the most promising research avenues (in your opinion)? submitted by /u/Disastrous-War-9675 [link] [comments]  ( 88 min )
    [R] mmwave based multi user 3D posture tracking for AR / VR / smart home / fitness tracking
    submitted by /u/SpatialComputing [link] [comments]  ( 90 min )
    [D] What is the SOTA explanation for why deep learning works? I understand backprop and gradient descent, but why should over-parametrized networks actually converge to anything useful?
    Gradient descent is straightforward, you can see it obviously in 2d, getting stuck in local minima, etc. Backprop likewise makes sense in the case of a couple of neurons and a couple of layers, toy problems, etc. But why is it obvious that this process should work in ultra high dimensional space as well? I mean it's obvious there is structure in visual/audio/language data, and it makes sense you would be able to learn a representation of that structure by some method. But it's not at all obvious to me that backprop should "work" in that it should converge to anything generically useful for such data. It bothers me to have that missing intuition. How do you guys mentally resolve this? Do you just say "if it works it works" and get on with it? What is the latest SOTA thinking around a theoretical foundation for deep learning? Not a researcher myself - just a curious data scientist who works with them and occasionally reads papers on weekends. Would be happy to read any papers/blogs/posts on the topic as it stands now in mid 2022. submitted by /u/thunderdome [link] [comments]  ( 119 min )
    [D] Solving TSP Using Diffusion Model
    Could we use a diffusion model to solve the traveling salesman problem in polynomial time? I don't know but I think we could do that by trying to randomize the path of an exactly solved TSP and then try to make the machine learn to reverse the process. If my guess is correct, we could exactly solve the TSP in polynomial time. Seems like the recently published Riemannian diffusion model could do that. submitted by /u/flat_rain [link] [comments]  ( 89 min )
    [P] Apple pencil with the power of Local Stable Diffusion using Gradio Web UI running off a 3090
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 92 min )
    [D] Should I focus on breadth or depth during my studies as an undergrad aiming to enter ML?
    Basically I’m majoring in computer science and will very likely double major in math (as I’m more interested in it rather than cs but majoring in math only will bar me from participating in ml or ai classes or research due to my university rules) So my question is should I focus on machine learning solely or try to build some breadth and care about ml later on? I know this might sound like a silly question but it’s hard to decide since I’m afraid of pigeon holing my self into some niche field of ml or some thing like that? I’ll add more points to my post later on once I come back from work but hope the question is understandable and doesn’t break any sub rules thank you. submitted by /u/djssoapappskdid [link] [comments]  ( 92 min )
    [D] Asking about the difference between Automatic Tagging and Tag Recommendation Systems
    Hello, everyone. I have a question regarding the title. I have a thesis on the topic of automatic tagging and I have to build a project around it. My lecturer gave me 2 journal references, one about automatic tagging and one about tag recommendation systems, and I am confused about them. Do they have the same goal of automatically tagging objects? Why did my lecturer give me these references? Does a tag recommendation system have to be part of automatic tagging? Anyway, I am still at the methodology step. Thank you. submitted by /u/TohingWahing [link] [comments]  ( 105 min )
    [D] OpenAI SWE Residency?
    Hey there! Has anyone here gone through the program before? Or are there any OpenAI people who can offer insight into the program? Maybe someone who is willing to share their experience and how to best prep for the program? I've seen various discussions on r/MachineLearning for the research track, but none about this program submitted by /u/ThrowawayTartan [link] [comments]  ( 90 min )
  • Open

    Best free AI generator I can use to create simple cartoon characters?
    Every AI out there somehow does a phenomenal job at creating landscapes and scenery and even extremely realistic portraits of humans, yet Google doesn't give me a clear answer on any AI generator that can create cartoon characters based on the style of X or Y. I kinda wanted to see what I could create using AI, and whether it'd be possible to create channel art or cartoon avatars that look like me for my YouTube banner and videos. Is there anything out there like this, or would I have to find someone to draw something for me or use an online avatar creator that just uses a template? submitted by /u/blxoom [link] [comments]  ( 88 min )
    Stable Diffusion Weekly AI Art Slideshow 4K 9.4.22
    submitted by /u/prfitofthesngularity [link] [comments]  ( 86 min )
    Sunday reading suggestion. This is a good resource to learn and do Data Cleaning tasks quickly.
    submitted by /u/alimhabidi [link] [comments]  ( 87 min )
    MATRIX AI VERSION
    submitted by /u/nalr00n [link] [comments]  ( 86 min )
    A New Deep Learning Approach Developed at MIT Identifies Undiagnosable Cancers by Taking a Closer Look at the Gene Expression Programs Related to Early Cell Development and Differentiation
    Cancer is partially a developmental illness, with malignancies named for the cell or tissue from which they originate. However, there is no systematic atlas of tumor sources. Identifying a patient’s precise type of cancer and its main site is the first step in deciding on the best course of treatment. Despite extensive testing, the source of cancer cannot be determined in many situations. Oncologists must employ non-targeted medicines with severe side effects and poor survival rates. Work by researchers at Massachusetts General Hospital (MGH) and the Koch Institute for Integrative Cancer Research at MIT may help classify cancers of unknown primary. Their work introduces a new deep learning method that closely examines the gene expression patterns related to early cell development and differentiation. Continue reading | Check out the paper submitted by /u/ai-lover [link] [comments]  ( 87 min )
    Critique of a stupid video - Top 10 scariest things that will happen before 2050
    submitted by /u/HumanSeeing [link] [comments]  ( 87 min )
    Best introductory books on artificial intelligence?
    I have not had much luck finding decent work with my philosophy degree and am considering an MA in AI. As it stands I don't know much at all and am in the very early stages of getting to grips with the topic and deciding whether it's something I could really do. With that in mind, do you guys have any recommendations for introductory books on the topic? There's plenty of content online, but I'd prefer a proper book that's a little more in depth. submitted by /u/Snedwardthe18th [link] [comments]  ( 88 min )
    Ai art starryai
    submitted by /u/AstroFish69 [link] [comments]  ( 86 min )
    "Floraison d'hiver" (Winter Bloom), creating dancing animations with AI
    submitted by /u/Kkrch [link] [comments]  ( 87 min )
    French tax officials use AI to spot 20,000 undeclared pools | France
    submitted by /u/firig1965 [link] [comments]  ( 87 min )
    Please help me train my chatbot.
    I built a simple chatbot with NextJS for my final year project, but I'm having trouble training it. Xalen is a self-learning conversational chatbot designed to hold funny and witty conversations with users through artificial intelligence methods that enable the program to communicate like an actual human being. It can talk about anything (eventually), tell jokes, engage in witty banter, etc. It's generally for entertainment purposes. Xalen learns from each and every conversation, becoming better and more intelligent, improving its ability to hold longer and more meaningful conversations. Due to a lack of sufficient data, its responses aren't perfect right now, but they can definitely improve! I need a lot of people to chat with the bot. The longer the conversation the better. Your help is greatly appreciated... Here's the URL: http://xalen.netlify.app Thank you all in advance! submitted by /u/GameTide [link] [comments]  ( 88 min )
    Has Anyone Considered Making an AI Colab Generator for Hand Drawn Animation?
    I honestly would love a Colab Notebook for a hand drawn animation generator where you import the storyboards, animatics, backgrounds, model sheets, time sheet/exposure sheets, etc. Though it wouldn't be a substitute for animating yourself, but could be useful if unmotivated. Also, I would love it to have various styles of studios that have done hand drawn animation before (Kinda like Jukebox AI where it has various styles of composers). Especially, defunct ones. No one has ever done an animation generator like this, so it could be a unique one. A time sheet, aka exposure sheet is a wide document with directions for the animators on what to draw for each frame. Here's some examples by me and a former sheet timer of mine named Emery. They sadly left due to personal stuff. https://drive.google.com/file/d/1wbhT3gmCgWQGJYLtYSfgcdf6Z-A6YOJk/view?usp=drivesdk https://drive.google.com/file/d/1n3D4CHJXx6VEBrKWmcmvUBFUMz4Reff6/view?usp=drivesdk https://drive.google.com/file/d/1ckZPfKHQ0gHvKrSGfH-F8PI5HKbEQDlZ/view?usp=drivesdk https://drive.google.com/file/d/1cTj83apBrFJst4sfISwG_tJFZMP-HJtf/view?usp=drivesdk submitted by /u/Ooglyeye [link] [comments]  ( 87 min )
    Stable Diffusion Animation | Jupiter's Stability | Planetary Path | AI Manifest
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 87 min )
    6 Best Artificial Intelligence Courses for Healthcare You Should Learn in 2022
    submitted by /u/Lakshmireddys [link] [comments]  ( 86 min )
    How to create AI Art Using Starting Images with Stable Diffusion
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
  • Open

    Computing VIN checksums
    I’ve had to work a little with VIN numbers lately, and so I looked back at a post I wrote on the subject three years ago. That post goes into the details of Vehicle Identification Numbers and the quirky algorithm used to compute the check sum. This post captures the algorithm in Python code. See […] Computing VIN checksums first appeared on John D. Cook.  ( 5 min )
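    For reference, a sketch of the standard North American VIN check-digit computation the post refers to (my own rendering, not the code from the linked post): transliterate letters to digits, take a weighted sum mod 11, and write a remainder of 10 as 'X'.

        TRANSLIT = {c: v for c, v in zip("ABCDEFGH", range(1, 9))}
        TRANSLIT.update(zip("JKLMN", (1, 2, 3, 4, 5)), P=7, R=9)
        TRANSLIT.update(zip("STUVWXYZ", range(2, 10)))
        TRANSLIT.update({str(d): d for d in range(10)})

        WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)

        def check_digit(vin: str) -> str:
            total = sum(TRANSLIT[c] * w for c, w in zip(vin.upper(), WEIGHTS))
            r = total % 11
            return "X" if r == 10 else str(r)

        # position 9 (index 8) holds the check digit; a valid VIN satisfies:
        vin = "1M8GDM9AXKP042788"
        assert check_digit(vin) == vin[8]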
  • Open

    Lack of Trust Continues to Erode Blockchain Adoption
    Blockchain technology has outgrown from being distributed ledger in financial applications to peer‐to‐peer networks that hold tremendous value in any industry and sector. Bewildering as the growth has been, organizations are engineering their blockchain. The post Lack of Trust Continues to Erode Blockchain Adoption appeared first on Data Science Central.  ( 20 min )
    What is Digital Experience Monitoring (DEM)?
    End-user experience monitoring (EUEM) enables IT professionals to understand issues from the viewpoint of end users, deliver a better customer experience, and fix issues more quickly by constantly capturing failures, breakdowns, page load data, network requests, and other metrics. The post What is Digital Experience Monitoring (DEM)? appeared first on Data Science Central.  ( 19 min )
    Five Steps to Data Profiling for Successful Discovery
    Data profiling focuses on examining and analyzing data, followed by creating a useful summary of that data. The post Five Steps to Data Profiling for Successful Discovery appeared first on Data Science Central.  ( 21 min )
    Will Technology Help Eliminate BS Jobs?
    The definition of a BS job is that the person doing it feels that their contribution to society is meaningless and in a way even harms it The post Will Technology Help Eliminate BS Jobs? appeared first on Data Science Central.  ( 20 min )
    CDN: Does Our Need For Internet Speed Put Sensitive Data at Risk?
    What is CDN, does it endanger the sensitive data of internet users, and how can companies prevent hackers from exploiting its flaws? The post CDN: Does Our Need For Internet Speed Put Sensitive Data at Risk? appeared first on Data Science Central.  ( 21 min )
    Data Science Lessons from Top Gun
    The movie of the summer of 2022 is no doubt “Top Gun: Maverick.”  Fast moving, tons of aerial combat, lots of excitement, and the youngest-looking Tom Cruise I’ve ever seen (not one stitch of grey hair on his head).  There’s gotta be a Data Science lesson in there somewhere… The post Data Science Lessons from Top Gun appeared first on Data Science Central.  ( 25 min )
    How to Use Python to Loop Through HTML Tables and Scrape Tabular Data
    Iterating through HTML tables can be tricky, so we've created this simple guide to help you understand how to use Python to extract tabular data from public HTML tables. The post How to Use Python to Loop Through HTML Tables and Scrape Tabular Data appeared first on Data Science Central.  ( 26 min )
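    A minimal sketch of the pattern such a guide describes (hypothetical URL): fetch the page, select the table, and walk its rows and cells.

        import requests
        from bs4 import BeautifulSoup

        html = requests.get("https://example.com/table-page", timeout=10).text
        soup = BeautifulSoup(html, "html.parser")

        rows = []
        for tr in soup.select("table tr"):
            cells = [td.get_text(strip=True) for td in tr.find_all(["th", "td"])]
            if cells:
                rows.append(cells)

        header, *data = rows  # first row as header, the rest as records
        print(header, len(data), "rows")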

  • Open

    What’s Data-Centric Architecture?
    Data-centric architecture revisits enterprise architecture and turns it on its head. Ever since the dawn of client-server computing, applications have been the focus of enterprise IT buyers. The post What’s Data-Centric Architecture? appeared first on Data Science Central.  ( 18 min )
    The AI Vegan – A real use case for NFT/ Blockchain?
    Blockchain has a chequered history. The post The AI Vegan – A real use case for NFT/ Blockchain? appeared first on Data Science Central.  ( 19 min )
    Reaping the Benefits of Having a Data Backup and Recovery Plan
    In today’s data-driven economy, any business needs to make sure that its data is easily recoverable and secured in an emergency. The National Archives and Records Administration suggests that close to 93% of organizations that experience downtime and data loss for ten or more days file for bankruptcy within a year. The post Reaping the Benefits of Having a Data Backup and Recovery Plan appeared first on Data Science Central.  ( 19 min )
    Data Storytelling: Meshing Narrative Techniques with Data Science Smarts
    Alongside the explosion in enterprise data analytics is the growing realisation that insights, without action, are not enough. The post Data Storytelling: Meshing Narrative Techniques with Data Science Smarts appeared first on Data Science Central.  ( 21 min )
    Reengineer Business Decisions Granularly with Automated Data Collection
    Businesses, whether big or small, know that understanding data is essential to making informed decisions that impact the organization’s bottom line. The post Reengineer Business Decisions Granularly with Automated Data Collection appeared first on Data Science Central.  ( 21 min )
  • Open

    [D] Survey: Opinions on text-to-image AI
    submitted by /u/Ragdoll_X_Furry [link] [comments]  ( 88 min )
    [D] Is there research about sensitivity of models with respect to their hyperparameters?
    Dear students, researchers and practitioners, I don't know about you guys, but in terms of ML, hyperparameter tuning is what "keeps me up at night". Even if your model is convex with a closed form, finding the (near-)optimal hyperparameters is still always a non-convex problem. Yes, you can do a random search with a large n, CV on each point, and then grid-search around the n best results, but I don't always have the time, compute, or simply the "mental energy" to do this. This is why I actively avoid neural networks for anything non-text/vision and even try to wing it with OpenCV if the problem is easy enough. Everything in neural networks is a hyperparameter, and for enough datasets there's no clear point where you can say "mmkay, I'm sure I got everything out of my model" unless you hit 100% on val and test. Some models on tabular data, e.g. GBDTs, seem not to be sensitive to hyperparameters unless you have unbalanced data or so. Even better, if I can wing it with LDA/QDA and use 0 hyperparameters, I'm happy. Have there been any large-scale studies that look at algorithms from this point of view? If not, is this not an interesting domain to research? submitted by /u/Aggressive_Ad_3178 [link] [comments]  ( 91 min )
    [D] Do I need a dedicated graphics card of the RTX series for machine learning in the long run?
    I'm going to buy a new laptop and the only thing that concerns me is whether I am overspending on the GPU. I am doing my BTech in Artificial Intelligence, and my tasks include running heavy software like Visual Studio, Android Studio, MATLAB, etc. simultaneously. So the question is: do I need to buy a laptop with a dedicated GPU for the long run? If not, which Intel or Ryzen processors will do the job? submitted by /u/Relative_Winner_4588 [link] [comments]  ( 92 min )
    [R] Using Emerson AI, augmented by GPT-J (Eleuther) to train Blenderbot 3 about video game level design (virtual assistant) and to illustrate Emerson AI's ideas for a level with Dall-E Mini. The results initially varied but with a little calibration over time the quality of the images improved.
    submitted by /u/swagonflyyyy [link] [comments]  ( 89 min )
    [D] Where does VQ-GAN get its randomness from?
    In VQ-GAN, when sampling new images for unconditional image generation, you run a Transformer to predict the sequence of codebook vectors that define the latent representation of the image, and then you run that latent representation through your decoder to generate an actual image. But does the only randomness come from the sampling of the predicted sequence? If so, I get that the number of possible images to generate is mind shatteringly huge (1024^256), but it still seems limited. Like, if you generate a particular image, it seems like you wouldn't be able to generate very similar images that are close enough that they'd still (imperfectly) map to the same codebook entries when run through the first stage encoder. Is there any other noise applied (e.g. slight perturbations to the chosen codebook vectors, Adaptive Instance Normalization in the decoder a la StyleGAN), or does the discrete set of vector sequences really define the full, discrete set of outputs? submitted by /u/say_wot_again [link] [comments]  ( 89 min )
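    For concreteness, an illustrative sketch (hypothetical codebook size) of the one stochastic step in this pipeline: a categorical draw over codebook indices from the transformer's logits; the decoder itself is deterministic.

        import torch

        logits = torch.randn(1024)                     # scores over the codebook
        temperature = 1.0
        probs = torch.softmax(logits / temperature, dim=-1)
        idx = torch.multinomial(probs, num_samples=1)  # the sole source of randomness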
    [D] Accelerate, PyTorch, and Big Model Inference: How Does It All Work?
    submitted by /u/muellerzr12 [link] [comments]  ( 88 min )
    [P] Dual numbers in Python for Automatic Differentiation.
    Hi there, I have set up a basic implementation of Dual Numbers in Python that can be used for automatic differentiation. Here is an example that is also part of the repository: gradient descent with dual numbers. The implementation is pretty simple and therefore easy to understand. https://github.com/kaifishr/PyDualNumber submitted by /u/neurontwister [link] [comments]  ( 105 min )
    [D] Would you consider yourself an artist for using AI art tools like DALL-E and Midjourney?
    I was reading this article. For those of you who got paywalled, a short summary of the controversy: someone entered one of his Midjourney results in the Colorado State Fair's art competition. He played around with Midjourney and submitted one of the outcomes to the fair with "adequate" credit - Jason Allen via Midjourney. (This is also controversial, as some judges didn't know anything about Midjourney.) He won his division, which is digital art. Obviously, there are some pretty heated arguments on Twitter on this matter among digital artists. Some argue that he's not an artist, that he did the bare minimum of work. There's no creativity or legwork in creating it. He didn't write the code. He simply came up with a prompt. Some argue that this is valid, because AI tools are merely tools, and every artist uses tools. They think it's not any different from Photoshop. submitted by /u/KimiKimKimKitty [link] [comments]  ( 113 min )
    [N] Introduction to streaming for data scientists
    submitted by /u/fchung [link] [comments]  ( 101 min )
    [D] Most Popular AI Research Aug 2022 - Ranked Based On GitHub Stars
    submitted by /u/cloud_weather [link] [comments]  ( 88 min )
    [P] I made a social media app for generating paintings from your photos using glid-3
    submitted by /u/persianprez [link] [comments]  ( 90 min )
    [D] Human pose estimation rotations and orientation
    Hello, I have a few questions regarding deep-learning-based human pose estimation that I would appreciate some help with. Firstly, I understand that the pitch and yaw of a bone segment, such as the forearm, can be calculated from 2D or 3D joint keypoints, but roll cannot. Could someone explain to me why it is not possible to calculate the roll of the forearm using wrist and elbow keypoints? Secondly, I am confused about the exact difference between orientation map representations of the human body, such as those used in OpenPose [1], and hierarchical bone vectors, such as those used in [2]. My understanding is that OpenPose learns vector fields between joint keypoints/adjacent limbs, but hierarchical bone vector methods also learn a collection of bone vectors, just as part of a kinematic tree. Is the difference simply that the bone vector methods take the kinematic tree into account, whereas the orientation map methods only care about adjacent segments? Thirdly, I understand that 2D or 3D keypoints are not enough to define orientation, but it can be derived from them. Is the derivation of orientation simply based on how keypoints are connected to adjacent keypoints? For example, is the orientation of the elbow defined relative to its position from the shoulder and wrist, i.e., by how the elbow must be rotated to connect to the wrist and shoulder? Again, any help with these matters would be greatly appreciated; I appreciate that these questions are of an esoteric nature. [1] https://arxiv.org/abs/1812.08008 [2] https://openaccess.thecvf.com/content_CVPR_2020/html/Xu_Deep_Kinematics_Analysis_for_Monocular_3D_Human_Pose_Estimation_CVPR_2020_paper.html submitted by /u/pug____ [link] [comments]  ( 90 min )
    [R] Motorcycle Night Ride (Semantic Segmentation); featuring 6 class labels in 200 frames taken from open media to enable testing for object detection and/or mobility-centered AI solutions
    The dataset is geared towards computer vision-powered motorcycle helmets or other inventive avenues. Get the dataset here: https://www.kaggle.com/datasets/sadhliroomyprime/motorcycle-night-ride-semantic-segmentation Six standard classes are used: Undrivable, Road, Lanemark, My bike, Rider, Movable. Movable denotes moving objects (e.g. vehicles, people); Undrivable denotes areas one cannot ride to. The other classes are self-explanatory: road, lanemark (inclusive of reflectors), rider, and, of course, the bike itself. We have used SuperAnnotate’s pixel editor as the tool for the semantic segmentation. It works on raster logic as opposed to vector logic. Export options include the COCO format. We have prepackaged the dataset inclusive of fused images. The dataset is created by Acme AI Ltd. (www.acmeai.tech) and is #openaccess 😊 😊. Use it to your heart's content. submitted by /u/SithisR [link] [comments]  ( 89 min )
    [R] ERNIE-ViLG, a state-of-the-art text-to-image model that generates images from Chinese text
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 101 min )
  • Open

    Professional AI whisperers have launched a marketplace for DALL-E prompts
    submitted by /u/firig1965 [link] [comments]  ( 86 min )
    New DeepMind AI Learns Soccer Skills | Boston Dynamics Robotics News
    submitted by /u/kenickh [link] [comments]  ( 90 min )
    AI is getting better at generating porn. We might not be prepared for the consequences
    submitted by /u/magenta_placenta [link] [comments]  ( 87 min )
    Severe case of overfitting in my research
    I'm an MSc student in bioinformatics. What I do is gather transcriptomic data from many cancer datasets, conduct some analysis over each dataset separately, get important cells and genes as features, and use them in a machine learning model to predict a target variable. The analysis in which I get the cell scores is pretty solid. It is based on the transcriptomic data, and it basically tells me how much there is of each cell type in each sample. In total, I have 38 cell types that I can use as predictive features. For example, CellA gets overall higher scores in responder samples and lower scores in non-responders. It is informative, so I would use it in the model. The aim is to define differences between samples that respond to a therapy (labeled Response) and samples that do not (NoResponse). I tried random forest, gradient boosting machines, XGBoost, logistic regression (with Lasso and Ridge penalties), kernel SVM, and more. Tree-based algorithms are producing AUC = 0.9 on the train set and AUC = 0.63 on the test set, something like that. Linear models (logistic regression) are very bad: AUC = 0.51 on the test set. I guess they just don't fit my data, so I'll use tree-based models. I'm using cross-validation, I tuned the parameters of each algorithm (like number of trees, number of nodes...), I tried feature selection; nothing is working. I'm facing overfitting and it is hurting my brain. What can cause such overfitting? Why are parameter tuning and feature selection not helping at all? Could it be that the cells are just not very good predictive features? What do you think? Please share your thoughts. submitted by /u/ComanConer [link] [comments]  ( 106 min )
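    One hedged guess worth ruling out with so few samples and a wide feature set: feature selection or cell-score screening performed on the full dataset before cross-validation leaks the test folds into training and inflates train-test gaps. A minimal sketch keeping selection inside the CV pipeline (dummy data with 38 features standing in for the cell scores):

        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import Pipeline

        X, y = make_classification(n_samples=120, n_features=38, random_state=0)
        pipe = Pipeline([
            ("select", SelectKBest(f_classif, k=10)),        # fit on train folds only
            ("model", RandomForestClassifier(max_depth=3)),  # shallow trees curb variance
        ])
        scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
        print(scores.mean())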
    Time Session - Stable Prompts: hourglass with clocks and bright colors inside, 8k hyperdetailed
    submitted by /u/widgia [link] [comments]  ( 87 min )
    got NovelAI
    submitted by /u/roblox22y [link] [comments]  ( 90 min )
    I made a music video using 'AI Technology'. What do you guys think?
    submitted by /u/6Witchy9 [link] [comments]  ( 91 min )
    AI significantly improves early detection of sepsis in hospitals
    submitted by /u/henlo_there_fren [link] [comments]  ( 86 min )
    Yu-Gi-Oh! The Abridged Series Season 1 Upscaled to 4K 60FPS using AI
    submitted by /u/hirep14316 [link] [comments]  ( 87 min )
    Anyone know of a site that does AI text-to-image generation for text/logos?
    Like a Starbucks logo: if I put in "Starbucks", could it give me different text images? LMK submitted by /u/Effective_Try_198 [link] [comments]  ( 87 min )
    Stable Diffusion Animation | Aliens & Demons | AI Manifest 4K UHD 60 FPS
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 87 min )
    AI in Competition
    submitted by /u/MassiveReign [link] [comments]  ( 86 min )
    Do you think there will soon be a “text to video or gif” service like midjourney or Imagen is text to pic?
    submitted by /u/ibexVR [link] [comments]  ( 87 min )
    The KoboldAI Horde: AI writing for everyone through mutual aid
    submitted by /u/dbzer0 [link] [comments]  ( 86 min )
    Gothic Fever Dream Stable Diffusion Ai Art Music Video 4k 29FPS
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
  • Open

    Question about baselines
    I am going through Sergey Levine's course on Deep RL and he has the slide below on baselines. As I understand it, his derivation wouldn't work because if we use the average reward as in the slide, this quantity is itself random, so it is not OK to just treat it like a constant, and therefore there would still be bias (though perhaps not a very large one). Does this make sense? https://preview.redd.it/imgae2882pl91.png?width=4064&format=png&auto=webp&s=0626ad16a24f2bac82fa22cd2c4bc354075f9483 submitted by /u/randomkolmogorov [link] [comments]  ( 99 min )
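    A possible resolution, hedged: the slide's argument holds for any baseline b that is independent of the sampled trajectories, since the subtracted term then vanishes in expectation; a batch average computed from the same samples does couple b to the data, which is exactly the small residual bias the poster suspects (it disappears if b comes from earlier batches or from a function of the state alone). The constant-b case in LaTeX:

        \mathbb{E}_{\tau \sim p_\theta}\left[\nabla_\theta \log p_\theta(\tau)\, b\right]
          = b \int p_\theta(\tau)\, \nabla_\theta \log p_\theta(\tau)\, d\tau
          = b \int \nabla_\theta p_\theta(\tau)\, d\tau
          = b\, \nabla_\theta \int p_\theta(\tau)\, d\tau
          = b\, \nabla_\theta 1
          = 0.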
    Any rl papers on simply maximizing value error during the learning phase for a separate test policy?
    submitted by /u/jms4607 [link] [comments]  ( 95 min )
    RL finds the optimal result for MIMO beamforming!
    Quite interesting that RL finds the optimal result for the MIMO beamforming problem, which is a nonconvex optimization problem and also NP-hard. For the case of N=4, K=4, and power = 10, over 100 random samples, MMSE's sum rate is about 8.3. The branch-reduce-and-bound algorithm computes the optimal sum rate, which is 9.8. The reinforcement learning algorithm gets 9.756. https://github.com/AI4Finance-Foundation/RLSolver/tree/main/rlsolver/rlsolver_mimo_beamforming submitted by /u/Character-Meat-9176 [link] [comments]  ( 88 min )
  • Open

    New DeepMind AI Is Learning Soccer
    submitted by /u/kenickh [link] [comments]  ( 87 min )
    Weird result while training NN
    I am getting weird results while training a neural network in Jupyter. I train the network and save the model to a file whenever it achieves better rewards. The thing is, when I train the NN for the first time, I do not get a good result. But when I re-run the code in the same Jupyter session, it performs extremely well. However, when I re-run it again, performance falls. Then, after re-running once or twice more in the same session, it again performs extremely well. Also, every time I run the code after restarting the Jupyter session, I do not get the desired output. So, I wonder what is going on? submitted by /u/Icy_Improvement_5527 [link] [comments]  ( 88 min )
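    A hedged guess at the mechanism: without fixed seeds, each (re)run draws different initial weights and exploration noise, and re-running a cell in the same Jupyter session also advances the global RNG state, so consecutive runs are effectively different random restarts. A minimal sketch to make runs comparable:

        import random

        import numpy as np
        import torch

        def seed_everything(seed: int = 42):
            random.seed(seed)
            np.random.seed(seed)
            torch.manual_seed(seed)
            torch.cuda.manual_seed_all(seed)  # no-op without a GPU

        seed_everything(42)  # call once per run, before building the model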
    Deep Dive into approaches for handling Noisy Labels with Deep Neural Networks [Article]
    submitted by /u/JoshuaDaD [link] [comments]  ( 86 min )
  • Open

    CircuitNet: An Open-Source Dataset for Machine Learning Applications in Electronic Design Automation (EDA). (arXiv:2208.01040v4 [cs.LG] UPDATED)
    The electronic design automation (EDA) community has been actively exploring machine learning (ML) for very large-scale integrated computer-aided design (VLSI CAD). Many studies explored learning-based techniques for cross-stage prediction tasks in the design flow to achieve faster design convergence. Although building ML models usually requires a large amount of data, most studies can only generate small internal datasets for validation because of the lack of large public datasets. In this essay, we present the first open-source dataset called CircuitNet for ML tasks in VLSI CAD.  ( 2 min )
    Making a Spiking Net Work: Robust brain-like unsupervised machine learning. (arXiv:2208.01204v2 [cs.NE] UPDATED)
    The surge in interest in Artificial Intelligence (AI) over the past decade has been driven almost exclusively by advances in Artificial Neural Networks (ANNs). While ANNs set state-of-the-art performance for many previously intractable problems, the use of global gradient descent necessitates large datasets and computational resources for training, potentially limiting their scalability for real-world domains. Spiking Neural Networks (SNNs) are an alternative to ANNs that use more brain-like artificial neurons and can use local unsupervised learning to rapidly discover sparse recognizable features in the input data. SNNs, however, struggle with dynamical stability and have failed to match the accuracy of ANNs. Here we show how an SNN can overcome many of the shortcomings that have been identified in the literature, including offering a principled solution to the dynamical "vanishing spike problem", to outperform all existing shallow SNNs and equal the performance of an ANN. It accomplishes this while using unsupervised learning with unlabeled data and only 1/50th of the training epochs (labeled data is used only for a simple linear readout layer). This result makes SNNs a viable new method for fast, accurate, efficient, explainable, and re-deployable machine learning with unlabeled data.  ( 3 min )

  • Open

    [R] From Motor Control to Team Play in Simulated Humanoid Football - DeepMind 2022
    Paper: https://arxiv.org/abs/2105.12196 Youtube: https://www.youtube.com/watch?v=KHMwq9pv7mg Abstract: Intelligent behaviour in the physical world exhibits structure at multiple spatial and temporal scales. Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals defined on much longer timescales, and in terms of relations that extend far beyond the body itself, ultimately involving coordination with other agents. Recent research in artificial intelligence has shown the promise of learning-based approaches to the respective problems of complex movement, longer-term planning and multi-agent coordination. However, there is limited research aimed at their integration. We study this problem by training …  ( 107 min )
    Examples of good articles that combine machine learning and health outcomes [Discussion]
    Dear redditors: I am very fond of your posts about articles that use machine learning poorly on health-related outcomes. But is there any paper that you consider good, and why? submitted by /u/clbustos [link] [comments]  ( 88 min )
    [R] Transformers are Sample Efficient World Models - IRIS - University of Geneva - With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games!
    Paper: https://arxiv.org/abs/2209.00588 Github: https://github.com/eloialonso/iris Abstract: Deep reinforcement learning agents are notoriously sample inefficient, which considerably limits their application to real-world problems. Recently, many model-based methods have been designed to address this issue, with learning in the imagination of a world model being one of the most prominent approaches. However, while virtually unlimited interaction with a simulated environment sounds appealing, the world model has to be accurate over extended periods of time. Motivated by the success of Transformers in sequence modeling tasks, we introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer. With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games. Our approach sets a new state of the art for methods without lookahead search, and even surpasses MuZero. To foster future research on Transformers and world models for sample-efficient reinforcement learning, we release our codebase at this https URL. submitted by /u/Singularian2501 [link] [comments]  ( 89 min )
    [P] AI Assistant that finds the best restaurant/business/service.
    My fiancee's tire blew out last week and she spent 40 minutes searching through reviews trying to find a decent mechanic, and then another 40 actually calling to get an appointment. She was understandably frustrated throughout this process. She's a medical resident who works 100 hours a week and has very little time for this kind of bull. During the pandemic, as I imagine is the case with many couples, we've grown quite close. Her rage during this process became my rage. Her suffering, my suffering. And so, as all engineers must, I donned my remote work cape, marched over to my stand-up desk with the sit-down feature, and began to code with more enthusiasm than a thousand labradors. What emerged at the end was an overengineered single-utility AI assistant that helps you find the best-in-class option for any business or restaurant so no one ever has to see their loved ones suffer as I have. Try it out if you're interested and let me know what you think: +1 (213) 577-2447. Right now it can't actually book an appointment, but every great story needs a sequel... submitted by /u/SudoSharma [link] [comments]  ( 89 min )
    [Discussion] Tradeoff between specificity of observations and quantity
    How do people think about the tradeoff between specificity (for lack of a better word) of observations and quantity of observations when building models? For an imperfect toy example, suppose we are trying to predict housing prices in 5 different cities across the world, where we have data from each city and a shared feature space. We expect a lot of the same patterns to be useful across cities, but we also suspect each city’s target has a slightly different relationship with the feature space. The two extremes are to (a) include the entire dataset when building one model, or (b) create an individual model for each city. Ideally, we’d like to take advantage of learnings from the entire dataset, while honing in on each particular city’s unique distributions/relationships. I understand tree ensembles (Random Forest, GBDTs) would get at some of this simply by including “City” as an input, but I feel like there could be better approaches. Any thoughts/insights to consider? submitted by /u/Captain___Mutato [link] [comments]  ( 89 min )
    [D] Video: The hidden dangers of loading open-source AI models (ARBITRARY CODE EXPLOIT!)
    https://youtu.be/2ethDz9KnLk Did you know that something as simple as loading a model can execute arbitrary code on your machine? Try the model: https://huggingface.co/ykilcher/totally-harmless-model Get the code: https://github.com/yk/patch-torch-save OUTLINE: 0:00 - Introduction 1:10 - Sponsor: Weights & Biases 3:20 - How Hugging Face models are loaded 5:30 - From PyTorch to pickle 7:10 - Understanding how pickle saves data 13:00 - Executing arbitrary code 15:05 - The final code 17:25 - How can you protect yourself? submitted by /u/ykilcher [link] [comments]  ( 105 min )
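    A minimal, benign illustration of the mechanism the video covers: pickle invokes __reduce__ at load time, so a crafted object can run an arbitrary callable the moment the "model" is loaded.

        import pickle

        class Payload:
            def __reduce__(self):
                # could be os.system or any other callable; print keeps it harmless
                return (print, ("this ran at load time",))

        blob = pickle.dumps(Payload())
        pickle.loads(blob)  # prints -- which is why untrusted loads are unsafe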
    [D] A solution to the reproducibility crisis and model gate keeping in deep learning research
    tl;dr Open source datasets that force weights to be open sourced as well. Current State Right now we have two major issues. The first is big tech wishes to play model gatekeeper. We see this in yesterday's discussion on a Senior Google AI engineer finding it unacceptable that the stable diffusion weights are publicly available. The second issue is that big tech is attempting to set the precedent that any publicly available copywritten material is fair game for training. You can see that in this OpenAI opinion and Nat Friedman's (CEO Github) tweet here. I am of the opinion that they are doing serious mental gymnastics to justify their flagrant disregard of copyright. The reason for that is that fair use in the US is a 4 prong test. These commercial use cases fail all four. Looking at …  ( 98 min )
    [D] In which ML field can I make significant contribution without significant compute?
    I want to do research, but I don't have access to a lot of compute. At most, I can invest ~2000$ in a GPU setup. Which research areas are the best fit for this setting (without considering individual prior knowledge, only the required compute)? submitted by /u/chipz6174 [link] [comments]  ( 104 min )
    [Discussion] Image download automation
    For training my machine learning model, I would like to download a lot of high-quality pictures from Google and iStock. How can I download high-quality pictures from iStock and Google? Thank you!! submitted by /u/Agorian1992 [link] [comments]  ( 88 min )
    [D][R] Is there any library/package for Pytorch model's prediction analysis, such as distribution of predicted classes or how confident the predictions are, etc?
    I have a PyTorch model whose predictions I want to analyze. So far, if I want to know which classes it predicts, how frequent the predictions are, how confident the predictions are, etc., I generally have to write code from scratch. Is there any library/package out there that can help me achieve this? Something like what torchmetrics does for common metrics that PyTorch does not support out of the box, compared to TensorFlow. submitted by /u/AerysSk [link] [comments]  ( 106 min )
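    In the meantime, a minimal sketch (no dedicated library assumed) of the kind of analysis described, straight from raw logits:

        from collections import Counter

        import torch

        logits = torch.randn(1000, 10)  # stand-in for the model's outputs
        probs = torch.softmax(logits, dim=1)
        conf, preds = probs.max(dim=1)

        print(Counter(preds.tolist()))            # frequency of each predicted class
        print(f"mean confidence: {conf.mean():.3f}")
        print(torch.histc(conf, bins=10, min=0.0, max=1.0))  # confidence histogram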
    [D] Where is AutoML for NNs?
    Hey guys, I was just reading a couple of papers like AutoML-Zero, and I read about Google’s Cloud AutoML, and I was wondering what the latest research on automatically generating NNs for various tasks is. Does such a thing even exist yet? I can’t seem to find anything. Obviously there isn’t any AutoML that can, say, generate a transformer or anything complicated, but is there anything that can generate a good solution for a problem like CIFAR-10/100? submitted by /u/sjames898 [link] [comments]  ( 90 min )
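    The keyword to search for is neural architecture search (NAS). In its simplest form it is just a search loop over a hand-designed space; a toy sketch (the search space below is invented for illustration, not taken from any paper):

        import random
        import torch.nn as nn

        def sample_arch():
            # hypothetical search space: depth, width, kernel size
            return {"blocks": random.randint(2, 6),
                    "width": random.choice([32, 64, 128]),
                    "kernel": random.choice([3, 5])}

        def build(cfg, num_classes=10):
            layers, in_ch = [], 3
            for _ in range(cfg["blocks"]):
                layers += [nn.Conv2d(in_ch, cfg["width"], cfg["kernel"], padding="same"),
                           nn.ReLU()]
                in_ch = cfg["width"]
            layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                       nn.Linear(in_ch, num_classes)]
            return nn.Sequential(*layers)

        # random search: briefly train each sampled net on CIFAR-10,
        # keep the one with the best validation accuracy
        candidates = [sample_arch() for _ in range(20)]

    Published NAS methods replace the random sampling with reinforcement learning, evolution, or gradient-based relaxations such as DARTS.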
    [D] Why DeepMind’s breakthrough ML algorithm was so tricky to train
    First published in 2013, Deep Q-Networks (DQN) learn to take actions that outperform humans in a suite of Atari games, taking only the pixel values of the game as input. Learning such effective behaviour directly from experience is a remarkable achievement. It’s the first hint at a viable path towards Artificial General Intelligence - the ability of computers to act intelligently across a wide range of tasks, at a level similar to or greater than humans! Part of the reason DQN was such an impressive advance is that Q-learning - the subtype of Reinforcement Learning it uses - is very hard to train. This blog outlines why it's so hard and how DeepMind cracked it https://blog.delta-academy.xyz/why-deepmind-dqn-hard-to-train submitted by /u/Ok-Craft-9908 [link] [comments]  ( 89 min )
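    Two of the fixes usually credited with making DQN trainable are experience replay and a frozen target network that is only synced periodically. A minimal sketch of the loss computation with a target network (batch layout and shapes are assumptions):

        import torch
        import torch.nn.functional as F

        def dqn_loss(q_net, target_net, batch, gamma=0.99):
            # batch: states (B, ...), actions (B,), rewards (B,),
            #        next states (B, ...), done flags (B,) as floats
            s, a, r, s_next, done = batch
            with torch.no_grad():
                # bootstrapped target from the frozen network stabilizes training
                next_q = target_net(s_next).max(dim=1).values
                target = r + gamma * (1 - done) * next_q
            q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
            return F.smooth_l1_loss(q, target)

        # every N steps: target_net.load_state_dict(q_net.state_dict())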
    NLP based app reminds you someone's name immediately, if you forget it [P]
    I got laid off recently, and aside from spending time w/ family and casually looking for a new job, I built this creepy, likely illegal, app to remind you of someone's name if you forget it. People always say they forget someone's name right away, so I had this gimmicky idea to use Apache OpenNLP and some simple heuristics to recognize human names in detected ambient voice -> text conversations. It's a little janky, but it works often, and it was fun to make. I'm not sure about the legality of using this without prior consent, but it's a stab at addressing the name problem. Check it out: https://youtu.be/YlSHJylk2Co submitted by /u/GoochCommander [link] [comments]  ( 92 min )
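    The post uses Apache OpenNLP (Java); for anyone wanting to prototype the same idea in Python, a sketch of the name-spotting step with spaCy's pretrained NER (the transcript is made up):

        import spacy

        # requires: python -m spacy download en_core_web_sm
        nlp = spacy.load("en_core_web_sm")

        def find_names(transcript: str):
            # return PERSON entities detected in a speech-to-text transcript
            doc = nlp(transcript)
            return [ent.text for ent in doc.ents if ent.label_ == "PERSON"]

        print(find_names("Hey, I'm Sarah, and this is Tom from accounting."))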
    [D] Laptop For Machine Learning and Deep Learning
    Many places I read say that in the end I'll have to rely on online GPU services like Colab and Kaggle for processing and training on my data, so I wanted some experienced perspective on it. I'm confused between buying an HP Victus (i7 11th Gen H, GeForce RTX 3060, 16 GB DDR4 RAM, 512 GB SSD) vs. an Apple MacBook M1 Pro / M2. submitted by /u/Substantial-Box7478 [link] [comments]  ( 92 min )
    [Discussion] Is chain-of-thought and diffusion model analogous to recurrent NN?
    Are the ideas of "iterative reasoning" in chain-of-thought prompting (as shown with Google's PaLM model) and in diffusion models (recursive refinement of images) analogous to recurrent NNs? Or are they just not comparable? submitted by /u/Miserable_Coast [link] [comments]  ( 101 min )
    [D] Do you expect any large legal issues around AI like DALLE-2?
    Since images generated by a licensed user of DALL-E are allowed to be commercialized, and the AI is trained on a dataset that includes copyrighted images, I could definitely see a large lawsuit looming in the distance for the developers. If you are an artist, or represent an artist with a unique art style, there would be a vested monetary interest in ensuring that works in their style can’t be easily reproduced without continued compensation. So I could see a large group of designers and artists pushing for it to be illegal to use copyrighted works in the datasets that AIs are trained on. Do you think there is legal precedent for or against this? Also, which side would you fall on? I’m not super worried that AI is going to replace artists anytime soon: even if anyone can type a sentence to generate art, someone with the technical skill of an artist is still required to make any changes to that art, or to make new forms of art not yet in the dataset. But I can see the worry that artists have, and why they would think they need to push for the removal of their works from training datasets. submitted by /u/moopzilla [link] [comments]  ( 104 min )
    [R] Transformers are Sample Efficient World Models: With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS outperforms humans on 10 out of 26 games and surpasses MuZero.
    Saw this paper posted on Twitter earlier: https://arxiv.org/abs/2209.00588 Abstract: Deep RL agents are notoriously sample inefficient, which considerably limits their application to real-world problems. Recently, many model-based methods have been designed to address this issue, with learning in the imagination of a world model being one of the most prominent approaches. However, while virtually unlimited interaction with a simulated environment sounds appealing, the world model has to be accurate over extended periods of time. Motivated by the success of Transformers in sequence modeling tasks, we introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer. With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games. Our approach sets a new state of the art for methods without lookahead search, and even surpasses MuZero. To foster future research on Transformers and world models for sample-efficient reinforcement learning, we release our codebase at https://github.com/eloialonso/iris submitted by /u/hardmaru [link] [comments]  ( 106 min )
    [P] Run Stable Diffusion CPU ONLY With Web Interface Install guide
    this video shows you how you can install stable-diffusion on almost any computer regardless of your graphics card, and use an easy-to-navigate website for your creations. It renders slowly, but it works. video: https://www.youtube.com/watch?v=iwHfsDTD8U0 github repo: https://github.com/darkhemic/stable-diffusion-cpuonly submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 88 min )
    [P] Run Stable Diffusion on your M1 Mac’s GPU
    A group of open source hackers forked Stable Diffusion on GitHub and optimized the model to run on Apple's M1 chip, enabling images to be generated in ~15 seconds (512x512 pixels, 50 diffusion steps). This is only about an order of magnitude slower than NVIDIA GPUs if we compare batch processing capabilities (from my experience, I can get a batch of 10-20 images generated in a similar time frame). Link to the article for more information: https://replicate.com/blog/run-stable-diffusion-on-m1-mac Link to GitHub fork: https://github.com/magnusviri/stable-diffusion Update: this fork has been merged back into the main 'fork': https://github.com/lstein/stable-diffusion submitted by /u/hardmaru [link] [comments]  ( 89 min )
  • Open

    I'm introducing my students to the concept of AI and am looking for videos/clips that can potentially challenge the conceptions of what's living or not.
    These are going to be Grade 8 (~13yo) students who may have never heard of artificial intelligence. After having gone over the biology unit and defining the characteristics of life, I wanted to have a fun lesson about hypotheticals that could challenge what we understand life to be. For example, I'm planning to have them chat with an AI chatbot. It's meant to just be a pretty chill fun lesson so a video/clip that I could show them even showing AI in media/television or something would work. I saw a clip from the video game Detroit: Become Human that featured a robot begging for its life and being executed without remorse. Something dystopian like that could definitely work (although I'd have to vet them for appropriate violence, etc.) but yeah, pretty much anything that might make a Grade 8 go "huh, maybe the definition of living isn't so clear-cut" or at least "huh, I dunno what I'd do in that situation" kinda thing. Thank you for the help! submitted by /u/Pheophyting [link] [comments]  ( 89 min )
    AI in video decode.
    I'm not aware of a video player that will denoise/interpolate/upscale using AI instead of simple algorithms. Does such a thing exist? Dark scenes especially are not handled very well by compression algorithms and AI could resolve that. It seems like a waste to be watching TV with a powerful GPU that is idle. submitted by /u/sasksean [link] [comments]  ( 87 min )
    AI assistant to help you find the best business/service/restaurant
    submitted by /u/SudoSharma [link] [comments]  ( 89 min )
    My New Workflow: Behind the Scenes #1 creating a song and video with Syn...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
    Using AI to decode speech from brain activity
    submitted by /u/estasfuera [link] [comments]  ( 86 min )
    How can AI change your Financial Future?
    submitted by /u/SamuelSmith1416 [link] [comments]  ( 86 min )
    The Footage in This Sci-Fi Movie Project Comes From AI-Generated Images - "Salt" is built entirely with AI-generated art. Visuals are created by tapping Midjourney, Stable Diffusion and DALL-E 2, which can essentially draw anything you want by relying on a mere text description from the user
    submitted by /u/magenta_placenta [link] [comments]  ( 87 min )
    Watch how an AI system learns to play soccer from scratch
    submitted by /u/magenta_placenta [link] [comments]  ( 86 min )
    The Wandering Village 🎵 but with AI generated art ✔ [Wombo Dream and DALL E 2] EQ REMIX
    submitted by /u/FreshRelaxation [link] [comments]  ( 87 min )
    Generating Fashion Using AI
    submitted by /u/magenta_placenta [link] [comments]  ( 87 min )
    AI-Generated Bible Art
    submitted by /u/estasfuera [link] [comments]  ( 86 min )
    Artistic Woman
    submitted by /u/widgia [link] [comments]  ( 87 min )
    In case your plans for this weekend haven't yet solidified! :)
    submitted by /u/techn0_cratic [link] [comments]  ( 87 min )
    Would you still study comp sci given current technological conditions?
    Hey everybody! I work in customer success and I am considering pursuing the MCIT program at Penn to switch careers and become a software engineer. After befriending the engineering team at my company and seeing all the cool stuff they do, I know I want to get into coding so I can actually build something new that I can be proud of. However, because I am a layman with only some webdev experience (html, css, js), I am having trouble separating the wheat from the chaff when it comes to AI and the future of programming. Some people say it will just increase productivity and necessitate more engineers, while others claim AI programming will hollow out developer roles. A master's is a big financial investment and I would have to work less while getting it, so I wanted to consult people who actually understand what AI is and is not capable of before seriously committing more time to my application. Of course, I am talking to people in my life who are involved with programming about this, but I also want to collect some opinions from people who won't "yes man" me. I know no one has a crystal ball, but I would still appreciate some best guesses. Thanks! :) submitted by /u/Jumpy_Ad830 [link] [comments]  ( 92 min )
    Personalizing Text-to-Image Generation using Textual Inversion
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 87 min )
    Need a little guidance
    Hi guys, I need a little help on a little project I am making. I am making a chatbot using Android Studio, but the bot's responses will be scripted based on my code. What I need help with is how to implement, in Android Studio, the script I made for the bot. Any help will be appreciated. submitted by /u/creative-name123 [link] [comments]  ( 91 min )
  • Open

    Little Learning Machines - The Reinforcement Learning game is available to WISHLIST NOW!! 🎉🎉 Take a look at our first teaser trailer below, what would you train these little machines to do?
    submitted by /u/LoveFearLearn [link] [comments]  ( 87 min )
    "Transformers are Sample Efficient World Models", Micheli et al 2022 (w/2h gameplay in the Atari 100k benchmark, IRIS outperforms humans on 10/26 games, and surpasses MuZero)
    submitted by /u/gwern [link] [comments]  ( 97 min )
    "An Exact and Interpretable Solution to Wordle", Bertsimas & Paskov 2022
    submitted by /u/gwern [link] [comments]  ( 87 min )
    Observing pixels and one-hot encoded last action
    Hi! I'm currently training a custom environment using PPO + GRU. In this environment, I believe it is essential for the agent to observe its last action, one-hot encoded. Basically, on a few rare occasions, the agent cannot observe its own position. A human could infer its position by recalling the past keys that were pressed. The RGB pixel observation of 84x84 is propagated through a Nature CNN, resulting in 3136 features. The one-hot encoded last action is fed to a hidden layer of size 128. The visual and the one-hot latent features are concatenated and then fed to a GRU layer. During training, the gradient norm of the one-hot encoder gets really small (2e-3), while the gradient norm of the visual encoder is at 0.15. This leads me to two assumptions: (a) the one-hot encoded features are so well represented that no further parameters need to be modified, or (b) the policy ignores the one-hot input because it thinks it is not useful. I'd rather go with assumption (b), because the agent does not learn a reasonable policy. Does anybody have an idea on what could help to encourage the architecture to pay more attention to the one-hot encodings? An alternative would be to somehow render the last action onto the pixels. But solving this issue properly might be interesting for other scenarios too. submitted by /u/LilHairdy [link] [comments]  ( 89 min )
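    For concreteness, a minimal sketch of the architecture as described (the 84x84 input, 3136 visual features, and 128-dim action layer are from the post; everything else is an assumption), which also makes it easy to experiment with re-weighting the action branch:

        import torch
        import torch.nn as nn

        class VisualActionGRU(nn.Module):
            # concatenate CNN features with an encoded last action before the GRU
            def __init__(self, num_actions, act_dim=128, hidden=256):
                super().__init__()
                self.cnn = nn.Sequential(  # Nature-CNN-style trunk for 84x84 RGB
                    nn.Conv2d(3, 32, 8, 4), nn.ReLU(),
                    nn.Conv2d(32, 64, 4, 2), nn.ReLU(),
                    nn.Conv2d(64, 64, 3, 1), nn.ReLU(), nn.Flatten())
                self.act_fc = nn.Linear(num_actions, act_dim)
                self.gru = nn.GRU(3136 + act_dim, hidden, batch_first=True)

            def forward(self, pixels, last_action_onehot, h=None):
                feats = self.cnn(pixels)                           # (B, 3136)
                act = torch.relu(self.act_fc(last_action_onehot))  # (B, 128)
                x = torch.cat([feats, act], dim=-1).unsqueeze(1)   # seq len 1
                out, h = self.gru(x, h)
                return out.squeeze(1), h

    One option worth trying is to normalize or rescale each branch before concatenation, so the 128 action features are not drowned out by the 3136 visual features.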
  • Open

    How The Chefz serves the perfect meal with Amazon Personalize
    This is a guest post by Ramzi Alqrainy, Chief Technology Officer, The Chefz. The Chefz is a Saudi-based online food delivery startup, founded in 2016. At the core of The Chefz’s business model is enabling its customers to order food and sweets from top elite restaurants, bakeries, and chocolate shops. In this post, we explain […]  ( 4 min )
    Distributed training with Amazon EKS and Torch Distributed Elastic
    Distributed deep learning model training is becoming increasingly important as data sizes are growing in many industries. Many applications in computer vision and natural language processing now require training of deep learning models, which are growing exponentially in complexity and are often trained with hundreds of terabytes of data. It then becomes important to use […]  ( 13 min )
  • Open

    How to Write a High-quality Article with Just ONE CLICK Using AI
    As a writer, you know the importance of creating high-quality content. Not only does quality content help your website or blog stand out…  ( 12 min )
  • Open

    7+ Best Books to Learn Neural Networks in 2022 for Beginners (Updated)
    submitted by /u/Lakshmireddys [link] [comments]  ( 86 min )
    Instance segmentation with lightning-flash: error in the annotations?
    Hi, I created a toy dataset for instance segmentation. The grey-scaled images were polygon-annotated with makesense.ai (COCO format). Each image has two instances of the same kind of object, both labeled as "chromosome". I trained an instance segmentation model on the dataset with lightning-flash: model = InstanceSegmentation(head="mask_rcnn", backbone="resnet18_fpn", num_classes=datamodule.num_classes, pretrained=False), and print(datamodule.labels); print(datamodule.num_classes) yields ['background', 'chromosome'] and 2. However, the model prediction is a single mask, whereas I want a prediction with two masks. Should I relabel the images with two different labels, chromosome1 and chromosome2 (i.e., two annotations with different labels)? Would it be possible to edit the annotation file without relabeling each image? Thanks for your advice. submitted by /u/jean-pat [link] [comments]  ( 87 min )
    Everything you need to know about Activation Functions [Article]
    submitted by /u/JoshuaDaD [link] [comments]  ( 138 min )
  • Open

    AI explains human quirks
    Humans are weird. The question is, do AI training datasets capture the full scope of our weirdness? I decided to see if GPT-3 could successfully predict other human quirks when prompted with a short list of them. Here's what I gave GPT-3 DaVinci to complete: Human things that  ( 4 min )
    Bonus: Ada explains human quirks
    AI Weirdness: the strange side of machine learning  ( 2 min )
  • Open

    MIND: Maximum Mutual Information Based Neural Decoder. (arXiv:2205.07061v2 [cs.IT] UPDATED)
    We are witnessing a growing interest in the development of learning architectures with application to digital communication systems. Herein, we consider the detection/decoding problem. We aim at developing an optimal neural architecture for such a task. The definition of the optimal criterion is a fundamental step. We propose to use the mutual information (MI) of the channel input-output signal pair, which leads to the minimization of the a-posteriori information of the transmitted codeword given the communication channel output observation. The computation of the a-posteriori information is a formidable task, and for the majority of channels it is unknown. Therefore, it has to be learned. For such an objective, we propose a novel neural estimator based on a discriminative formulation. This leads to the derivation of the mutual information neural decoder (MIND). The developed neural architecture is capable not only of solving the decoding problem in unknown channels, but also of returning an estimate of the average MI achieved with the coding scheme, as well as the decoding error probability. Several numerical results are reported and compared with maximum a-posteriori and maximum likelihood decoding strategies.
    Experiments on Anomaly Detection in Autonomous Driving by Forward-Backward Style Transfers. (arXiv:2207.06055v2 [cs.CV] UPDATED)
    Great progress has been achieved in the community of autonomous driving in the past few years. As a safety-critical problem, however, anomaly detection is a huge hurdle towards a large-scale deployment of autonomous vehicles in the real world. While many approaches, such as uncertainty estimation or segmentation-based image resynthesis, are extremely promising, there is more to be explored. Especially inspired by works on anomaly detection based on image resynthesis, we propose a novel approach for anomaly detection through style transfer. We leverage generative models to map an image from its original style domain of road traffic to an arbitrary one and back to generate pixelwise anomaly scores. However, our experiments have proven our hypothesis wrong, and we were unable to produce significant results. Nevertheless, we want to share our findings, so that others can learn from our experiments.
    Learning "best" kernels from data in Gaussian process regression. With application to aerodynamics. (arXiv:2206.02563v2 [stat.ML] UPDATED)
    This paper introduces algorithms to select/design kernels in Gaussian process regression/kriging surrogate modeling techniques. We adopt the setting of kernel method solutions in ad hoc functional spaces, namely Reproducing Kernel Hilbert Spaces (RKHS), to solve the problem of approximating a regular target function given observations of it, i.e. supervised learning. A first class of algorithms is kernel flow, which was introduced in the context of classification in machine learning. It can be seen as a cross-validation procedure whereby a "best" kernel is selected such that the loss of accuracy incurred by removing some part of the dataset (typically half of it) is minimized. A second class of algorithms is called spectral kernel ridge regression, and aims at selecting a "best" kernel such that the norm of the function to be approximated is minimal in the associated RKHS. Within Mercer's theorem framework, we obtain an explicit construction of that "best" kernel in terms of the main features of the target function. Both approaches of learning kernels from data are illustrated by numerical examples on synthetic test functions, and on a classical test case in turbulence modeling validation for transonic flows about a two-dimensional airfoil.
    NeuralNEB -- Neural Networks can find Reaction Paths Fast. (arXiv:2207.09971v3 [physics.comp-ph] UPDATED)
    Quantum mechanical methods like Density Functional Theory (DFT) are used with great success alongside efficient search algorithms for studying kinetics of reactive systems. However, DFT is prohibitively expensive for large scale exploration. Machine Learning (ML) models have turned out to be excellent emulators of small molecule DFT calculations and could possibly replace DFT in such tasks. For kinetics, success relies primarily on the model's capability to accurately predict the Potential Energy Surface (PES) around transition-states and Minimal Energy Paths (MEPs). Previously this has not been possible due to scarcity of relevant data in the literature. In this paper we train state-of-the-art equivariant Graph Neural Network (GNN)-based models on around 10,000 elementary reactions from the Transition1x dataset. We apply the models as potentials for the Nudged Elastic Band (NEB) algorithm and achieve a Mean Absolute Error (MAE) of 0.13+/-0.03 eV on barrier energies on unseen reactions. We compare the results against equivalent models trained on QM9 and ANI1x. We also compare with and outperform Density Functional based Tight Binding (DFTB) on both accuracy and computational cost. The implication is that ML models, given relevant data, are now at a level where they can be applied for downstream tasks in quantum chemistry, transcending prediction of simple molecular features.
    History Compression via Language Models in Reinforcement Learning. (arXiv:2205.12258v3 [cs.LG] UPDATED)
    In a partially observable Markov decision process (POMDP), an agent typically uses a representation of the past to approximate the underlying MDP. We propose to utilize a frozen Pretrained Language Transformer (PLT) for history representation and compression to improve sample efficiency. To avoid training of the Transformer, we introduce FrozenHopfield, which automatically associates observations with pretrained token embeddings. To form these associations, a modern Hopfield network stores these token embeddings, which are retrieved by queries that are obtained by a random but fixed projection of observations. Our new method, HELM, enables actor-critic network architectures that contain a pretrained language Transformer for history representation as a memory module. Since a representation of the past need not be learned, HELM is much more sample efficient than competitors. On Minigrid and Procgen environments HELM achieves new state-of-the-art results. Our code is available at https://github.com/ml-jku/helm.
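    The FrozenHopfield step is compact enough to sketch: a fixed random projection maps an observation into token-embedding space, and a modern-Hopfield-style softmax retrieval returns a mixture of pretrained token embeddings for the frozen language transformer. A conceptual sketch based on the abstract (the shapes, the temperature beta, and the exact retrieval form are assumptions, not the paper's code):

        import torch

        def frozen_hopfield_retrieve(obs, proj, token_emb, beta=1.0):
            # obs: (B, obs_dim), proj: fixed random (obs_dim, d_model),
            # token_emb: pretrained and frozen (vocab, d_model)
            q = obs @ proj                    # project observation into embedding space
            scores = beta * q @ token_emb.T   # similarity to every stored token
            return torch.softmax(scores, dim=-1) @ token_emb  # retrieved mixture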
    GraphMLP: A Graph MLP-Like Architecture for 3D Human Pose Estimation. (arXiv:2206.06420v2 [cs.CV] UPDATED)
    Modern multi-layer perceptron (MLP) models have shown competitive results in learning visual representations without self-attention. However, existing MLP models are not good at capturing local details and lack prior knowledge of human body configurations, which limits their modeling power for skeletal representation learning. To address these issues, we propose a simple yet effective graph-reinforced MLP-Like architecture, named GraphMLP, that combines MLPs and graph convolutional networks (GCNs) in a global-local-graphical unified architecture for 3D human pose estimation. GraphMLP incorporates the graph structure of human bodies into an MLP model to meet the domain-specific demand of the 3D human pose, while allowing for both local and global spatial interactions. Furthermore, we propose to flexibly and efficiently extend the GraphMLP to the video domain and show that complex temporal dynamics can be effectively modeled in a simple way with negligible computational cost gains in the sequence length. To the best of our knowledge, this is the first MLP-Like architecture for 3D human pose estimation in a single frame and a video sequence. Extensive experiments show that the proposed GraphMLP achieves state-of-the-art performance on two datasets, i.e., Human3.6M and MPI-INF-3DHP. Our source code will be open-sourced.
    BASED-XAI: Breaking Ablation Studies Down for Explainable Artificial Intelligence. (arXiv:2207.05566v2 [cs.LG] UPDATED)
    Explainable artificial intelligence (XAI) methods lack ground truth. In its place, method developers have relied on axioms to determine desirable properties for their explanations' behavior. For high stakes uses of machine learning that require explainability, it is not sufficient to rely on axioms as the implementation, or its usage, can fail to live up to the ideal. As a result, there exists active research on validating the performance of XAI methods. The need for validation is especially magnified in domains with a reliance on XAI. A procedure frequently used to assess their utility, and to some extent their fidelity, is an ablation study. By perturbing the input variables in rank order of importance, the goal is to assess the sensitivity of the model's performance. Perturbing important variables should correlate with larger decreases in measures of model capability than perturbing less important features. While the intent is clear, the actual implementation details have not been studied rigorously for tabular data. Using five datasets, three XAI methods, four baselines, and three perturbations, we aim to show 1) how varying perturbations and adding simple guardrails can help to avoid potentially flawed conclusions, 2) how treatment of categorical variables is an important consideration in both post-hoc explainability and ablation studies, and 3) how to identify useful baselines for XAI methods and viable perturbations for ablation studies.
    Differentiable WORLD Synthesizer-based Neural Vocoder With Application To End-To-End Audio Style Transfer. (arXiv:2208.07282v2 [eess.AS] UPDATED)
    In this paper, we propose a differentiable WORLD synthesizer and demonstrate its use in end-to-end audio style transfer tasks such as (singing) voice conversion and the DDSP timbre transfer task. Accordingly, our baseline differentiable synthesizer has no model parameters, yet it yields adequate synthesis quality. We can extend the baseline synthesizer by appending lightweight black-box postnets which apply further processing to the baseline output in order to improve fidelity. An alternative differentiable approach considers extraction of the source excitation spectrum directly, which can improve naturalness albeit for a narrower class of style transfer applications. The acoustic feature parameterization used by our approaches has the added benefit that it naturally disentangles pitch and timbral information so that they can be modeled separately. Moreover, as there exists a robust means of estimating these acoustic features from monophonic audio sources, it allows for parameter loss terms to be added to an end-to-end objective function, which can help convergence and/or further stabilize (adversarial) training.
    Towards an Improved Understanding of Software Vulnerability Assessment Using Data-Driven Approaches. (arXiv:2207.11708v2 [cs.SE] UPDATED)
    The thesis advances the field of software security by providing knowledge and automation support for software vulnerability assessment using data-driven approaches. Software vulnerability assessment provides important and multifaceted information to prevent and mitigate dangerous cyber-attacks in the wild. The key contributions include a systematisation of knowledge, along with a suite of novel data-driven techniques and practical recommendations for researchers and practitioners in the area. The thesis results help improve the understanding and inform the practice of assessing ever-increasing vulnerabilities in real-world software systems. This in turn enables more thorough and timely fixing prioritisation and planning of these critical security issues.
    Wild Patterns Reloaded: A Survey of Machine Learning Security against Training Data Poisoning. (arXiv:2205.01992v2 [cs.LG] UPDATED)
    The success of machine learning is fueled by the increasing availability of computing power and large training datasets. The training data is used to learn new models or update existing ones, assuming that it is sufficiently representative of the data that will be encountered at test time. This assumption is challenged by the threat of poisoning, an attack that manipulates the training data to compromise the model's performance at test time. Although poisoning has been acknowledged as a relevant threat in industry applications, and a variety of different attacks and defenses have been proposed so far, a complete systematization and critical review of the field is still missing. In this survey, we provide a comprehensive systematization of poisoning attacks and defenses in machine learning, reviewing more than 100 papers published in the field in the last 15 years. We start by categorizing the current threat models and attacks, and then organize existing defenses accordingly. While we focus mostly on computer-vision applications, we argue that our systematization also encompasses state-of-the-art attacks and defenses for other data modalities. Finally, we discuss existing resources for research in poisoning, and shed light on the current limitations and open research questions in this research field.
    The Selectively Adaptive Lasso. (arXiv:2205.10697v2 [stat.ML] UPDATED)
    Machine learning regression methods allow estimation of functions without unrealistic parametric assumptions. Although they can perform exceptionally in prediction error, most lack theoretical convergence rates necessary for semi-parametric efficient estimation (e.g. TMLE, AIPW) of parameters like average treatment effects. The Highly Adaptive Lasso (HAL) is the only regression method proven to converge quickly enough for a meaningfully large class of functions, independent of the dimensionality of the predictors. Unfortunately, HAL is not computationally scalable. In this paper we build upon the theory of HAL to construct the Selectively Adaptive Lasso (SAL), a new algorithm which retains HAL's dimension-free, nonparametric convergence rate but which also scales computationally to massive datasets. To accomplish this, we prove some general theoretical results pertaining to empirical loss minimization in nested Donsker classes. Our resulting algorithm is a form of gradient tree boosting with an adaptive learning rate, which makes it fast and trivial to implement with off-the-shelf software. Finally, we show that our algorithm retains the performance of standard gradient boosting on a diverse group of real-world datasets. SAL makes semi-parametric efficient estimators practically possible and theoretically justifiable in many big data settings.
    SSLGuard: A Watermarking Scheme for Self-supervised Learning Pre-trained Encoders. (arXiv:2201.11692v4 [cs.CR] UPDATED)
    Self-supervised learning is an emerging machine learning paradigm. Compared to supervised learning which leverages high-quality labeled datasets, self-supervised learning relies on unlabeled datasets to pre-train powerful encoders which can then be treated as feature extractors for various downstream tasks. The huge amount of data and computational resources consumption makes the encoders themselves become the valuable intellectual property of the model owner. Recent research has shown that the machine learning model's copyright is threatened by model stealing attacks, which aim to train a surrogate model to mimic the behavior of a given model. We empirically show that pre-trained encoders are highly vulnerable to model stealing attacks. However, most of the current efforts of copyright protection algorithms such as watermarking concentrate on classifiers. Meanwhile, the intrinsic challenges of pre-trained encoder's copyright protection remain largely unstudied. We fill the gap by proposing SSLGuard, the first watermarking scheme for pre-trained encoders. Given a clean pre-trained encoder, SSLGuard injects a watermark into it and outputs a watermarked version. The shadow training technique is also applied to preserve the watermark under potential model stealing attacks. Our extensive evaluation shows that SSLGuard is effective in watermark injection and verification, and it is robust against model stealing and other watermark removal attacks such as input noising, output perturbing, overwriting, model pruning, and fine-tuning.
    Unsupervised Probabilistic Models for Sequential Electronic Health Records. (arXiv:2204.07292v2 [cs.LG] UPDATED)
    We develop an unsupervised probabilistic model for heterogeneous Electronic Health Record (EHR) data. Utilizing a mixture model formulation, our approach directly models sequences of arbitrary length, such as medications and laboratory results. This allows for subgrouping and incorporation of the dynamics underlying heterogeneous data types. The model consists of a layered set of latent variables that encode underlying structure in the data. These variables represent subject subgroups at the top layer, and unobserved states for sequences in the second layer. We train this model on episodic data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system. The resulting properties of the trained model generate novel insight from these complex and multifaceted data. In addition, we show how the model can be used to analyze sequences that contribute to assessment of mortality likelihood.
    Centroid Approximation for Bootstrap: Improving Particle Quality at Inference. (arXiv:2110.08720v2 [cs.LG] UPDATED)
    Bootstrap is a principled and powerful frequentist statistical tool for uncertainty quantification. Unfortunately, standard bootstrap methods are computationally intensive due to the need of drawing a large i.i.d. bootstrap sample to approximate the ideal bootstrap distribution; this largely hinders their application in large-scale machine learning, especially deep learning problems. In this work, we propose an efficient method to explicitly optimize a small set of high-quality "centroid" points to better approximate the ideal bootstrap distribution. We achieve this by minimizing a simple objective function that is asymptotically equivalent to the Wasserstein distance to the ideal bootstrap distribution. This allows us to provide an accurate estimation of uncertainty with a small number of bootstrap centroids, outperforming the naive i.i.d. sampling approach. Empirically, we show that our method can boost the performance of bootstrap in a variety of applications.
    Polynomial convergence of iterations of certain random operators in Hilbert space. (arXiv:2202.02266v2 [math.FA] UPDATED)
    We study the convergence of a random iterative sequence of a family of operators on infinite dimensional Hilbert spaces, inspired by the Stochastic Gradient Descent (SGD) algorithm in the case of the noiseless regression, as studied in [1]. We identify conditions that are strictly broader than previously known for polynomial convergence rate in various norms, and characterize the roles the randomness plays in determining the best multiplicative constants. Additionally, we prove almost sure convergence of the sequence.
    Kernel Methods for Unobserved Confounding: Negative Controls, Proxies, and Instruments. (arXiv:2012.10315v3 [stat.ML] UPDATED)
    Negative control is a strategy for learning the causal relationship between treatment and outcome in the presence of unmeasured confounding. The treatment effect can nonetheless be identified if two auxiliary variables are available: a negative control treatment (which has no effect on the actual outcome), and a negative control outcome (which is not affected by the actual treatment). These auxiliary variables can also be viewed as proxies for a traditional set of control variables, and they bear resemblance to instrumental variables. I propose a family of algorithms based on kernel ridge regression for learning nonparametric treatment effects with negative controls. Examples include dose response curves, dose response curves with distribution shift, and heterogeneous treatment effects. Data may be discrete or continuous, and low, high, or infinite dimensional. I prove uniform consistency and provide finite sample rates of convergence. I estimate the dose response curve of cigarette smoking on infant birth weight adjusting for unobserved confounding due to household income, using a data set of singleton births in the state of Pennsylvania between 1989 and 1991.
    Non Asymptotic Bounds for Optimization via Online Multiplicative Stochastic Gradient Descent. (arXiv:2112.07110v8 [stat.ML] UPDATED)
    The gradient noise of Stochastic Gradient Descent (SGD) is considered to play a key role in its properties (e.g. escaping low potential points and regularization). Past research has indicated that the covariance of the SGD error done via minibatching plays a critical role in determining its regularization and escape from low potential points. It is however not much explored how much the distribution of the error influences the behavior of the algorithm. Motivated by some new research in this area, we prove universality results by showing that noise classes that have the same mean and covariance structure of SGD via minibatching have similar properties. We mainly consider the Multiplicative Stochastic Gradient Descent (M-SGD) algorithm as introduced by Wu et al., which has a much more general noise class than the SGD algorithm done via minibatching. We establish nonasymptotic bounds for the M-SGD algorithm mainly with respect to the Stochastic Differential Equation corresponding to SGD via minibatching. We also show that the M-SGD error is approximately a scaled Gaussian distribution with mean $0$ at any fixed point of the M-SGD algorithm. We also establish bounds for the convergence of the M-SGD algorithm in the strongly convex regime.
    Sharp bounds for the number of regions of maxout networks and vertices of Minkowski sums. (arXiv:2104.08135v2 [math.CO] UPDATED)
    We present results on the number of linear regions of the functions that can be represented by artificial feedforward neural networks with maxout units. A rank-$k$ maxout unit is a function computing the maximum of $k$ linear functions. For networks with a single layer of maxout units, the linear regions correspond to the upper vertices of a Minkowski sum of polytopes. We obtain face counting formulas in terms of the intersection posets of tropical hypersurfaces or the number of upper faces of partial Minkowski sums, along with explicit sharp upper bounds for the number of regions for any input dimension, any number of units, and any ranks, in the cases with and without biases. Based on these results we also obtain asymptotically sharp upper bounds for networks with multiple layers.
    Density Encoding Enables Resource-Efficient Randomly Connected Neural Networks. (arXiv:1909.09153v2 [cs.LG] UPDATED)
    The deployment of machine learning algorithms on resource-constrained edge devices is an important challenge from both theoretical and applied points of view. In this article, we focus on resource-efficient randomly connected neural networks known as Random Vector Functional Link (RVFL) networks since their simple design and extremely fast training time make them very attractive for solving many applied classification tasks. We propose to represent input features via the density-based encoding known in the area of stochastic computing and use the operations of binding and bundling from the area of hyperdimensional computing for obtaining the activations of the hidden neurons. Using a collection of 121 real-world datasets from the UCI Machine Learning Repository, we empirically show that the proposed approach demonstrates higher average accuracy than the conventional RVFL. We also demonstrate that it is possible to represent the readout matrix using only integers in a limited range with minimal loss in the accuracy. In this case, the proposed approach operates only on small n-bit integers, which results in a computationally efficient architecture. Finally, through hardware Field-Programmable Gate Array (FPGA) implementations, we show that such an approach consumes approximately eleven times less energy than that of the conventional RVFL.
    Neural PPO-Clip Attains Global Optimality: A Hinge Loss Perspective. (arXiv:2110.13799v4 [cs.LG] UPDATED)
    Policy optimization is a fundamental principle for designing reinforcement learning algorithms, and one example is the proximal policy optimization algorithm with a clipped surrogate objective (PPO-Clip), which has been popularly used in deep reinforcement learning due to its simplicity and effectiveness. Despite its superior empirical performance, PPO-Clip has not been justified via theoretical proof up to date. In this paper, we establish the first global convergence rate of PPO-Clip under neural function approximation. We identify the fundamental challenges of analyzing PPO-Clip and address them with the two core ideas: (i) We reinterpret PPO-Clip from the perspective of hinge loss, which connects policy improvement with solving a large-margin classification problem with hinge loss and offers a generalized version of the PPO-Clip objective. (ii) Based on the above viewpoint, we propose a two-step policy improvement scheme, which facilitates the convergence analysis by decoupling policy search from the complex neural policy parameterization with the help of entropic mirror descent and a regression-based policy update scheme. Moreover, our theoretical results provide the first characterization of the effect of the clipping mechanism on the convergence of PPO-Clip. Through experiments, we empirically validate the reinterpretation of PPO-Clip and the generalized objective with various classifiers on various RL benchmark tasks.
    Fair mapping. (arXiv:2209.00617v1 [cs.LG])
    To mitigate the effects of undesired biases in models, several approaches propose to pre-process the input dataset to reduce the risks of discrimination by preventing the inference of sensitive attributes. Unfortunately, most of these pre-processing methods lead to the generation of a new distribution that is very different from the original one, thus often leading to unrealistic data. As a side effect, this new data distribution implies that existing models need to be re-trained to be able to make accurate predictions. To address this issue, we propose a novel pre-processing method, which we coin fair mapping, based on the transformation of the distribution of protected groups onto a chosen target one, with additional privacy constraints whose objective is to prevent the inference of sensitive attributes. More precisely, we leverage the recent Wasserstein GAN and AttGAN frameworks to achieve the optimal transport of data points, coupled with a discriminator enforcing the protection against attribute inference. Our proposed approach preserves the interpretability of data and can be used without defining exactly the sensitive groups. In addition, our approach can be specialized to model existing state-of-the-art approaches, thus proposing a unifying view on these methods. Finally, several experiments on real and synthetic datasets demonstrate that our approach is able to hide the sensitive attributes while limiting the distortion of the data and improving fairness on subsequent data analysis tasks.
    Tesseract: Parallelize the Tensor Parallelism Efficiently. (arXiv:2105.14500v2 [cs.DC] UPDATED)
    Together with the improvements in state-of-the-art accuracies on various tasks, deep learning models are getting significantly larger. However, it is extremely difficult to implement these large models because limited GPU memory makes it impossible to fit them into a single GPU or even a GPU server. Moreover, it is highly necessary to reduce the training time for large models. Previous methods like Megatron-LM implemented a 1-dimensional distributed method to use GPUs to speed up training. However, these methods have a high communication overhead and a low scaling efficiency on large-scale clusters. To solve these problems, we propose Tesseract, a highly scalable tensor parallelism with a novel design. It increases efficiency by reducing communication overhead and lowers the memory required for each GPU. By introducing a novel dimension into tensor parallelism, Tesseract greatly increases the memory capacity of tensor parallelism; concretely, this new dimension further increases the degree of tensor parallelism. Compared to previous 1-D and 2-D methods, Tesseract manages to reduce the communication cost on each layer, resulting in speedups of 1.38x and 1.53x respectively with strong scaling. In weak scaling experiments, Tesseract achieves a maximum of 4.0/1.7 times inference speedup and 3.4/1.7 times throughput improvement compared to 1-D/2-D methods, respectively. By introducing Tesseract, we offer a more efficient and scalable way to implement large deep learning models with limited GPU resources.
    Classification of Electroencephalograms during Mathematical Calculations Using Deep Learning. (arXiv:2209.00627v1 [q-bio.NC])
    Classifying Electroencephalogram (EEG) signals helps in understanding Brain-Computer Interfaces (BCI). EEG signals are vital in studying how the human mind functions. In this paper, we have used an arithmetic calculation dataset consisting of Before Calculation Signals (BCS) and During Calculation Signals (DCS). The dataset consisted of 36 participants. In order to understand the functioning of neurons in the brain, we classified BCS vs DCS. For this classification, we extracted various features such as Mutual Information (MI), Phase Locking Value (PLV), and several entropy measures, namely permutation entropy, spectral entropy, singular value decomposition entropy, approximate entropy, and sample entropy. The classification of these features was done using RNN-based classifiers such as LSTM, BLSTM, ConvLSTM, and CNN-LSTM. The model achieved an accuracy of 99.72% when entropy was used as a feature and ConvLSTM as the classifier.
    Optimal Approximations Made Easy. (arXiv:2008.08970v3 [cs.LG] UPDATED)
    The fundamental result of Li, Long, and Srinivasan on approximations of set systems has become a key tool across several communities such as learning theory, algorithms, computational geometry, combinatorics and data analysis. The goal of this paper is to give a modular, self-contained, intuitive proof of this result for finite set systems. The only ingredient we assume is the standard Chernoff's concentration bound. This makes the proof accessible to a wider audience, readers not familiar with techniques from statistical learning theory, and makes it possible to be covered in a single self-contained lecture in a geometry, algorithms or combinatorics course.
    ID and OOD Performance Are Sometimes Inversely Correlated on Real-world Datasets. (arXiv:2209.00613v1 [cs.LG])
    Several studies have empirically compared in-distribution (ID) and out-of-distribution (OOD) performance of various models. They report frequent positive correlations on benchmarks in computer vision and NLP. Surprisingly, they never observe inverse correlations suggesting necessary trade-offs. This matters to determine whether ID performance can serve as a proxy for OOD generalization. This short paper shows that inverse correlations between ID and OOD performance do happen in real-world benchmarks. They may have been missed in past studies because of a biased selection of models. We show an example of the pattern on the WILDS-Camelyon17 dataset, using models from multiple training epochs and random seeds. Our observations are particularly striking on models trained with a regularizer that diversifies the solutions to the ERM objective. We nuance recommendations and conclusions made in past studies. (1) High OOD performance does sometimes require trading off ID performance. (2) Focusing on ID performance alone may not lead to optimal OOD performance: it can lead to diminishing and eventually negative returns in OOD performance. (3) Our example reminds that empirical studies only chart regimes achievable with existing methods: care is warranted in deriving prescriptive recommendations.
    Learning with Differentiable Algorithms. (arXiv:2209.00616v1 [cs.LG])
    Classic algorithms and machine learning systems like neural networks are both abundant in everyday life. While classic computer science algorithms are suitable for precise execution of exactly defined tasks such as finding the shortest path in a large graph, neural networks allow learning from data to predict the most likely answer in more complex tasks such as image classification, which cannot be reduced to an exact algorithm. To get the best of both worlds, this thesis explores combining both concepts leading to more robust, better performing, more interpretable, more computationally efficient, and more data efficient architectures. The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm. When integrating an algorithm into a neural architecture, it is important that the algorithm is differentiable such that the architecture can be trained end-to-end and gradients can be propagated back through the algorithm in a meaningful way. To make algorithms differentiable, this thesis proposes a general method for continuously relaxing algorithms by perturbing variables and approximating the expectation value in closed form, i.e., without sampling. In addition, this thesis proposes differentiable algorithms, such as differentiable sorting networks, differentiable renderers, and differentiable logic gate networks. Finally, this thesis presents alternative training strategies for learning with algorithms.
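    The differentiable sorting networks mentioned above are built from relaxed compare-and-swap operations; a minimal sketch of one such relaxation (the sigmoid form and temperature are illustrative assumptions, not necessarily the thesis's exact construction):

        import torch

        def soft_swap(a, b, tau=0.1):
            # differentiable compare-and-swap: as tau -> 0 this approaches
            # (min(a, b), max(a, b)) while keeping gradients everywhere
            s = torch.sigmoid((a - b) / tau)   # ~1 if a > b, ~0 if a < b
            return (1 - s) * a + s * b, s * a + (1 - s) * b

        a = torch.tensor(3.0, requires_grad=True)
        b = torch.tensor(1.0, requires_grad=True)
        lo, hi = soft_swap(a, b)
        hi.backward()  # gradients flow through the relaxed comparison

    Shrinking tau makes the relaxation exact but the gradients vanish, which is precisely the accuracy/trainability trade-off such continuous relaxations have to manage.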
    Deep Dimension Reduction for Supervised Representation Learning. (arXiv:2006.05865v3 [cs.LG] UPDATED)
    The goal of supervised representation learning is to construct effective data representations for prediction. Among all the characteristics of an ideal nonparametric representation of high-dimensional complex data, sufficiency, low dimensionality and disentanglement are some of the most essential ones. We propose a deep dimension reduction approach to learning representations with these characteristics. The proposed approach is a nonparametric generalization of the sufficient dimension reduction method. We formulate the ideal representation learning task as that of finding a nonparametric representation that minimizes an objective function characterizing conditional independence and promoting disentanglement at the population level. We then estimate the target representation at the sample level nonparametrically using deep neural networks. We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero. Our extensive numerical experiments using simulated and real benchmark data demonstrate that the proposed methods have better performance than several existing dimension reduction methods and the standard deep learning models in the context of classification and regression.
    Slowly Varying Regression under Sparsity. (arXiv:2102.10773v4 [cs.LG] UPDATED)
    We consider the problem of parameter estimation in slowly varying regression models with sparsity constraints. We formulate the problem as a mixed integer optimization problem and demonstrate that it can be reformulated exactly as a binary convex optimization problem through a novel exact relaxation. The relaxation utilizes a new equality on Moore-Penrose inverses that convexifies the non-convex objective function while coinciding with the original objective on all feasible binary points. This allows us to solve the problem significantly more efficiently and to provable optimality using a cutting plane-type algorithm. We develop a highly optimized implementation of such algorithm, which substantially improves upon the asymptotic computational complexity of a straightforward implementation. We further develop a heuristic method that is guaranteed to produce a feasible solution and, as we empirically illustrate, generates high quality warm-start solutions for the binary optimization problem. We show, on both synthetic and real-world datasets, that the resulting algorithm outperforms competing formulations in comparable times across a variety of metrics including out-of-sample predictive performance, support recovery accuracy, and false positive rate. The algorithm enables us to train models with 10,000s of parameters, is robust to noise, and able to effectively capture the underlying slowly changing support of the data generating process.
    Monotonic Gaussian process for physics-constrained machine learning with materials science applications. (arXiv:2209.00628v1 [cs.LG])
    Physics-constrained machine learning is emerging as an important topic in the field of machine learning for physics. One of the most significant advantages of incorporating physics constraints into machine learning methods is that the resulting model requires significantly less data to train. By incorporating physical rules into the machine learning formulation itself, the predictions are expected to be physically plausible. Gaussian process (GP) is perhaps one of the most common methods in machine learning for small datasets. In this paper, we investigate the possibility of constraining a GP formulation with monotonicity on three different material datasets, where one experimental and two computational datasets are used. The monotonic GP is compared against the regular GP, where a significant reduction in the posterior variance is observed. The monotonic GP is strictly monotonic in the interpolation regime, but in the extrapolation regime, the monotonic effect starts fading away as one goes beyond the training dataset. Imposing monotonicity on the GP comes at a small accuracy cost, compared to the regular GP. The monotonic GP is perhaps most useful in applications where data is scarce and noisy, and monotonicity is supported by strong physical evidence.
    Multimodal Machine Learning for Automated ICD Coding. (arXiv:1810.13348v4 [cs.LG] UPDATED)
    This study presents a multimodal machine learning model to predict ICD-10 diagnostic codes. We developed separate machine learning models that can handle data from different modalities, including unstructured text, semi-structured text and structured tabular data. We further employed an ensemble method to integrate all modality-specific models to generate ICD-10 codes. Key evidence was also extracted to make our prediction more convincing and explainable. We used the Medical Information Mart for Intensive Care III (MIMIC-III) dataset to validate our approach. For ICD code prediction, our best-performing model (micro-F1 = 0.7633, micro-AUC = 0.9541) significantly outperforms other baseline models including TF-IDF (micro-F1 = 0.6721, micro-AUC = 0.7879) and Text-CNN model (micro-F1 = 0.6569, micro-AUC = 0.9235). For interpretability, our approach achieves a Jaccard Similarity Coefficient (JSC) of 0.1806 on text data and 0.3105 on tabular data, where well-trained physicians achieve 0.2780 and 0.5002 respectively.
    Unsupervised Learning of Lagrangian Dynamics from Images for Prediction and Control. (arXiv:2007.01926v3 [cs.LG] UPDATED)
    Recent approaches for modelling dynamics of physical systems with neural networks enforce Lagrangian or Hamiltonian structure to improve prediction and generalization. However, when coordinates are embedded in high-dimensional data such as images, these approaches either lose interpretability or can only be applied to one particular example. We introduce a new unsupervised neural network model that learns Lagrangian dynamics from images, with interpretability that benefits prediction and control. The model infers Lagrangian dynamics on generalized coordinates that are simultaneously learned with a coordinate-aware variational autoencoder (VAE). The VAE is designed to account for the geometry of physical systems composed of multiple rigid bodies in the plane. By inferring interpretable Lagrangian dynamics, the model learns physical system properties, such as kinetic and potential energy, which enables long-term prediction of dynamics in the image space and synthesis of energy-based controllers.
    Heterogeneous Graph Tree Networks. (arXiv:2209.00610v1 [cs.LG])
    Heterogeneous graph neural networks (HGNNs) have attracted increasing research interest over the past three years. Most existing HGNNs fall into two classes. One class is meta-path-based HGNNs, which either require domain knowledge to handcraft meta-paths or consume huge amounts of time and memory to construct meta-paths automatically. The other class does not rely on meta-path construction; it takes homogeneous convolutional graph neural networks (Conv-GNNs) as backbones and extends them to heterogeneous graphs by introducing node-type- and edge-type-dependent parameters. Regardless of the meta-path dependency, most existing HGNNs employ shallow Conv-GNNs such as GCN and GAT to aggregate neighborhood information, and may have limited capability to capture information from high-order neighborhoods. In this work, we propose two heterogeneous graph tree network models: the Heterogeneous Graph Tree Convolutional Network (HetGTCN) and the Heterogeneous Graph Tree Attention Network (HetGTAN), neither of which relies on meta-paths to encode heterogeneity in node features and graph structure. Extensive experiments on three real-world heterogeneous graph datasets demonstrate that the proposed HetGTCN and HetGTAN are efficient, consistently outperform all state-of-the-art HGNN baselines on semi-supervised node classification tasks, and can go deep without compromising performance.
    Online Meta-Learning for Model Update Aggregation in Federated Learning for Click-Through Rate Prediction. (arXiv:2209.00629v1 [cs.IR])
    In Federated Learning (FL) of click-through rate (CTR) prediction, users' data is not shared, for privacy protection. Learning is performed by training locally on client devices and communicating only model changes to the server. There are two main challenges: (i) client heterogeneity, which makes FL algorithms that aggregate model updates via weighted averaging progress slowly and produce unsatisfactory learning results; and (ii) the difficulty of tuning the server learning rate by trial and error, due to the large computation time and resources needed for each experiment. To address these challenges, we propose a simple online meta-learning method for learning a strategy of aggregating the model updates, which adaptively weighs the importance of the clients based on their attributes and adjusts the step sizes of the update. We perform extensive evaluations on public datasets. Our method significantly outperforms the state-of-the-art in both speed of convergence and quality of the final learning results.
    Federated Quantum Natural Gradient Descent for Quantum Federated Learning. (arXiv:2209.00564v1 [quant-ph])
    Quantum Federated Learning (QFL) centers on a distributed learning architecture spanning several local quantum devices, and a more efficient training algorithm for QFL is expected to minimize the communication overhead among the quantum participants. In this work, we put forth an efficient learning algorithm, federated quantum natural gradient descent (FQNGD), applied in a QFL framework consisting of variational quantum circuit (VQC)-based quantum neural networks (QNNs). The FQNGD algorithm requires far fewer training iterations for the QFL model to converge and can significantly reduce the total communication cost among local quantum devices. Compared with other federated learning algorithms, our experiments on a handwritten digit classification dataset corroborate the effectiveness of the FQNGD algorithm for QFL in terms of a faster convergence rate on the training set and higher accuracy on the test set.
    Sparse Attention Acceleration with Synergistic In-Memory Pruning and On-Chip Recomputation. (arXiv:2209.00606v1 [cs.LG])
    As its core computation, a self-attention mechanism gauges pairwise correlations across the entire input sequence. Despite favorable performance, calculating pairwise correlations is prohibitively costly. While recent work has shown the benefits of runtime pruning of elements with low attention scores, the quadratic complexity of self-attention mechanisms and their on-chip memory demands have been overlooked. This work addresses these constraints by architecting an accelerator, called SPRINT, which leverages the inherent parallelism of ReRAM crossbar arrays to compute attention scores in an approximate manner. Our design prunes low attention scores using lightweight analog thresholding circuitry within ReRAM, enabling SPRINT to fetch only a small subset of relevant data to on-chip memory. To mitigate potential negative repercussions for model accuracy, SPRINT recomputes the attention scores for the few fetched rows in the digital domain. The combined in-memory pruning and on-chip recomputation of the relevant attention scores enables SPRINT to reduce quadratic complexity to merely linear. In addition, we identify and leverage dynamic spatial locality between adjacent attention operations, even after pruning, which eliminates costly yet redundant data fetches. We evaluate our proposed technique on a wide range of state-of-the-art transformer models. On average, SPRINT yields 7.5x speedup and 19.6x energy reduction with a total of 16KB of on-chip memory, while remaining virtually on par with the accuracy of the baseline models (0.36% average degradation).
    Transformers are Sample Efficient World Models. (arXiv:2209.00588v1 [cs.LG])
    Deep reinforcement learning agents are notoriously sample inefficient, which considerably limits their application to real-world problems. Recently, many model-based methods have been designed to address this issue, with learning in the imagination of a world model being one of the most prominent approaches. However, while virtually unlimited interaction with a simulated environment sounds appealing, the world model has to be accurate over extended periods of time. Motivated by the success of Transformers in sequence modeling tasks, we introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer. With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games. Our approach sets a new state of the art for methods without lookahead search, and even surpasses MuZero. To foster future research on Transformers and world models for sample-efficient reinforcement learning, we release our codebase at https://github.com/eloialonso/iris.
    Complexity of Representations in Deep Learning. (arXiv:2209.00525v1 [cs.LG])
    Deep neural networks use multiple layers of functions to map an object represented by an input vector progressively to different representations and, with sufficient training, eventually to a single score for each class that is the output of the final decision function. Ideally, in this output space, the objects of different classes achieve maximum separation. Motivated by the need to better understand the inner workings of deep neural networks, we analyze the effectiveness of the learned representations in separating the classes from a data complexity perspective. Using a simple complexity measure, a popular benchmarking task, and a well-known architecture design, we show how the data complexity evolves through the network, how it changes during training, and how it is impacted by the network design and the availability of training samples. We discuss the implications of these observations and the potential for further studies.
    Unsupervised Simplification of Legal Texts. (arXiv:2209.00557v1 [cs.CL])
    The processing of legal texts has been developing as an emerging field in natural language processing (NLP). Legal texts contain unique jargon and complex linguistic attributes in vocabulary, semantics, syntax, and morphology. Therefore, the development of text simplification (TS) methods specific to the legal domain is of paramount importance for facilitating comprehension of legal text by ordinary people and providing inputs to high-level models for mainstream legal NLP applications. While a recent study proposed a rule-based TS method for legal text, learning-based TS in the legal domain has not been considered previously. Here we introduce an unsupervised simplification method for legal texts (USLT). USLT performs domain-specific TS by replacing complex words and splitting long sentences. To this end, USLT detects complex words in a sentence, generates candidates via a masked-transformer model, and selects a candidate for substitution based on a rank score. Afterward, USLT recursively decomposes long sentences into a hierarchy of shorter core and context sentences while preserving semantic meaning. We demonstrate that USLT outperforms state-of-the-art domain-general TS methods in text simplicity while keeping the semantics intact.
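    As a rough illustration of the candidate-generation step described above (mask a complex word and let a masked language model propose in-context substitutes), here is a hedged sketch using Hugging Face Transformers; the model choice and the final re-ranking by simplicity are assumptions, not the exact USLT pipeline:

        # pip install transformers torch
        from transformers import pipeline

        fill = pipeline("fill-mask", model="distilroberta-base")

        sentence = "The tenant must remunerate the landlord for damages."
        complex_word = "remunerate"  # assumed output of a complex-word detector
        masked = sentence.replace(complex_word, fill.tokenizer.mask_token, 1)

        for cand in fill(masked, top_k=5):
            # A full system would re-rank these candidates with a
            # simplicity-aware rank score before substituting; here we
            # simply print the masked-LM proposals.
            print(f"{cand['token_str'].strip():>12s}  score={cand['score']:.3f}")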
    Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors. (arXiv:2209.00591v1 [cs.LG])
    Tiny machine learning (TinyML) in IoT systems exploits MCUs as edge devices for data processing. However, traditional TinyML methods can only perform inference and are limited to static environments or fixed classes. Real-world scenarios usually involve dynamic environments, so the context drifts until the original neural model is no longer suitable. For this reason, pre-trained models lose accuracy and reliability over their lifetime as the recorded data slowly becomes obsolete or new patterns appear. Continual learning strategies keep the model up to date, with runtime fine-tuning of the parameters. This paper compares four state-of-the-art algorithms in two real applications: i) gesture recognition based on accelerometer data and ii) image classification. Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs, with a drop in accuracy of only a few percentage points with respect to the original models on unconstrained computing platforms.
    Efficient Chemical Space Exploration Using Active Learning Based on Marginalized Graph Kernel: an Application for Predicting the Thermodynamic Properties of Alkanes with Molecular Simulation. (arXiv:2209.00514v1 [cs.LG])
    We introduce an explorative active learning (AL) algorithm based on Gaussian process regression and the marginalized graph kernel (GPR-MGK) to explore chemical space with minimum cost. Using high-throughput molecular dynamics simulation to generate data and a graph neural network (GNN) for prediction, we constructed an active learning molecular simulation framework for thermodynamic property prediction. Specifically, targeting 251,728 alkane molecules of 4 to 19 carbon atoms and their liquid physical properties (densities, heat capacities, and vaporization enthalpies), we use the AL algorithm to select the most informative molecules to represent the chemical space. Validation on computational and experimental test sets shows that only 313 molecules (0.124\% of the total) were sufficient to train an accurate GNN model with $\rm R^2 > 0.99$ on computational test sets and $\rm R^2 > 0.94$ on experimental test sets. We highlight two advantages of the presented AL algorithm: compatibility with high-throughput data generation and reliable uncertainty quantification.
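    The acquisition loop itself is simple; below is a hedged sketch of uncertainty-driven selection with scikit-learn's GP regressor, substituting an RBF kernel on feature vectors for the paper's marginalized graph kernel on molecular graphs:

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF

        rng = np.random.default_rng(0)

        # Stand-in pool: feature vectors for candidate molecules; a toy
        # target property replaces the simulated thermodynamic properties.
        X_pool = rng.uniform(0, 1, size=(500, 4))
        y_pool = X_pool.sum(axis=1) + 0.1 * rng.standard_normal(500)

        labeled = list(rng.choice(500, size=5, replace=False))

        for _ in range(20):  # acquisition loop
            gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-2)
            gp.fit(X_pool[labeled], y_pool[labeled])
            _, std = gp.predict(X_pool, return_std=True)
            std[labeled] = -np.inf               # never re-select labeled points
            labeled.append(int(np.argmax(std)))  # query the most uncertain molecule

        print(f"labeled {len(labeled)} of {len(X_pool)} candidates")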
    Actor Prioritized Experience Replay. (arXiv:2209.00532v1 [cs.LG])
    A widely studied deep reinforcement learning (RL) technique known as Prioritized Experience Replay (PER) allows agents to learn from transitions sampled with non-uniform probability proportional to their temporal-difference (TD) error. Although PER has been shown to be one of the most crucial components for the overall performance of deep RL methods in discrete action domains, many empirical studies indicate that it considerably underperforms in actor-critic algorithms for continuous control. We theoretically show that actor networks cannot be effectively trained with transitions that have large TD errors: the approximate policy gradient computed under the Q-network diverges from the actual gradient computed under the optimal Q-function. Motivated by this, we introduce a novel experience replay sampling framework for actor-critic methods, which also addresses stability issues and recent findings behind the poor empirical performance of PER. The introduced algorithm suggests a new branch of improvements to PER and schedules effective and efficient training for both actor and critic networks. An extensive set of experiments verifies our theoretical claims and demonstrates that the introduced method significantly outperforms competing approaches, obtaining state-of-the-art results over the standard off-policy actor-critic algorithms.
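    For context, here is a minimal sketch of the classic proportional PER scheme the paper builds on, with sampling probability proportional to $(|\delta| + \epsilon)^\alpha$ and importance weights; the paper's actor-critic-specific sampling framework is more involved:

        import numpy as np

        class ProportionalReplay:
            """Minimal proportional PER: P(i) proportional to (|td_i| + eps)^alpha."""

            def __init__(self, capacity, alpha=0.6, eps=1e-5):
                self.capacity, self.alpha, self.eps = capacity, alpha, eps
                self.data, self.prios = [], []

            def add(self, transition, td_error):
                if len(self.data) >= self.capacity:   # evict oldest transition
                    self.data.pop(0); self.prios.pop(0)
                self.data.append(transition)
                self.prios.append((abs(td_error) + self.eps) ** self.alpha)

            def sample(self, batch_size, beta=0.4):
                p = np.asarray(self.prios)
                p = p / p.sum()
                idx = np.random.choice(len(self.data), batch_size, p=p)
                # Importance weights correct for the non-uniform sampling.
                w = (len(self.data) * p[idx]) ** (-beta)
                return [self.data[i] for i in idx], idx, w / w.max()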
    On Grounded Planning for Embodied Tasks with Language Models. (arXiv:2209.00465v1 [cs.AI])
    Language models (LMs) have been shown to possess commonsense knowledge of the physical world, which is fundamental for completing tasks in everyday situations. However, it remains an open question whether LMs can generate grounded, executable plans for embodied tasks. This is very challenging because LMs have no "eye" or "hand" with which to perceive a realistic environment. In this work, we present the first study of this important research question. We first present a novel problem formulation named G-PlanET, which takes as input a high-level goal and a table of objects in a specific environment. The expected output is a plan consisting of step-by-step instructions for agents to execute. To enable the study of this problem, we establish an evaluation protocol and devise a dedicated metric for assessing the quality of plans. In our extensive experiments, we show that adding flattened tables for encoding environments and using an iterative decoding strategy can both improve the LMs' ability for grounded planning. Our analysis of the results also leads to interesting non-trivial findings.
    Holomorphic Equilibrium Propagation Computes Exact Gradients Through Finite Size Oscillations. (arXiv:2209.00530v1 [cs.LG])
    Equilibrium propagation (EP) is an alternative to backpropagation (BP) that allows the training of deep neural networks with local learning rules. It thus provides a compelling framework for training neuromorphic systems and understanding learning in neurobiology. However, EP requires infinitesimal teaching signals, thereby limiting its applicability in noisy physical systems. Moreover, the algorithm requires separate temporal phases and has not been applied to large-scale problems. Here we address these issues by extending EP to holomorphic networks. We show analytically that this extension naturally leads to exact gradients even for finite-amplitude teaching signals. Importantly, the gradient can be computed as the first Fourier coefficient from finite neuronal activity oscillations in continuous time without requiring separate phases. Further, we demonstrate in numerical simulations that our approach permits robust estimation of gradients in the presence of noise and that deeper models benefit from the finite teaching signals. Finally, we establish the first benchmark for EP on the ImageNet 32x32 dataset and show that it matches the performance of an equivalent network trained with BP. Our work provides analytical insights that enable scaling EP to large-scale problems and establishes a formal framework for how oscillations could support learning in biological and neuromorphic systems.
    MSGNN: A Spectral Graph Neural Network Based on a Novel Magnetic Signed Laplacian. (arXiv:2209.00546v1 [stat.ML])
    Signed directed networks are ubiquitous in real-world applications. However, there has been relatively little work proposing spectral graph neural network (GNN) methods for analyzing such networks. Here we introduce a signed directed Laplacian matrix, which we call the magnetic signed Laplacian, as a natural generalization of both the signed Laplacian on signed graphs and the magnetic Laplacian on directed graphs. We then use this matrix to construct a novel spectral GNN architecture and conduct extensive experiments on both node clustering and link prediction tasks. In these experiments, we consider tasks related to signed information, tasks related to directional information, and tasks related to both signed and directional information. We demonstrate that our proposed spectral GNN is effective for incorporating both signed and directional information, and attains leading performance on a wide range of data sets. Additionally, we provide a novel synthetic network model, which we refer to as the signed directed stochastic block model, and a number of novel real-world data sets based on lead-lag relationships in financial time series.
    Multi-Scale Contrastive Co-Training for Event Temporal Relation Extraction. (arXiv:2209.00568v1 [cs.CL])
    Extracting temporal relationships between pairs of events in texts is a crucial yet challenging problem for natural language understanding. Depending on the distance between the events, models must learn to differently balance information from local and global contexts surrounding the event pair for temporal relation prediction. Learning how to fuse this information has proved challenging for transformer-based language models. Therefore, we present MulCo: Multi-Scale Contrastive Co-Training, a technique for the better fusion of local and global contextualized features. Our model uses a BERT-based language model to encode local context and a Graph Neural Network (GNN) to represent global document-level syntactic and temporal characteristics. Unlike previous state-of-the-art methods, which use simple concatenation on multi-view features or select optimal sentences using sophisticated reinforcement learning approaches, our model co-trains GNN and BERT modules using a multi-scale contrastive learning objective. The GNN and BERT modules learn a synergistic parameterization by contrasting GNN multi-layer multi-hop subgraphs (i.e., global context embeddings) and BERT outputs (i.e., local context embeddings) through end-to-end back-propagation. We empirically demonstrate that MulCo provides improved ability to fuse local and global contexts encoded using BERT and GNN compared to the current state-of-the-art. Our experimental results show that MulCo achieves new state-of-the-art results on several temporal relation extraction datasets.
    Self-Supervised Pretraining of Graph Neural Network for the Retrieval of Related Mathematical Expressions in Scientific Articles. (arXiv:2209.00446v1 [cs.IR])
    Given the increase in publications, searching for relevant papers becomes tedious. In particular, search across disciplines or schools of thinking is not supported. This is mainly due to retrieval with keyword queries: technical terms differ across sciences and over time. Relevant articles might better be identified by their mathematical problem descriptions; just looking at the equations in a paper already gives a hint as to whether the paper is relevant. Hence, we propose a new machine learning approach for the retrieval of mathematical expressions. We design an unsupervised representation learning task that combines embedding learning with self-supervised learning. Using graph convolutional neural networks, we embed mathematical expressions into low-dimensional vector spaces that allow efficient nearest-neighbor queries. To train our models, we collect a huge dataset with over 29 million mathematical expressions from over 900,000 publications published on arXiv.org. The math is converted into an XML format, which we view as graph data. Our empirical evaluations involving a new dataset of manually annotated search queries show the benefits of using embedding models for mathematical retrieval. This work was originally published at KDD 2020.
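    Once expressions are embedded, retrieval reduces to nearest-neighbor search. A small sketch with scikit-learn, using random vectors as stand-ins for the paper's graph-convolutional embeddings (at the scale of 29 million expressions, an approximate index such as FAISS would be used instead of exact search):

        import numpy as np
        from sklearn.neighbors import NearestNeighbors

        rng = np.random.default_rng(0)

        # Stand-in: 10k expressions embedded in a 64-d vector space.
        corpus_emb = rng.standard_normal((10_000, 64)).astype(np.float32)
        index = NearestNeighbors(n_neighbors=10, metric="cosine").fit(corpus_emb)

        # Embed the query expression with the same model, then look it up.
        query_emb = rng.standard_normal((1, 64)).astype(np.float32)
        dist, idx = index.kneighbors(query_emb)
        print("top-10 expression ids:", idx[0])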
    Models and Benchmarks for Representation Learning of Partially Observed Subgraphs. (arXiv:2209.00508v1 [cs.LG])
    Subgraphs are rich substructures in graphs, and their nodes and edges can be partially observed in real-world tasks. Under partial observation, existing node- or subgraph-level message-passing produces suboptimal representations. In this paper, we formulate a novel task of learning representations of partially observed subgraphs. To solve this problem, we propose Partial Subgraph InfoMax (PSI) framework and generalize existing InfoMax models, including DGI, InfoGraph, MVGRL, and GraphCL, into our framework. These models maximize the mutual information between the partial subgraph's summary and various substructures from nodes to full subgraphs. In addition, we suggest a novel two-stage model with $k$-hop PSI, which reconstructs the representation of the full subgraph and improves its expressiveness from different local-global structures. Under training and evaluation protocols designed for this problem, we conduct experiments on three real-world datasets and demonstrate that PSI models outperform baselines.
    Negation detection in Dutch clinical texts: an evaluation of rule-based and machine learning methods. (arXiv:2209.00470v1 [cs.CL])
    As structured data are often insufficient, labels need to be extracted from free text in electronic health records when developing models for clinical information retrieval and decision support systems. One of the most important contextual properties in clinical text is negation, which indicates the absence of findings. We aimed to improve large-scale extraction of labels by comparing three methods for negation detection in Dutch clinical notes. We used the Erasmus Medical Center Dutch Clinical Corpus to compare a rule-based method based on ContextD, a biLSTM model using MedCAT, and (fine-tuned) RoBERTa-based models. We found that both the biLSTM and RoBERTa models consistently outperform the rule-based model in terms of F1 score, precision and recall. In addition, we systematically categorized the classification errors for each model, which can be used to further improve model performance in particular applications. Naively combining the three models was not beneficial in terms of performance. We conclude that the biLSTM and RoBERTa-based models in particular are highly accurate in detecting clinical negations, but that ultimately all three approaches can be viable depending on the use case at hand.
    Let Me Check the Examples: Enhancing Demonstration Learning via Explicit Imitation. (arXiv:2209.00455v1 [cs.LG])
    Demonstration learning aims to guide prompt prediction by providing answered demonstrations in few-shot settings. Despite achieving promising results, existing work simply concatenates the answered examples as demonstrations to the prompt template (including the raw context) without any additional operation, neglecting prompt-demonstration dependencies. Besides, prior research found that randomly replacing the labels of demonstrations only marginally hurts performance, illustrating that the model does not properly learn the knowledge brought by the demonstrations. Inspired by the human learning process, in this paper we introduce Imitation DEMOnstration Learning (Imitation-Demo), which strengthens demonstration learning by explicitly imitating human review behaviour, including: (1) a contrastive learning mechanism to concentrate on similar demonstrations, and (2) a demonstration-label re-prediction method to consolidate known knowledge. Experiment results show that our proposed method achieves state-of-the-art performance on 11 out of 14 classification corpora. Further studies also prove that Imitation-Demo strengthens the association between the prompt and demonstrations, which could provide a basis for exploring how demonstration learning works.
    Quantum-Classical Hybrid Information Processing via a Single Quantum System. (arXiv:2209.00497v1 [quant-ph])
    Current technologies in quantum-based communications bring a new integration of quantum data with classical data for hybrid processing. However, the frameworks of these technologies are restricted to a single classical or quantum task, which limits their flexibility in near-term applications. We propose a quantum reservoir processor to harness quantum dynamics in computational tasks requiring both classical and quantum inputs. This analog processor comprises a network of quantum dots in which quantum data is incident to the network and classical data is encoded via a coherent field exciting the network. We perform a multitasking application of quantum tomography and nonlinear equalization of classical channels. Interestingly, the tomography can be performed in a closed-loop manner via the feedback control of classical data. Therefore, if the classical input comes from a dynamical system, embedding this system in a closed loop enables hybrid processing even if access to the external classical input is interrupted. Finally, we demonstrate preparing quantum depolarizing channels as a novel quantum machine learning technique for quantum data processing.
    An Incremental Learning framework for Large-scale CTR Prediction. (arXiv:2209.00458v1 [cs.IR])
    In this work we introduce an incremental learning framework for Click-Through-Rate (CTR) prediction and demonstrate its effectiveness for Taboola's massive-scale recommendation service. Our approach enables rapid capture of emerging trends by warm-starting from previously deployed models and fine-tuning on "fresh" data only. Past knowledge is maintained via a teacher-student paradigm, where the previously deployed model serves as a teacher through distillation, mitigating the catastrophic forgetting phenomenon. Our incremental learning framework enables significantly faster training and deployment cycles (12x speedup). We demonstrate a consistent Revenue Per Mille (RPM) lift over multiple traffic segments and a significant CTR increase on newly introduced items.
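    A hedged sketch of the teacher-student idea for CTR models in PyTorch: the fine-tuning loss mixes binary cross-entropy on fresh labels with a distillation term toward the frozen, previously deployed model. The mixing weight and exact loss form here are illustrative assumptions, not the paper's specification:

        import torch
        import torch.nn.functional as F

        def distillation_loss(student_logits, teacher_logits, labels, lam=0.5):
            # BCE on fresh labels plus a distillation term that keeps the
            # student close to the previously deployed (frozen) teacher.
            bce = F.binary_cross_entropy_with_logits(student_logits, labels)
            distill = F.binary_cross_entropy_with_logits(
                student_logits, torch.sigmoid(teacher_logits))  # soft targets
            return (1 - lam) * bce + lam * distill

        # Toy batch; in practice the student is warm-started from the teacher.
        student = torch.randn(8, requires_grad=True)
        teacher = torch.randn(8)
        labels = torch.randint(0, 2, (8,)).float()
        loss = distillation_loss(student, teacher, labels)
        loss.backward()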
    Searching for Structure in Unfalsifiable Claims. (arXiv:2209.00495v1 [cs.CL])
    Social media platforms give rise to an abundance of posts and comments on every topic imaginable. Many of these posts express opinions on various aspects of society, but their unfalsifiable nature makes them ill-suited to fact-checking pipelines. In this work, we aim to distill such posts into a small set of narratives that capture the essential claims related to a given topic. Understanding and visualizing these narratives can facilitate more informed debates on social media. As a first step towards systematically identifying the underlying narratives on social media, we introduce PAPYER, a fine-grained dataset of online comments related to hygiene in public restrooms, which contains a multitude of unfalsifiable claims. We present a human-in-the-loop pipeline that uses a combination of machine and human kernels to discover the prevailing narratives and show that this pipeline outperforms recent large transformer models and state-of-the-art unsupervised topic models.
    Generative Personas That Behave and Experience Like Humans. (arXiv:2209.00459v1 [cs.AI])
    Using artificial intelligence (AI) to automatically test a game remains a critical challenge for the development of richer and more complex game worlds and for the advancement of AI at large. One of the most promising methods for achieving that long-standing goal is the use of generative AI agents, namely procedural personas, that attempt to imitate particular playing behaviors which are represented as rules, rewards, or human demonstrations. All research efforts for building those generative agents, however, have focused solely on playing behavior which is arguably a narrow perspective of what a player actually does in a game. Motivated by this gap in the existing state of the art, in this paper we extend the notion of behavioral procedural personas to cater for player experience, thus examining generative agents that can both behave and experience their game as humans would. For that purpose, we employ the Go-Explore reinforcement learning paradigm for training human-like procedural personas, and we test our method on behavior and experience demonstrations of more than 100 players of a racing game. Our findings suggest that the generated agents exhibit distinctive play styles and experience responses of the human personas they were designed to imitate. Importantly, it also appears that experience, which is tied to playing behavior, can be a highly informative driver for better behavioral exploration.
    Ensembling Neural Networks for Improved Prediction and Privacy in Early Diagnosis of Sepsis. (arXiv:2209.00439v1 [cs.LG])
    Ensembling neural networks is a long-standing technique for improving the generalization error of neural networks by combining networks with orthogonal properties via a committee decision. We show that this technique is an ideal fit for machine learning on medical data: First, ensembles are amenable to parallel and asynchronous learning, thus enabling efficient training of patient-specific component neural networks. Second, building on the idea of minimizing generalization error by selecting uncorrelated patient-specific networks, we show that one can build an ensemble of a few selected patient-specific models that outperforms a single model trained on much larger pooled datasets. Third, the non-iterative ensemble combination step is an optimal low-dimensional entry point to apply output perturbation to guarantee the privacy of the patient-specific networks. We exemplify our framework of differentially private ensembles on the task of early prediction of sepsis, using real-life intensive care unit data labeled by clinical experts.
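    The third point, output perturbation at the ensemble combination step, can be illustrated with the standard Gaussian mechanism; the sensitivity value and calibration below are illustrative assumptions, not the paper's exact privacy analysis:

        import numpy as np

        def private_ensemble_predict(member_probs, sensitivity, epsilon, delta, rng):
            # Average the patient-specific model outputs, then add Gaussian
            # noise calibrated for (epsilon, delta)-DP via the Gaussian mechanism.
            avg = np.mean(member_probs, axis=0)
            sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
            return np.clip(avg + rng.normal(0.0, sigma, size=avg.shape), 0.0, 1.0)

        rng = np.random.default_rng(0)
        member_probs = rng.uniform(size=(5, 100))  # 5 member models, 100 time steps
        # If one member's bounded output changes entirely, the average moves
        # by at most 1/5, hence the assumed sensitivity.
        private = private_ensemble_predict(member_probs, sensitivity=1 / 5,
                                           epsilon=1.0, delta=1e-5, rng=rng)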
    Interpreting Embedding Spaces by Conceptualization. (arXiv:2209.00445v1 [cs.CL])
    One of the main methods for semantic interpretation of text is mapping it into a vector in some embedding space. Such vectors can then be used for a variety of text processing tasks. Recently, most embedding spaces are a product of training large language models. One major drawback of this type of representation is its incomprehensibility to humans. Understanding the embedding space is crucial for several important needs, including the need to explain the decision of a system that uses the embedding, the need to debug the embedding method and compare it to alternatives, and the need to detect biases hidden in the model. In this paper, we present a novel method of transforming any embedding space into a comprehensible conceptual space. We first present an algorithm for deriving a conceptual space with dynamic on-demand granularity. We then show a method for transferring any vector in the original incomprehensible space to an understandable vector in the conceptual space. We combine human tests with cross-model tests to show that the conceptualized vectors indeed represent the semantics of the original vectors. We also show how the conceptualized vectors can be used for various tasks including identifying weaknesses in the semantics underlying the original spaces and differences in the semantics of alternative models.
    Fair learning with Wasserstein barycenters for non-decomposable performance measures. (arXiv:2209.00427v1 [stat.ML])
    This work provides several fundamental characterizations of the optimal classification function under the demographic parity constraint. In the awareness framework, akin to the classical unconstrained classification case, we show that maximizing accuracy under this fairness constraint is equivalent to solving a corresponding regression problem followed by thresholding at level $1/2$. We extend this result to linear-fractional classification measures (e.g., ${\rm F}$-score, AM measure, balanced accuracy, etc.), highlighting the fundamental role played by the regression problem in this framework. Our results leverage the recently developed connection between the demographic parity constraint and the multi-marginal optimal transport formulation. Informally, our result shows that the transition from the unconstrained problem to the fair one is achieved by replacing the conditional expectation of the label with the solution of the fair regression problem. Finally, leveraging our analysis, we demonstrate an equivalence between the awareness and unawareness setups in the case of two sensitive groups.
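    In schematic notation (ours, not the paper's), the awareness-case result says the accuracy-optimal fair classifier is obtained by thresholding the fair regression function at $1/2$:

        \hat f^{\mathrm{fair}} \in \arg\min_{f \ \text{fair}} \;
            \mathbb{E}\big[(\eta(X,S) - f(X,S))^2\big],
        \qquad
        \hat h(x,s) = \mathbb{1}\{\hat f^{\mathrm{fair}}(x,s) \ge 1/2\},

    where $\eta(X,S) = \mathbb{P}(Y = 1 \mid X, S)$ is the conditional expectation of the label and the minimization runs over regression functions satisfying demographic parity.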
    Efficient ML Models for Practical Secure Inference. (arXiv:2209.00411v1 [cs.CR])
    ML-as-a-service continues to grow, and so does the need for very strong privacy guarantees. Secure inference has emerged as a potential solution, wherein cryptographic primitives allow inference without revealing users' inputs to the model provider or the model's weights to the user. For instance, the model provider could be a diagnostics company that has trained a state-of-the-art DenseNet-121 model for interpreting chest X-rays, and the user could be a patient at a hospital. While secure inference is in principle feasible for this setting, no existing techniques make it practical at scale. The CrypTFlow2 framework provides a potential solution with its ability to automatically and correctly translate clear-text inference to secure inference for arbitrary models. However, the resultant secure inference from CrypTFlow2 is impractically expensive: almost 3TB of communication is required to interpret a single X-ray on DenseNet-121. In this paper, we address this outstanding challenge of inefficient secure inference with three contributions. First, we show that the primary bottlenecks in secure inference are large linear layers, which can be optimized through the choice of network backbone and the use of operators developed for efficient clear-text inference. This finding deviates from many recent works, which focus on optimizing non-linear activation layers when performing secure inference of smaller networks. Second, based on an analysis of a bottlenecked convolution layer, we design an X-operator that is a more efficient drop-in replacement. Third, we show that the fast Winograd convolution algorithm further improves the efficiency of secure inference. In combination, these three optimizations prove to be highly effective for the problem of X-ray interpretation trained on the CheXpert dataset.
    Dynamics-Adaptive Continual Reinforcement Learning via Progressive Contextualization. (arXiv:2209.00347v1 [cs.LG])
    A key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the RL agent's behavior as the environment changes over its lifetime, while minimizing the catastrophic forgetting of the learned information. To address this challenge, in this article, we propose DaCoRL, i.e., dynamics-adaptive continual RL. DaCoRL learns a context-conditioned policy using progressive contextualization, which incrementally clusters a stream of stationary tasks in the dynamic environment into a series of contexts and opts for an expandable multihead neural network to approximate the policy. Specifically, we define a set of tasks with similar dynamics as an environmental context and formalize context inference as a procedure of online Bayesian infinite Gaussian mixture clustering on environment features, resorting to online Bayesian inference to infer the posterior distribution over contexts. Under the assumption of a Chinese restaurant process prior, this technique can accurately classify the current task as a previously seen context or instantiate a new context as needed without relying on any external indicator to signal environmental changes in advance. Furthermore, we employ an expandable multihead neural network whose output layer is synchronously expanded with the newly instantiated context, and a knowledge distillation regularization term for retaining the performance on learned tasks. As a general framework that can be coupled with various deep RL algorithms, DaCoRL features consistent superiority over existing methods in terms of the stability, overall performance and generalization ability, as verified by extensive experiments on several robot navigation and MuJoCo locomotion tasks.
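    The Chinese restaurant process prior at the heart of the context-inference step can be sketched in a few lines. This shows only the prior's rich-get-richer assignment rule; DaCoRL additionally conditions assignments on environment features via online Bayesian inference:

        import numpy as np

        def crp_assign(n_customers, concentration=1.0, rng=None):
            # Sample table (context) assignments from a Chinese restaurant
            # process: a new customer joins table k with prob proportional
            # to n_k, or opens a new table with prob proportional to the
            # concentration parameter.
            rng = rng or np.random.default_rng()
            counts, labels = [], []
            for _ in range(n_customers):
                probs = np.array(counts + [concentration], dtype=float)
                probs /= probs.sum()
                k = rng.choice(len(probs), p=probs)
                if k == len(counts):
                    counts.append(1)   # instantiate a new context
                else:
                    counts[k] += 1
                labels.append(k)
            return labels

        print(crp_assign(20, concentration=0.8, rng=np.random.default_rng(0)))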
    Find the Funding: Entity Linking with Incomplete Funding Knowledge Bases. (arXiv:2209.00351v1 [cs.CL])
    Automatic extraction of funding information from academic articles adds significant value to industry and research communities, such as tracking research outcomes by funding organizations, profiling researchers and universities based on the received funding, and supporting open access policies. Two major challenges of identifying and linking funding entities are: (i) sparse graph structure of the Knowledge Base (KB), which makes the commonly used graph-based entity linking approaches suboptimal for the funding domain, (ii) missing entities in KB, which (unlike recent zero-shot approaches) requires marking entity mentions without KB entries as NIL. We propose an entity linking model that can perform NIL prediction and overcome data scarcity issues in a time and data-efficient manner. Our model builds on a transformer-based mention detection and bi-encoder model to perform entity linking. We show that our model outperforms strong existing baselines.
    Review of the AMLAS Methodology for Application in Healthcare. (arXiv:2209.00421v1 [cs.LG])
    In recent years, the number of machine learning (ML) technologies gaining regulatory approval for healthcare has increased significantly, allowing them to be placed on the market. However, the regulatory frameworks applied to them were originally devised for traditional software, which has largely rule-based behaviour, in contrast to the data-driven, learnt behaviour of ML. As these frameworks are being reformed, there is a need to proactively assure the safety of ML to prevent patient safety from being compromised. The Assurance of Machine Learning for use in Autonomous Systems (AMLAS) methodology was developed by the Assuring Autonomy International Programme based on well-established concepts in system safety. This review appraises the methodology by consulting ML manufacturers to understand whether it converges with or diverges from their current safety assurance practices, whether there are gaps and limitations in its structure, and whether it is fit for purpose when applied to the healthcare domain. Through this work we offer the view that AMLAS has clear utility as a safety assurance methodology for healthcare machine learning technologies, although the development of healthcare-specific supplementary guidance would benefit those implementing the methodology.
    Self-Supervised Pretraining for 2D Medical Image Segmentation. (arXiv:2209.00314v1 [cs.CV])
    Supervised machine learning provides state-of-the-art solutions to a wide range of computer vision problems. However, the need for copious labelled training data limits the capabilities of these algorithms in scenarios where such input is scarce or expensive. Self-supervised learning offers a way to lower the need for manually annotated data by pretraining models for a specific domain on unlabelled data. In this approach, labelled data are required solely to fine-tune models for downstream tasks. Medical image segmentation is a field where labelling data requires expert knowledge and collecting large labelled datasets is challenging; therefore, self-supervised learning algorithms promise substantial improvements in this field. Despite this, self-supervised learning algorithms are rarely used to pretrain medical image segmentation networks. In this paper, we elaborate and analyse the effectiveness of supervised and self-supervised pretraining approaches on downstream medical image segmentation, focusing on convergence and data efficiency. We find that self-supervised pretraining on natural images and target-domain-specific images leads to the fastest and most stable downstream convergence. In our experiments on the ACDC cardiac segmentation dataset, this pretraining approach achieves 4-5 times faster fine-tuning convergence compared to an ImageNet pretrained model. We also show that this approach requires less than five epochs of pretraining on domain-specific data to achieve such improvement in downstream convergence time. Finally, we find that, in low-data scenarios, supervised ImageNet pretraining achieves the best accuracy, requiring less than 100 annotated samples to realise close to minimal error.
    Bézier Gaussian Processes for Tall and Wide Data. (arXiv:2209.00343v1 [stat.ML])
    Modern approximations to Gaussian processes are suitable for "tall data", with a cost that scales well in the number of observations, but under-perform on "wide data", scaling poorly in the number of input features. That is, as the number of input features grows, good predictive performance requires the number of summarising variables, and their associated cost, to grow rapidly. We introduce a kernel that allows the number of summarising variables to grow exponentially with the number of input features, but requires only linear cost in both the number of observations and input features. This scaling is achieved through our introduction of the Bézier buttress, which allows approximate inference without computing matrix inverses or determinants. We show that our kernel has close similarities to some of the most used kernels in Gaussian process regression, and empirically demonstrate the kernel's ability to scale to both tall and wide datasets.
    Unsupervised EHR-based Phenotyping via Matrix and Tensor Decompositions. (arXiv:2209.00322v1 [cs.LG])
    Computational phenotyping allows for unsupervised discovery of subgroups of patients as well as corresponding co-occurring medical conditions from electronic health records (EHR). Typically, EHR data contains demographic information, diagnoses and laboratory results. Discovering (novel) phenotypes has the potential to be of prognostic and therapeutic value. Providing medical practitioners with transparent and interpretable results is an important requirement and an essential part for advancing precision medicine. Low-rank data approximation methods such as matrix (e.g., non-negative matrix factorization) and tensor decompositions (e.g., CANDECOMP/PARAFAC) have demonstrated that they can provide such transparent and interpretable insights. Recent developments have adapted low-rank data approximation methods by incorporating different constraints and regularizations that facilitate interpretability further. In addition, they offer solutions for common challenges within EHR data such as high dimensionality, data sparsity and incompleteness. Especially extracting temporal phenotypes from longitudinal EHR has received much attention in recent years. In this paper, we provide a comprehensive review of low-rank approximation-based approaches for computational phenotyping. The existing literature is categorized into temporal vs. static phenotyping approaches based on matrix vs. tensor decompositions. Furthermore, we outline different approaches for the validation of phenotypes, i.e., the assessment of clinical significance.
    Versatile Single-Loop Method for Gradient Estimator: First and Second Order Optimality, and its Application to Federated Learning. (arXiv:2209.00361v1 [cs.LG])
    While variance reduction methods have shown great success in solving large-scale optimization problems, many of them suffer from accumulated errors and therefore periodically require full gradient computation. In this paper, we present a single-loop algorithm named SLEDGE (Single-Loop mEthoD for Gradient Estimator) for finite-sum nonconvex optimization, which does not require periodic refreshes of the gradient estimator yet achieves nearly optimal gradient complexity. Unlike existing methods, SLEDGE has the advantage of versatility: (i) second-order optimality, (ii) exponential convergence in the PL region, and (iii) smaller complexity under less heterogeneity of data. We build an efficient federated learning algorithm by exploiting these favorable properties. We show the first- and second-order optimality of the output and also provide analysis under PL conditions. When the local budget is sufficiently large and clients are less (Hessian-)heterogeneous, the algorithm requires fewer communication rounds than existing methods such as FedAvg, SCAFFOLD, and Mime. The superiority of our method is verified in numerical experiments.
    Trading Off Privacy, Utility and Efficiency in Federated Learning. (arXiv:2209.00230v1 [cs.LG])
    Federated learning (FL) enables participating parties to collaboratively build a global model with boosted utility without disclosing private data. Appropriate protection mechanisms have to be adopted to fulfill the opposing requirements of preserving privacy and maintaining high model utility. In addition, a federated learning system must achieve high efficiency in order to enable large-scale model training and deployment. We propose a unified federated learning framework that reconciles horizontal and vertical federated learning. Based on this framework, we formulate and quantify the trade-offs between privacy leakage, utility loss, and efficiency reduction, which leads us to the No-Free-Lunch (NFL) theorem for federated learning systems. NFL indicates that it is unrealistic to expect an FL algorithm to simultaneously provide excellent privacy, utility, and efficiency in certain scenarios. We then analyze the lower bounds of privacy leakage, utility loss and efficiency reduction for several widely adopted protection mechanisms, including Randomization, Homomorphic Encryption, Secret Sharing and Compression. Our analysis can serve as a guide for selecting protection parameters to meet particular requirements.
    Large-Scale Auto-Regressive Modeling Of Street Networks. (arXiv:2209.00281v1 [cs.LG])
    We present a novel generative method for the creation of city-scale road layouts. While the output of recent methods is limited in both size of the covered area and diversity, our framework produces large traversable graphs of high quality consisting of vertices and edges representing complete street networks covering 400 square kilometers or more. While our framework can process general 2D embedded graphs, we focus on street networks due to the wide availability of training data. Our generative framework consists of a transformer decoder that is used in a sliding window manner to predict a field of indices, with each index encoding a representation of the local neighborhood. The semantics of each index is determined by a dictionary of context vectors. The index field is then input to a decoder to compute the street graph. Using data from OpenStreetMap, we train our system on whole cities and even across large countries such as the US, and finally compare it to the state of the art.
    Attention Enhanced Citrinet for Speech Recognition. (arXiv:2209.00261v1 [cs.CL])
    Citrinet is an end-to-end convolutional Connectionist Temporal Classification (CTC) based automatic speech recognition (ASR) model. To capture local and global contextual information, Citrinet uses 1D time-channel separable convolutions combined with sub-word encoding and squeeze-and-excitation (SE), making the whole architecture deep: 23 blocks containing 235 convolution layers and 46 linear layers. This purely convolutional, deep architecture makes Citrinet relatively slow to converge. In this paper, we propose to introduce multi-head attention together with feed-forward networks in the convolution module of Citrinet blocks, while keeping the SE and residual modules unchanged. For speed, we remove 8 convolution layers in each attention-enhanced Citrinet block and reduce the 23 blocks to 13. Experiments on the Japanese CSJ-500h and Magic-1600h datasets show that the attention-enhanced Citrinet, with fewer layers and blocks, converges faster and achieves lower character error rates than (1) Citrinet in 80\% of its training time and (2) Conformer in 40\% of its training time with 29.8\% of its model size.
    Generating Coherent Drum Accompaniment With Fills And Improvisations. (arXiv:2209.00291v1 [cs.SD])
    Creating a complex work of art like music necessitates profound creativity. With recent advancements in deep learning and powerful models such as transformers, there has been huge progress in automatic music generation. In an accompaniment generation context, creating a coherent drum pattern with apposite fills and improvisations at proper locations in a song is a challenging task even for an experienced drummer. Drum beats tend to follow a repetitive pattern through stanzas, with fills or improvisation at section boundaries. In this work, we tackle the task of drum pattern generation conditioned on the accompanying music played by four melodic instruments: piano, guitar, bass, and strings. We use a Transformer sequence-to-sequence model to generate a basic drum pattern conditioned on the melodic accompaniment, and find that improvisation is largely absent, possibly owing to its relatively low representation in the training data. We propose a novelty function to capture the extent of improvisation in a bar relative to its neighbors, and train a model to predict improvisation locations from the melodic accompaniment tracks. Finally, we use a novel BERT-inspired in-filling architecture to learn the structure of both the drums and the melody and to in-fill elements of improvised music.
    Deep Sparse Conformer for Speech Recognition. (arXiv:2209.00260v1 [cs.CL])
    Conformer has achieved impressive results in Automatic Speech Recognition (ASR) by leveraging the transformer's capture of content-based global interactions and the convolutional neural network's exploitation of local features. In Conformer, two macaron-like feed-forward layers with half-step residual connections sandwich the multi-head self-attention and convolution modules, followed by a post layer normalization. We improve Conformer's long-sequence representation ability in two directions: sparser and deeper. We adopt a sparse self-attention mechanism with $\mathcal{O}(L \log L)$ time complexity and memory usage. A deep normalization strategy is applied to the residual connections to ensure stable training of Conformer encoders with on the order of one hundred blocks. On the Japanese CSJ-500h dataset, this deep sparse Conformer achieves CERs of 5.52\%, 4.03\% and 4.50\% on the three evaluation sets, and 4.16\%, 2.84\% and 3.20\% when ensembling five deep sparse Conformer variants with 12, 16, 17, 50, and 100 encoder layers.
    Progressive Fusion for Multimodal Integration. (arXiv:2209.00302v1 [cs.LG])
    Integration of multimodal information from various sources has been shown to boost the performance of machine learning models and thus has received increased attention in recent years. Often such models use deep modality-specific networks to obtain unimodal features which are combined to obtain "late-fusion" representations. However, these designs run the risk of information loss in the respective unimodal pipelines. On the other hand, "early-fusion" methodologies, which combine features early, suffer from the problems associated with feature heterogeneity and high sample complexity. In this work, we present an iterative representation refinement approach, called Progressive Fusion, which mitigates the issues with late fusion representations. Our model-agnostic technique introduces backward connections that make late stage fused representations available to early layers, improving the expressiveness of the representations at those stages, while retaining the advantages of late fusion designs. We test Progressive Fusion on tasks including affective sentiment detection, multimedia analysis, and time series fusion with different models, demonstrating its versatility. We show that our approach consistently improves performance, for instance attaining a 5% reduction in MSE and 40% improvement in robustness on multimodal time series prediction.
    The Geometry and Calculus of Losses. (arXiv:2209.00238v1 [cs.LG])
    Statistical decision problems are the foundation of statistical machine learning. The simplest problems are binary and multiclass classification and class probability estimation. Central to their definition is the choice of loss function, which is the means by which the quality of a solution is evaluated. In this paper we systematically develop the theory of loss functions for such problems from a novel perspective whose basic ingredients are convex sets with a particular structure. The loss function is defined as the subgradient of the support function of the convex set. It is consequently automatically proper (calibrated for probability estimation). This perspective provides three novel opportunities. First, it enables the development of a fundamental relationship between losses and (anti-)norms that appears not to have been noticed before. Second, it enables the development of a calculus of losses, induced by the calculus of convex sets, which allows interpolation between different losses and is thus a potentially useful design tool for tailoring losses to particular problems. In doing this we build upon, and considerably extend, existing results on M-sums of convex sets. Third, the perspective leads to a natural theory of 'polar' (or 'inverse') loss functions, which are derived from the polar dual of the convex set defining the loss, and which form a natural universal substitution function for Vovk's aggregating algorithm.
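    The central construction can be stated compactly. For a convex set $C$ with support function $\sigma_C$, the induced loss is (in our schematic notation, following the abstract's definition):

        \sigma_C(p) = \sup_{x \in C} \langle p, x \rangle,
        \qquad
        \ell(p) \in \partial \sigma_C(p),

    and the paper notes that a loss defined this way is automatically proper, which is what lets the calculus of convex sets (e.g., M-sums, polar duals) carry over to a calculus of losses.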
    A Transferable Multi-stage Model with Cycling Discrepancy Learning for Lithium-ion Battery State of Health Estimation. (arXiv:2209.00190v1 [cs.LG])
    As a significant indicator of health status, data-driven state-of-health (SOH) estimation has become dominant for lithium-ion batteries (LiBs). To handle data discrepancies across batteries, current SOH estimation models engage in transfer learning (TL), which preserves a priori knowledge gained by reusing partial structures of the offline-trained model. However, the multiple degradation patterns across a battery's complete life cycle make it challenging to pursue TL. The concept of a stage is introduced to describe a collection of continuous cycles that present a similar degradation pattern. A transferable multi-stage SOH estimation model is proposed to perform TL across batteries in the same stage, consisting of four steps. First, with identified stage information, raw cycling data from the source battery are reconstructed into a high-dimensional phase space, exploring hidden dynamics with limited sensors. Second, a domain-invariant representation across cycles in each stage is obtained through a cycling discrepancy subspace built on the reconstructed data. Third, considering the unbalanced discharge cycles among different stages, a switching estimation strategy, composed of a lightweight long short-term memory network and a more powerful proposed temporal capsule network, is adopted to boost estimation accuracy. Lastly, an updating scheme compensates for estimation errors when the cycling consistency of target batteries drifts. The proposed method outperforms its competitive algorithms in various transfer tasks on a run-to-failure benchmark with three batteries.
    Online Data Selection for Federated Learning with Limited Storage. (arXiv:2209.00195v1 [cs.LG])
    Machine learning models have been deployed in mobile networks to deal with data from different layers, enabling automated network management and on-device intelligence. To overcome the high communication cost and severe privacy concerns of centralized machine learning, Federated Learning (FL) has been proposed for distributed machine learning among networked devices. While the computation and communication limitations have been widely studied in FL, the impact of on-device storage on FL performance has not been explored. Without an efficient and effective data selection policy to filter the abundant streaming data on devices, classical FL can suffer from much longer model training time (more than $4\times$) and significant inference accuracy reduction (more than $7\%$), as observed in our experiments. In this work, we take the first step towards online data selection for FL with limited on-device storage. We first define a new data valuation metric for data selection in FL: the projection of the local gradient over an on-device data sample onto the global gradient over the data from all devices. We further design ODE, a framework of Online Data sElection for FL, to coordinate networked devices to store valuable data samples collaboratively, with theoretical guarantees for simultaneously speeding up model convergence and enhancing final model accuracy. Experimental results on one industrial task (mobile network traffic classification) and three public tasks (synthetic task, image classification, human activity recognition) show the remarkable advantages of ODE over state-of-the-art approaches. In particular, on the industrial dataset, ODE achieves as much as a $2.5\times$ speedup in training time and a $6\%$ increase in final inference accuracy, and is robust to various factors in practical environments.
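    Read literally, the valuation metric admits a very small sketch: score each candidate sample by the projection of its local gradient onto the (estimated) global gradient and keep the top scorers under the storage budget. This simplified NumPy reading omits ODE's device coordination and theoretical machinery:

        import numpy as np

        def data_value(local_grad, global_grad):
            # Scalar projection of a per-sample local gradient onto the
            # global gradient estimate; larger means the sample pushes the
            # model in the globally useful direction.
            g = np.asarray(global_grad)
            return float(np.dot(local_grad, g) / (np.linalg.norm(g) + 1e-12))

        rng = np.random.default_rng(0)
        global_grad = rng.standard_normal(100)
        sample_grads = rng.standard_normal((50, 100))  # streaming candidates
        values = [data_value(g, global_grad) for g in sample_grads]
        kept = np.argsort(values)[-10:]                # storage budget of 10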
    A Genetic Algorithm-based Framework for Learning Statistical Power Manifold. (arXiv:2209.00215v1 [stat.CO])
    Statistical power is a measure of the goodness/strength of a hypothesis test. Formally, it is the probability of detecting an effect, if there is a true effect present to detect. Hence, optimizing statistical power as a function of some parameters of a hypothesis test is desirable. However, for most hypothesis tests, the explicit functional form of statistical power as a function of those parameters is unknown, but calculating statistical power for a given set of parameter values is possible using simulated experiments. These simulated experiments are usually computationally expensive, so developing the entire statistical power manifold via simulation can be very time-consuming. Motivated by this, we propose a novel genetic algorithm-based framework for learning the statistical power manifold. For a multiple linear regression $F$-test, we show that the proposed framework learns the statistical power manifold much faster than a brute-force approach, as the number of queries to the power oracle is significantly reduced. We also show that the quality of the learned manifold improves as the number of iterations of the genetic algorithm increases.
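    As a rough illustration of the framework, the sketch below evolves query points against a Monte-Carlo power oracle and archives every oracle call, so (parameters -> power) pairs accumulate into a learned manifold. The one-sample t-test oracle, the mutation scale, and all names are illustrative assumptions, not the paper's actual setup (which uses a multiple linear regression $F$-test).

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def power_oracle(effect_size, n, alpha=0.05, reps=500):
        """Monte-Carlo power of a one-sample t-test (illustrative oracle)."""
        from scipy import stats
        hits = 0
        for _ in range(reps):
            x = rng.normal(effect_size, 1.0, int(n))
            hits += stats.ttest_1samp(x, 0.0).pvalue < alpha
        return hits / reps

    def ga_learn_manifold(bounds, pop_size=20, generations=10):
        """Evolve query points toward high power, archiving every oracle
        call; e.g. ga_learn_manifold([(0.1, 1.0), (10, 200)])."""
        lo, hi = np.array(bounds)[:, 0], np.array(bounds)[:, 1]
        pop = rng.uniform(lo, hi, size=(pop_size, len(bounds)))
        archive = {}
        for _ in range(generations):
            fitness = np.array([power_oracle(*p) for p in pop])
            for p, f in zip(pop, fitness):
                archive[tuple(np.round(p, 3))] = f
            parents = pop[np.argsort(fitness)[-pop_size // 2:]]      # selection
            children = parents + rng.normal(0, 0.05, parents.shape)  # mutation
            pop = np.clip(np.vstack([parents, children]), lo, hi)
        return archive
    ```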
    STDEN: Towards Physics-Guided Neural Networks for Traffic Flow Prediction. (arXiv:2209.00225v1 [cs.LG])
    Designing high-performance traffic flow prediction models, a core technology of Intelligent Transportation Systems, is a long-standing yet still challenging task for industrial and academic communities. The lack of integration between physical principles and data-driven models is an important reason limiting the development of this field. In the literature, physics-based methods can usually provide a clear interpretation of the dynamic process of traffic flow systems but offer limited accuracy, while data-driven methods, especially deep learning with black-box structures, can achieve improved performance but cannot be fully trusted for lack of a sound physical basis. To bridge the gap between purely data-driven and physics-driven approaches, we propose a physics-guided deep learning model named Spatio-Temporal Differential Equation Network (STDEN), which casts the physical mechanism of traffic flow dynamics into a deep neural network framework. Specifically, we assume the traffic flow on road networks is driven by a latent potential energy field (as water flows are driven by the gravity field), and model the spatio-temporal dynamics of the potential energy field as a differential equation network. STDEN absorbs both the performance advantage of data-driven models and the interpretability of physics-based models, and is thus named a physics-guided prediction model. Experiments on three real-world traffic datasets in Beijing show that our model outperforms state-of-the-art baselines by a significant margin. A case study further verifies that STDEN can capture the mechanism of urban traffic and generate accurate predictions with physical meaning. The proposed framework of differential equation network modeling may also cast light on other similar applications.
    Variational Sparse Coding with Learned Thresholding. (arXiv:2205.03665v2 [cs.LG] UPDATED)
    Sparse coding strategies have been lauded for their parsimonious representations of data that leverage low-dimensional structure. However, inference of these codes typically relies on an optimization procedure with poor computational scaling in high-dimensional problems. For example, sparse inference in the representations learned in the high-dimensional intermediary layers of deep neural networks (DNNs) requires an iterative minimization to be performed at each training step. As such, fast methods in variational inference have recently been proposed to infer sparse codes by learning a distribution over the codes with a DNN. In this work, we propose a new approach to variational sparse coding that allows us to learn sparse distributions by thresholding samples, avoiding the use of problematic relaxations. We first evaluate and analyze our method by training a linear generator, showing that it has superior performance, statistical efficiency, and gradient estimation compared to other sparse distributions. We then compare it to a standard variational autoencoder using a DNN generator on the Fashion MNIST and CelebA datasets.
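    A minimal PyTorch sketch of the sample-then-threshold idea: draw a reparameterized Gaussian code and push it through a shifted soft threshold, yielding exactly sparse samples without a relaxation. The parameterization shown (a fixed threshold `lam`) is an illustrative simplification; the paper's learned-thresholding scheme may differ in detail.

    ```python
    import torch

    def thresholded_sample(mu, log_sigma, lam=0.5):
        # Reparameterized Gaussian draw: z ~ N(mu, sigma^2)
        z = mu + log_sigma.exp() * torch.randn_like(mu)
        # Soft threshold: exact zeros for |z| <= lam, differentiable elsewhere
        return torch.sign(z) * torch.relu(z.abs() - lam)
    ```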
    The Infinitesimal Jackknife and Combinations of Models. (arXiv:2209.00147v1 [stat.ML])
    The Infinitesimal Jackknife is a general method for estimating variances of parametric models, and more recently also of some ensemble methods. In this paper we extend the Infinitesimal Jackknife to estimate the covariance between any two models. This can be used to quantify uncertainty for combinations of models, or to construct test statistics for comparing different models or ensembles of models fitted using the same training dataset. Specific examples in this paper use boosted combinations of models like random forests and M-estimators. We also investigate its application to neural networks and ensembles of XGBoost models. We illustrate the efficacy of the variance estimates through extensive simulations and an application to the Beijing Housing data, and demonstrate the theoretical consistency of the Infinitesimal Jackknife covariance estimate.
    SketchBetween: Video-to-Video Synthesis for Sprite Animation via Sketches. (arXiv:2209.00185v1 [cs.GR])
    2D animation is a common factor in game development, used for characters, effects and background art. It involves work that takes both skill and time, parts of which are repetitive and tedious. Automated animation approaches exist, but are designed without animators in mind. Their focus is heavily on real-life video, which follows strict laws of how objects move, and does not account for the stylistic movement often present in 2D animation. We propose a problem formulation that more closely adheres to the standard workflow of animation. We also demonstrate a model, SketchBetween, which learns to map keyframes and sketched in-betweens to rendered sprite animations. We demonstrate that our problem formulation provides the required information for the task and that our model outperforms an existing method.
    Federated Learning with Label Distribution Skew via Logits Calibration. (arXiv:2209.00189v1 [cs.LG])
    Traditional federated optimization methods perform poorly with heterogeneous data (i.e., they suffer accuracy reduction), especially for highly skewed data. In this paper, we investigate the label distribution skew in FL, where the distribution of labels varies across clients. First, we investigate the label distribution skew from a statistical view. We demonstrate both theoretically and empirically that previous methods based on softmax cross-entropy are not suitable, as they can result in local models heavily overfitting to minority classes and missing classes. Additionally, we theoretically introduce a deviation bound to measure the deviation of the gradient after a local update. Finally, we propose FedLC (\textbf{Fed}erated learning via \textbf{L}ogits \textbf{C}alibration), which calibrates the logits before softmax cross-entropy according to the probability of occurrence of each class. FedLC applies a fine-grained calibrated cross-entropy loss to the local update by adding a pairwise label margin. Extensive experiments on federated datasets and real-world datasets demonstrate that FedLC leads to a more accurate global model and much improved performance. Furthermore, integrating other FL methods into our approach can further enhance the performance of the global model.
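    A sketch of the logits-calibration idea in PyTorch: before the softmax cross-entropy, each class logit is reduced by a margin that grows as the class becomes locally rarer, so rare and missing classes are not drowned out during the local update. The specific margin form `tau * n_c**(-1/4)` is an assumption used for illustration; consult the paper for the exact pairwise-margin definition.

    ```python
    import torch
    import torch.nn.functional as F

    def fedlc_style_loss(logits, targets, class_counts, tau=1.0):
        """logits: (batch, C); class_counts: (C,) local per-class counts.
        Margin assumed proportional to n_c^{-1/4} (illustrative)."""
        margin = tau * class_counts.float().clamp(min=1).pow(-0.25)
        return F.cross_entropy(logits - margin, targets)
    ```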
    Feynman on Artificial Intelligence and Machine Learning, with Updates. (arXiv:2209.00083v1 [cs.AI])
    I present my recollections of Richard Feynman's mid-1980s interest in artificial intelligence and neural networks, set in the technical context of the physics-related approaches to neural networks of that time. I attempt to evaluate his ideas in the light of the substantial advances in the field since then, and vice versa. There are aspects of Feynman's interests that I think have been largely achieved and others that remain excitingly open, notably in computational science, and potentially including the revival of symbolic methods therein.
    Continuous-time Particle Filtering for Latent Stochastic Differential Equations. (arXiv:2209.00173v1 [cs.LG])
    Particle filtering is a standard Monte-Carlo approach for a wide range of sequential inference tasks. The key component of a particle filter is a set of particles with importance weights that serve as a proxy of the true posterior distribution of some stochastic process. In this work, we propose continuous latent particle filters, an approach that extends particle filtering to the continuous-time domain. We demonstrate how continuous latent particle filters can be used as a generic plug-in replacement for inference techniques relying on a learned variational posterior. Our experiments with different model families based on latent neural stochastic differential equations demonstrate superior performance of continuous-time particle filtering in inference tasks like likelihood estimation and sequential prediction for a variety of stochastic processes.
    Partial Counterfactual Identification for Infinite Horizon Partially Observable Markov Decision Process. (arXiv:2209.00137v1 [cs.LG])
    This paper investigates the problem of bounding the possible outputs of a counterfactual query given a set of observational data. While various works in the literature have described methodologies to generate efficient algorithms that provide an optimal bound for the counterfactual query, all of them assume a finite-horizon causal diagram. This paper aims to extend the previous work by modifying the Q-learning algorithm to provide informative bounds on a causal query given an infinite-horizon causal diagram. Through simulations, our algorithm is shown to perform better than the existing algorithm.
    Supervised Contrastive Learning with Hard Negative Samples. (arXiv:2209.00078v1 [cs.LG])
    Unsupervised contrastive learning (UCL) is a self-supervised learning technique that aims to learn a useful representation function by pulling positive samples close to each other while pushing negative samples far apart in the embedding space. To improve the performance of UCL, several works introduced hard-negative unsupervised contrastive learning (H-UCL), which aims to select "hard" negative samples in contrast to the random sampling strategy used in UCL. In another approach, under the assumption that label information is available, supervised contrastive learning (SCL) has been developed recently by extending UCL to a fully supervised setting. In this paper, motivated by the effectiveness of hard-negative sampling strategies in H-UCL and the usefulness of label information in SCL, we propose a contrastive learning framework called hard-negative supervised contrastive learning (H-SCL). Our numerical results demonstrate the effectiveness of H-SCL over both SCL and H-UCL on several image datasets. In addition, we theoretically prove that, under certain conditions, the objective function of H-SCL can be bounded by the objective function of H-UCL but not by the objective function of UCL. Thus, minimizing the H-UCL loss can act as a proxy to minimize the H-SCL loss while minimizing the UCL loss cannot. As we numerically show that H-SCL outperforms other contrastive learning methods, our theoretical result (bounding the H-SCL loss by the H-UCL loss) helps to explain why H-UCL outperforms UCL in practice.
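    The following PyTorch sketch combines the two ingredients the abstract names: supervised positives (same label) and hard-negative weighting (negatives that sit close to the anchor are up-weighted). The `exp(beta * sim)` weighting follows common hard-negative reweighting practice and is an assumption, not necessarily the paper's precise objective.

    ```python
    import torch
    import torch.nn.functional as F

    def h_scl_loss(z, labels, temp=0.5, beta=1.0):
        """z: (n, d) embeddings; labels: (n,) integer class labels."""
        z = F.normalize(z, dim=1)
        sim = z @ z.t() / temp                          # pairwise similarities
        eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
        pos = labels.view(1, -1).eq(labels.view(-1, 1)) & ~eye
        neg = ~pos & ~eye
        # Hard-negative weights: negatives more similar to the anchor count
        # more, rescaled to sum to the number of negatives per anchor.
        w = torch.where(neg, torch.exp(beta * sim), torch.zeros_like(sim))
        w = w * neg.sum(1, keepdim=True) / w.sum(1, keepdim=True).clamp_min(1e-12)
        denom = (w * sim.exp()).sum(1, keepdim=True) + sim.exp()
        log_prob = sim - denom.clamp_min(1e-12).log()
        # Average log-probability over each anchor's positives.
        loss = -(pos * log_prob).sum(1) / pos.sum(1).clamp_min(1)
        return loss.mean()
    ```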
    A DeepParticle method for learning and generating aggregation patterns in multi-dimensional Keller-Segel chemotaxis systems. (arXiv:2209.00109v1 [physics.comp-ph])
    We study a regularized interacting particle method for computing aggregation patterns and near-singular solutions of a Keller-Segel (KS) chemotaxis system in two and three space dimensions, then further develop the DeepParticle (DP) method to learn and generate solutions under variations of physical parameters. The KS solutions are approximated as empirical measures of particles that self-adapt to the high-gradient part of the solutions. We utilize the expressiveness of deep neural networks (DNNs) to represent the transform of samples from a given initial (source) distribution to a target distribution at finite time T prior to blowup, without assuming invertibility of the transforms. In the training stage, we update the network weights by minimizing a discrete 2-Wasserstein distance between the input and target empirical measures. To reduce computational cost, we develop an iterative divide-and-conquer algorithm to find the optimal transition matrix in the Wasserstein distance. We present numerical results of the DP framework for successful learning and generation of KS dynamics in the presence of laminar and chaotic flows. The physical parameter in this work is either the small diffusivity of the chemo-attractant or the reciprocal of the flow amplitude in the advection-dominated regime.
    An evaluation framework for comparing causal inference models. (arXiv:2209.00115v1 [stat.ML])
    Estimation of causal effects is the core objective of many scientific disciplines. However, it remains a challenging task, especially when the effects are estimated from observational data. Recently, several promising machine learning models have been proposed for causal effect estimation. The evaluation of these models has been based on the mean values of the error of the Average Treatment Effect (ATE) as well as of the Precision in Estimation of Heterogeneous Effect (PEHE). In this paper, we propose to complement the evaluation of causal inference models with concrete statistical evidence, including the performance profiles of Dolan and Moré, as well as non-parametric and post-hoc statistical tests. The main motivation behind this approach is to eliminate the influence of a small number of instances or simulations on the benchmarking process, which in some cases dominate the results. We use the proposed evaluation methodology to compare several state-of-the-art causal effect estimation models.
    The Impact of Batch Learning in Stochastic Linear Bandits. (arXiv:2202.06657v2 [cs.LG] UPDATED)
    We consider a special case of bandit problems, named batched bandits, in which an agent observes batches of responses over a certain time period. Unlike previous work, we consider a more practically relevant batch-centric scenario of batch learning. That is to say, we provide a policy-agnostic regret analysis and demonstrate upper and lower bounds for the regret of a candidate policy. Our main theoretical results show that the impact of batch learning is a multiplicative factor of batch size relative to the regret of online behavior. Primarily, we study two settings of the stochastic linear bandits: bandits with finitely and infinitely many arms. While the regret bounds are the same for both settings, the former setting results hold under milder assumptions. Also, we provide a more robust result for the 2-armed bandit problem as an important insight. Finally, we demonstrate the consistency of theoretical results by conducting empirical experiments and reflect on optimal batch size choice.
    Incorporating Task-specific Concept Knowledge into Script Learning. (arXiv:2209.00068v1 [cs.CL])
    In this paper, we present Tetris, a new task of Goal-Oriented Script Completion. Unlike previous work, it considers a more realistic and more general setting, where the input includes not only the goal but also additional user context, including preferences and history. To address the problem using a knowledge-based approach, we introduce the Task Concept Graph, an automatically constructed knowledge base from instructional websites. Different from a commonsense knowledge base like ConceptNet, the task concept graph schema introduces various types of noun-phrase-based nodes specifically for accomplishing a task. To integrate such graphs into script learning, we devise two methods that acquire concepts from the knowledge base as prompts for downstream script completion. On our WikiHow-based dataset, we find that incorporating concepts from the Task Concept Graph consistently improves performance, demonstrating the benefit of the Task Concept Graph for this task. Furthermore, the model with gold-standard concepts as prompts outperforms the baseline significantly, further confirming the need for task-specific knowledge in goal-oriented script completion. The dataset, repository, models, and demo will be publicly available to facilitate further research on this new task.
    Evaluating generative audio systems and their metrics. (arXiv:2209.00130v1 [cs.SD])
    Recent years have seen considerable advances in audio synthesis with deep generative models. However, the state of the art is very difficult to quantify; different studies often use different evaluation methodologies and different metrics when reporting results, making a direct comparison to other systems difficult if not impossible. Furthermore, the perceptual relevance and meaning of the reported metrics are in most cases unknown, prohibiting any conclusive insights with respect to practical usability and audio quality. This paper presents a study that investigates state-of-the-art approaches side by side with (i) a set of previously proposed objective metrics for audio reconstruction, and (ii) a listening study. The results indicate that currently used objective metrics are insufficient to describe the perceptual quality of current systems.
    Be Your Own Neighborhood: Detecting Adversarial Example by the Neighborhood Relations Built on Self-Supervised Learning. (arXiv:2209.00005v1 [cs.LG])
    Deep Neural Networks (DNNs) have achieved excellent performance in various fields. However, DNNs' vulnerability to Adversarial Examples (AEs) hinders their deployment in safety-critical applications. This paper presents a novel AE detection framework, named BEYOND, for trustworthy predictions. BEYOND performs detection by distinguishing an AE's abnormal relation with its augmented versions, i.e. neighbors, from two perspectives: representation similarity and label consistency. An off-the-shelf Self-Supervised Learning (SSL) model is used to extract the representation and predict the label, for its highly informative representation capacity compared to supervised learning models. For clean samples, representations and predictions are closely consistent with their neighbors, whereas those of AEs differ greatly. We explain this observation and show that BEYOND can effectively detect AEs by leveraging this discrepancy, and we develop a rigorous justification for its effectiveness. Furthermore, as a plug-and-play model, BEYOND can easily cooperate with an Adversarially Trained Classifier (ATC), achieving state-of-the-art (SOTA) robust accuracy. Experimental results show that BEYOND outperforms baselines by a large margin, especially under adaptive attacks. Empowered by the robust relation net built on SSL, we find that BEYOND outperforms baselines in terms of both detection ability and speed. Our code will be publicly available.
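    A simplified sketch of the neighborhood test: generate augmented neighbors, then score the input by its SSL representation similarity and prediction agreement with them; low scores flag suspect inputs. `ssl_encoder`, `classifier_head`, and `augment` are caller-supplied callables, and the thresholding policy is left to the user; this mirrors the abstract's two criteria rather than the paper's exact pipeline.

    ```python
    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def beyond_score(x, ssl_encoder, classifier_head, augment, k=8):
        """Returns (representation similarity, label consistency) of `x`
        against k augmented neighbors; both tend to be low for AEs."""
        neighbors = torch.stack([augment(x) for _ in range(k)])
        feats = ssl_encoder(torch.cat([x.unsqueeze(0), neighbors]))
        rep_sim = F.cosine_similarity(feats[:1], feats[1:]).mean()
        preds = classifier_head(feats).argmax(dim=1)
        label_consistency = (preds[1:] == preds[0]).float().mean()
        return rep_sim.item(), label_consistency.item()
    ```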
    Catastrophic Interference in Reinforcement Learning: A Solution Based on Context Division and Knowledge Distillation. (arXiv:2109.00525v2 [cs.LG] UPDATED)
    The powerful learning ability of deep neural networks enables reinforcement learning agents to learn competent control policies directly from continuous environments. In theory, to achieve stable performance, neural networks assume i.i.d. inputs, which unfortunately does not hold in the general reinforcement learning paradigm, where the training data is temporally correlated and non-stationary. This issue may lead to the phenomenon of "catastrophic interference" and a collapse in performance. In this paper, we present IQ, i.e., interference-aware deep Q-learning, to mitigate catastrophic interference in single-task deep reinforcement learning. Specifically, we resort to online clustering to achieve on-the-fly context division, together with a multi-head network and a knowledge distillation regularization term for preserving the policy of learned contexts. Built upon deep Q-networks, IQ consistently boosts stability and performance when compared to existing methods, as verified by extensive experiments on classic control and Atari tasks. The code is publicly available at: https://github.com/Sweety-dm/Interference-aware-Deep-Q-learning.
    Single-level Adversarial Data Synthesis based on Neural Tangent Kernels. (arXiv:2204.04090v5 [cs.LG] UPDATED)
    Generative adversarial networks (GANs) have achieved impressive performance in data synthesis and have driven the development of many applications. However, GANs are known to be hard to train due to their bilevel objective, which leads to the problems of convergence, mode collapse, and gradient vanishing. In this paper, we propose a new generative model called the generative adversarial NTK (GA-NTK) that has a single-level objective. The GA-NTK keeps the spirit of adversarial learning (which helps generate plausible data) while avoiding the training difficulties of GANs. This is done by modeling the discriminator as a Gaussian process with a neural tangent kernel (NTK-GP) whose training dynamics can be completely described by a closed-form formula. We analyze the convergence behavior of GA-NTK trained by gradient descent and give some sufficient conditions for convergence. We also conduct extensive experiments to study the advantages and limitations of GA-NTK and propose some techniques that make GA-NTK more practical.
    Proper-Composite Loss Functions in Arbitrary Dimensions. (arXiv:1902.06881v3 [cs.LG] UPDATED)
    The study of a machine learning problem is in many ways difficult to separate from the study of the loss function being used. One avenue of inquiry has been to look at these loss functions in terms of their properties as scoring rules via the proper-composite representation, in which predictions are mapped to probability distributions which are then scored via a scoring rule. However, research so far has primarily been concerned with analysing the (typically) finite-dimensional conditional risk problem on the output space, leaving aside the larger total risk minimisation. We generalise a number of these results to an infinite-dimensional setting and, in doing so, we are able to exploit the familial resemblance of density and conditional density estimation to provide a simple characterisation of the canonical link.
    Non-trivial symmetries in quantum landscapes and their resilience to quantum noise. (arXiv:2011.08763v3 [quant-ph] UPDATED)
    Very little is known about the cost landscape for parametrized Quantum Circuits (PQCs). Nevertheless, PQCs are employed in Quantum Neural Networks and Variational Quantum Algorithms, which may allow for near-term quantum advantage. Such applications require good optimizers to train PQCs. Recent works have focused on quantum-aware optimizers specifically tailored for PQCs. However, ignorance of the cost landscape could hinder progress towards such optimizers. In this work, we analytically prove two results for PQCs: (1) We find an exponentially large symmetry in PQCs, yielding an exponentially large degeneracy of the minima in the cost landscape. Alternatively, this can be cast as an exponential reduction in the volume of relevant hyperparameter space. (2) We study the resilience of the symmetries under noise, and show that while it is conserved under unital noise, non-unital channels can break these symmetries and lift the degeneracy of minima, leading to multiple new local minima. Based on these results, we introduce an optimization method called Symmetry-based Minima Hopping (SYMH), which exploits the underlying symmetries in PQCs. Our numerical simulations show that SYMH improves the overall optimizer performance in the presence of non-unital noise at a level comparable to current hardware. Overall, this work derives large-scale circuit symmetries from local gate transformations, and uses them to construct a noise-aware optimization method.
    Go-Explore Complex 3D Game Environments for Automated Reachability Testing. (arXiv:2209.00570v1 [cs.AI])
    Modern AAA video games feature huge levels and maps which are increasingly hard for level testers to cover exhaustively. As a result, games often ship with catastrophic bugs such as the player falling through the floor or getting stuck in walls. We propose an approach specifically targeted at reachability bugs in simulated 3D environments, based on the powerful exploration algorithm Go-Explore, which saves unique checkpoints across the map and then identifies promising ones to explore from. We show that when coupled with simple heuristics derived from the game's navigation mesh, Go-Explore finds challenging bugs and comprehensively explores complex environments without the need for human demonstration or knowledge of the game dynamics. Go-Explore vastly outperforms more complicated baselines, including reinforcement learning with intrinsic curiosity, in both navigation-mesh coverage and the number of unique positions discovered across the map. Finally, due to our use of parallel agents, our algorithm can fully cover a vast 1.5km x 1.5km game world within 10 hours on a single machine, making it extremely promising for continuous testing suites.
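    A skeleton of the archive-based loop, assuming a deterministic, restorable simulator (methods `save_state`/`load_state` are assumptions, as is a `cell_of` discretizer, e.g. the nearest navigation-mesh node); it sketches the exploration idea, not the authors' implementation.

    ```python
    import random
    from collections import defaultdict

    def go_explore(env, cell_of, n_iters=10_000, horizon=50):
        """Keep one checkpoint per discretized 'cell'; repeatedly return to
        a promising (here: least-visited) cell and explore randomly from it.
        The archive keys end up enumerating every reachable cell found."""
        obs = env.reset()
        archive = {cell_of(obs): env.save_state()}
        visits = defaultdict(int)
        for _ in range(n_iters):
            cell = min(archive, key=lambda c: visits[c])
            visits[cell] += 1
            env.load_state(archive[cell])
            for _ in range(horizon):
                obs, _, done, _ = env.step(env.action_space.sample())
                c = cell_of(obs)
                if c not in archive:          # new reachable area discovered
                    archive[c] = env.save_state()
                if done:
                    break
        return archive
    ```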
    The alignment problem from a deep learning perspective. (arXiv:2209.00626v1 [cs.AI])
    Within the coming decades, artificial general intelligence (AGI) may surpass human capabilities at a wide range of important tasks. This report makes a case for why, without substantial action to prevent it, AGIs will likely use their intelligence to pursue goals which are very undesirable (in other words, misaligned) from a human perspective, with potentially catastrophic consequences. The report aims to cover the key arguments motivating concern about the alignment problem in a way that's as succinct, concrete and technically-grounded as possible. I argue that realistic training processes plausibly lead to the development of misaligned goals in AGIs, in particular because neural networks trained via reinforcement learning will learn to plan towards achieving a range of goals; gain more reward by deceptively pursuing misaligned goals; and generalize in ways which undermine obedience. As in an earlier report from Cotra (2022), I explain my claims with reference to an illustrative AGI training process, then outline possible research directions for addressing different aspects of the problem.
    Model Transparency and Interpretability : Survey and Application to the Insurance Industry. (arXiv:2209.00562v1 [stat.ML])
    The use of models, even if efficient, must be accompanied by an understanding at all levels of the process that transforms data (upstream and downstream). Thus, there is a growing need to define the relationships between individual data and the choices that an algorithm could make based on their analysis (e.g. the recommendation of one product or one promotional offer, or an insurance rate representative of the risk). Model users must ensure that models do not discriminate and that it is also possible to explain their results. This paper introduces the importance of model interpretation and tackles the notion of model transparency. Within an insurance context, it specifically illustrates how some tools can be used to enforce the control of actuarial models that can nowadays leverage machine learning. On a simple example of loss frequency estimation in car insurance, we show the value of some interpretability methods in adapting explanations to the target audience.
    The Neural Process Family: Survey, Applications and Perspectives. (arXiv:2209.00517v1 [cs.LG])
    The standard approaches to neural network implementation yield powerful function approximation capabilities but are limited in their abilities to learn meta-representations and reason about probabilistic uncertainties in their predictions. Gaussian processes, on the other hand, adopt the Bayesian learning scheme to estimate such uncertainties but are constrained by their efficiency and approximation capacity. The Neural Process Family (NPF) intends to offer the best of both worlds by leveraging neural networks for meta-learning predictive uncertainties. Such potential has brought substantial research activity to the family in recent years. Therefore, a comprehensive survey of NPF models is needed to organize and relate their motivations, methodologies, and experiments. This paper intends to address this gap while digging deeper into the formulation, research themes, and applications concerning the family members. We shed light on their potential to bring several recent advances in other deep learning domains under one umbrella. We then provide a rigorous taxonomy of the family and empirically demonstrate their capabilities for modeling data-generating functions operating on 1-d, 2-d, and 3-d input domains. We conclude by discussing our perspectives on the promising directions that can fuel research advances in the field. Code for our experiments will be made available at https://github.com/srvCodes/neural-processes-survey.
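    For readers new to the family, here is a minimal Conditional Neural Process for 1-d regression in PyTorch, the simplest member: a mean-pooled context embedding conditions a decoder that outputs a predictive mean and variance. Layer widths and the variance floor are illustrative choices.

    ```python
    import torch
    import torch.nn as nn

    class ConditionalNeuralProcess(nn.Module):
        """Minimal CNP: encode (x, y) context points, mean-pool into a
        global representation, decode (representation, target x) into a
        Gaussian predictive distribution."""
        def __init__(self, dim_r=64):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(2, dim_r), nn.ReLU(),
                                         nn.Linear(dim_r, dim_r))
            self.decoder = nn.Sequential(nn.Linear(dim_r + 1, dim_r), nn.ReLU(),
                                         nn.Linear(dim_r, 2))

        def forward(self, xc, yc, xt):
            # xc, yc: (Nc, 1) context; xt: (Nt, 1) targets
            r = self.encoder(torch.cat([xc, yc], dim=-1)).mean(dim=0)
            r = r.expand(xt.shape[0], -1)
            out = self.decoder(torch.cat([r, xt], dim=-1))
            mean, log_sigma = out.chunk(2, dim=-1)
            return mean, 0.01 + torch.nn.functional.softplus(log_sigma)
    ```

    Training would minimize the Gaussian negative log-likelihood of target points under the predicted mean and variance, resampled over many small tasks.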
    Physically-primed deep-neural-networks for generalized undersampled MRI reconstruction. (arXiv:2209.00462v1 [eess.IV])
    A plethora of deep-neural-network (DNN) based methods have been proposed over the past few years to address the challenging ill-posed inverse problem of MRI reconstruction from undersampled "k-space" (Fourier domain) data. However, instability against variations in the acquisition process and the anatomical distribution indicates a poor generalization of the relevant physical models by DNN architectures compared to their classical counterparts. The poor generalization effectively precludes DNN applicability for undersampled MRI reconstruction in the clinical setting. We improve the generalization capacity of DNN methods for undersampled MRI reconstruction by introducing a physically-primed DNN architecture and training approach. Our architecture encodes the undersampling mask in addition to the observed data, and employs a training approach that uses data generated with various undersampling masks to encourage the model to generalize the undersampled MRI reconstruction problem. We demonstrate the added value of our approach through extensive experimentation with the publicly available fastMRI dataset. Our physically-primed approach achieves an enhanced generalization capacity, resulting in significantly improved robustness against variations in the acquisition process and in the anatomical distribution, especially in pathological regions, compared to both vanilla DNN methods and DNNs trained with undersampling-mask augmentation. Trained models and code to replicate our experiments will become available for research purposes upon acceptance.
    Group Activity Recognition in Basketball Tracking Data -- Neural Embeddings in Team Sports (NETS). (arXiv:2209.00451v1 [cs.LG])
    Like many team sports, basketball involves two groups of players who engage in collaborative and adversarial activities to win a game. Players and teams execute various complex strategies to gain an advantage over their opponents. Defining, identifying, and analyzing different types of activities is an important task in sports analytics, as it can lead to better strategies and decisions by the players and coaching staff. The objective of this paper is to automatically recognize basketball group activities from tracking data representing the locations of players and the ball during a game. We propose a novel deep learning approach for group activity recognition (GAR) in team sports called NETS. To efficiently model the player relations in team sports, we combine a Transformer-based architecture with LSTM embedding and a team-wise pooling layer to recognize the group activity. Training such a neural network generally requires a large amount of annotated data, which incurs a high labeling cost. To address the scarcity of manual labels, we generate weak labels and pretrain the neural network on a self-supervised trajectory prediction task. We used a large tracking dataset from 632 NBA games to evaluate our approach. The results show that NETS is capable of learning group activities with high accuracy, and that self- and weak-supervised training in NETS have a positive impact on GAR accuracy.
    Optimal Regularized Online Convex Allocation by Adaptive Re-Solving. (arXiv:2209.00399v1 [cs.LG])
    This paper introduces a dual-based algorithm framework for solving the regularized online resource allocation problem, which features cumulative convex rewards, hard resource constraints, and a non-separable regularizer. Under a strategy of adaptively updating the resource constraints, the proposed framework only requires an approximate solution to the empirical dual problem up to a certain accuracy, yet delivers an optimal logarithmic regret under a local strong convexity assumption. Surprisingly, a delicate analysis of the dual objective function enables us to eliminate the notorious $\log\log$ factor in the regret bound. The flexible framework renders renowned and computationally fast algorithms immediately applicable, e.g., dual gradient descent and stochastic gradient descent. A worst-case square-root regret lower bound is established if the resource constraints are not adaptively updated during dual optimization, which underscores the critical role of adaptive dual variable updates. Comprehensive numerical experiments and a real-data application demonstrate the merits of the proposed algorithmic framework.
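    The sketch below shows a plain (unregularized) dual-gradient-descent instantiation with adaptive constraint updates: dual prices are learned online, actions are chosen greedily against priced-out rewards, and the per-round allowance is recomputed from the remaining budget. Function names and the action-menu interface are assumptions for illustration.

    ```python
    import numpy as np

    def dual_online_allocation(rewards, costs, budget, T, eta=0.01):
        """rewards(t) -> (K,) action rewards; costs(t) -> (K, d) action
        costs; budget: (d,) total resources over T rounds."""
        d = len(budget)
        lam = np.zeros(d)                     # dual prices on resources
        remaining = np.array(budget, float)
        for t in range(T):
            r_t, c_t = rewards(t), costs(t)
            target = remaining / (T - t)      # adaptive per-round allowance
            scores = r_t - c_t @ lam          # priced-out rewards
            a = int(np.argmax(scores))
            if scores[a] > 0 and np.all(c_t[a] <= remaining):
                remaining -= c_t[a]
                consumed = c_t[a]
            else:
                consumed = np.zeros(d)
            lam = np.maximum(lam + eta * (consumed - target), 0.0)  # dual step
        return budget - remaining             # total consumption
    ```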
    ContrastVAE: Contrastive Variational AutoEncoder for Sequential Recommendation. (arXiv:2209.00456v1 [cs.IR])
    Aiming to exploit the rich information in user behaviour sequences, sequential recommendation has been widely adopted in real-world recommender systems. However, current methods suffer from the following issues: 1) sparsity of user-item interactions, 2) uncertainty of sequential records, 3) long-tail items. In this paper, we propose to incorporate contrastive learning into the framework of Variational AutoEncoders to address these challenges simultaneously. First, we introduce ContrastELBO, a novel training objective that extends the conventional single-view ELBO to the two-view case and theoretically builds a connection between VAEs and contrastive learning from a two-view perspective. Then we propose the Contrastive Variational AutoEncoder (ContrastVAE for short), a two-branch VAE model with contrastive regularization as an embodiment of ContrastELBO for sequential recommendation. We further introduce two simple yet effective augmentation strategies, named model augmentation and variational augmentation, to create a second view of a sequence and thus make contrastive learning possible. Experiments on four benchmark datasets demonstrate the effectiveness of ContrastVAE and the proposed augmentation methods. Code is available at https://github.com/YuWang-1024/ContrastVAE.
    SemSegDepth: A Combined Model for Semantic Segmentation and Depth Completion. (arXiv:2209.00381v1 [cs.CV])
    Holistic scene understanding is pivotal for the performance of autonomous machines. In this paper we propose a new end-to-end model for performing semantic segmentation and depth completion jointly. The vast majority of recent approaches have developed semantic segmentation and depth completion as independent tasks. Our approach relies on RGB and sparse depth as inputs to our model and produces a dense depth map and the corresponding semantic segmentation image. It consists of a feature extractor, a depth completion branch, a semantic segmentation branch, and a joint branch which further processes semantic and depth information together. Experiments on the Virtual KITTI 2 dataset demonstrate, and provide further evidence, that combining the two tasks, semantic segmentation and depth completion, in a multi-task network can effectively improve the performance of each task. Code is available at https://github.com/juanb09111/semantic depth.
    Learning Generative Embeddings using an Optimal Subsampling Policy for Tensor Sketching. (arXiv:2209.00372v1 [cs.LG])
    Data tensors of orders 3 and greater are routinely being generated. These data collections are increasingly huge and growing. They are either tensor fields (e.g., images, videos, geographic data) in which each location of data contains important information or permutation invariant general tensors (e.g., unsupervised latent space learning, graph network analysis, recommendation systems, etc.). Directly accessing such large data tensor collections for information has become increasingly prohibitive. We learn approximate full-rank and compact tensor sketches with decompositive representations providing compact space, time and spectral embeddings of both tensor fields (P-SCT) and general tensors (P-SCT-Permute). All subsequent information querying with high accuracy is performed on the generative sketches. We produce optimal rank-r Tucker decompositions of arbitrary order data tensors by building tensor sketches from a sample-efficient sub-sampling of tensor slices. Our sample efficient policy is learned via an adaptable stochastic Thompson sampling using Dirichlet distributions with conjugate priors.
    Safe reinforcement learning for multi-energy management systems with known constraint functions. (arXiv:2207.03830v4 [eess.SY] UPDATED)
    Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems. It does not require a model a priori, reducing the upfront and ongoing project-specific engineering effort, and is capable of learning better representations of the underlying system dynamics. However, vanilla RL does not provide constraint satisfaction guarantees, resulting in various potentially unsafe interactions within its safety-critical environment. In this paper, we present two novel safe RL methods, namely SafeFallback and GiveSafe, where the safety constraint formulation is decoupled from the RL formulation. These provide hard-constraint, rather than soft- and chance-constraint, satisfaction guarantees both during training of a (near) optimal policy (which involves exploratory and exploitative, i.e. greedy, steps) and during deployment of any policy (e.g. random agents or offline trained RL agents). This is achieved without the need to solve a mathematical program, resulting in lower computational power requirements and a more flexible constraint function formulation (no derivative information is required). In a simulated multi-energy systems case study we show that both methods start with a significantly higher utility (i.e. useful policy) compared to a vanilla RL benchmark and an OptLayer benchmark (94.6% and 82.8% compared to 35.5% and 77.8%) and that the proposed SafeFallback method can even outperform the vanilla RL benchmark (102.9% vs. 100%). We conclude that both methods are viable safety-constraint-handling techniques applicable beyond RL, as demonstrated with random policies while still providing hard-constraint guarantees.
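    The core of the decoupling can be sketched in a few lines: known constraint functions vet every proposed action, and a hand-crafted fallback policy overrides unsafe proposals. This is a schematic of the SafeFallback idea under the assumption that constraint checks and a safe backup policy are available as callables; the paper's methods add more structure.

    ```python
    def safe_action(state, rl_action, constraint_fns, fallback_policy):
        """Hard-constraint safety layer: pass the RL proposal through only
        if every known constraint function accepts it; otherwise override
        with the fallback policy's guaranteed-safe action. No mathematical
        program is solved, so no derivative information is needed."""
        if all(ok(state, rl_action) for ok in constraint_fns):
            return rl_action
        return fallback_policy(state)
    ```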
    CPS Attack Detection under Limited Local Information in Cyber Security: A Multi-node Multi-class Classification Ensemble Approach. (arXiv:2209.00170v1 [cs.CR])
    Cybersecurity breaches are common anomalies in distributed cyber-physical systems (CPS). However, classifying cybersecurity breaches remains a difficult problem, even with cutting-edge artificial intelligence (AI) approaches. In this paper, we study the multi-class classification problem in cybersecurity for attack detection. A challenging multi-node data-censoring case is considered. In such a case, data within each data center/node cannot be shared while the local data is incomplete. In particular, local nodes contain only a part of the multiple classes. In order to train a global multi-class classifier without sharing the raw data across all nodes, the main result of our study is the design of a multi-node multi-class classification ensemble approach. By gathering the estimated parameters of the binary classifiers and data densities from each local node, the missing information for each local node is completed to build the global multi-class classifier. Numerical experiments are given to validate the effectiveness of the proposed approach under the multi-node data-censoring case. Under such a case, we show that the proposed approach can even outperform the full-data approach.
    Tree-Based Adaptive Model Learning. (arXiv:2209.00122v1 [cs.FL])
    We extend the Kearns-Vazirani learning algorithm to be able to handle systems that change over time. We present a new learning algorithm that can reuse and update previously learned behavior, implement it in the LearnLib library, and evaluate it on large examples, to which we make small adjustments between two runs of the algorithm. In these experiments our algorithm significantly outperforms both the classic Kearns-Vazirani learning algorithm and the current state-of-the-art adaptive algorithm.
    Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction. (arXiv:2209.00188v1 [cs.AR])
    Long-latency load requests continue to limit the performance of high-performance processors. To increase the latency tolerance of a processor, architects have primarily relied on two key techniques: sophisticated data prefetchers and large on-chip caches. In this work, we show that: 1) even a sophisticated state-of-the-art prefetcher can only predict half of the off-chip load requests on average across a wide range of workloads, and 2) due to the increasing size and complexity of on-chip caches, a large fraction of the latency of an off-chip load request is spent accessing the on-chip cache hierarchy. The goal of this work is to accelerate off-chip load requests by removing the on-chip cache access latency from their critical path. To this end, we propose a new technique called Hermes, whose key idea is to: 1) accurately predict which load requests might go off-chip, and 2) speculatively fetch the data required by the predicted off-chip loads directly from the main memory, while also concurrently accessing the cache hierarchy for such loads. To enable Hermes, we develop a new lightweight, perceptron-based off-chip load prediction technique that learns to identify off-chip load requests using multiple program features (e.g., sequence of program counters). For every load request, the predictor observes a set of program features to predict whether or not the load would go off-chip. If the load is predicted to go off-chip, Hermes issues a speculative request directly to the memory controller once the load's physical address is generated. If the prediction is correct, the load eventually misses the cache hierarchy and waits for the ongoing speculative request to finish, thus hiding the on-chip cache hierarchy access latency from the critical path of the off-chip load. Our evaluation shows that Hermes significantly improves performance of a state-of-the-art baseline. We open-source Hermes.
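    A toy version of a perceptron-based off-chip predictor, in the style the abstract describes: several program features each index a table of small saturating weights, the prediction is the sign of their sum, and training nudges the weights on mispredictions or low-confidence hits. Table sizes, the feature hashing, and thresholds are illustrative assumptions, not Hermes's actual configuration.

    ```python
    class OffChipPredictor:
        """Perceptron-style predictor over hashed program features
        (e.g., PC, PC sequence, address offset)."""
        def __init__(self, n_features=4, table_size=1024, threshold=8):
            self.tables = [[0] * table_size for _ in range(n_features)]
            self.size = table_size
            self.threshold = threshold    # training confidence margin

        def _indices(self, features):
            return [hash((i, f)) % self.size for i, f in enumerate(features)]

        def predict(self, features):
            s = sum(t[j] for t, j in zip(self.tables, self._indices(features)))
            return s >= 0, s              # (predicted off-chip?, confidence)

        def train(self, features, went_off_chip):
            pred, s = self.predict(features)
            if pred != went_off_chip or abs(s) < self.threshold:
                delta = 1 if went_off_chip else -1
                for t, j in zip(self.tables, self._indices(features)):
                    t[j] = max(-32, min(31, t[j] + delta))  # saturating update
    ```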
    Computational design of antimicrobial active surfaces via automated Bayesian optimization. (arXiv:2209.00055v1 [physics.bio-ph])
    Biofilms pose significant problems for engineers in diverse fields, such as marine science, bioenergy, and biomedicine, where effective biofilm control is a long-term goal. The adhesion and surface mechanics of biofilms play crucial roles in generating and removing biofilm. Designing customized nano-surfaces with different surface topologies can alter the adhesive properties to remove biofilms more easily and greatly improve long-term biofilm control. To rapidly design such topologies, we employ individual-based modeling and Bayesian optimization to automate the design process and generate different active surfaces for effective biofilm removal. Our framework successfully generated ideal nano-surfaces for biofilm removal through applied shear and vibration. Densely distributed short pillar topography is the optimal geometry to prevent biofilm formation. Under fluidic shearing, the optimal topography is to sparsely distribute tall, slim, pillar-like structures. When subjected to either vertical or lateral vibrations, thick trapezoidal cones are found to be optimal. Optimizing the vibrational loading indicates a small vibration magnitude with relatively low frequencies is more efficient in removing biofilm. Our results provide insights into various engineering fields that require surface-mediated biofilm control. Our framework can also be applied to more general materials design and optimization.
    RecLight: A Recurrent Neural Network Accelerator with Integrated Silicon Photonics. (arXiv:2209.00084v1 [cs.LG])
    Recurrent Neural Networks (RNNs) are used in applications that learn dependencies in data sequences, such as speech recognition, human activity recognition, and anomaly detection. In recent years, newer RNN variants, such as GRUs and LSTMs, have been used for implementing these applications. As many of these applications are employed in real-time scenarios, accelerating RNN/LSTM/GRU inference is crucial. In this paper, we propose a novel photonic hardware accelerator called RecLight for accelerating simple RNNs, GRUs, and LSTMs. Simulation results indicate that RecLight achieves 37x lower energy-per-bit and 10% better throughput compared to the state-of-the-art.
  • Open

    Learning "best" kernels from data in Gaussian process regression. With application to aerodynamics. (arXiv:2206.02563v2 [stat.ML] UPDATED)
    This paper introduces algorithms to select/design kernels in Gaussian process regression/kriging surrogate modeling techniques. We adopt the setting of kernel method solutions in ad hoc functional spaces, namely Reproducing Kernel Hilbert Spaces (RKHS), to solve the problem of approximating a regular target function given observations of it, i.e. supervised learning. A first class of algorithms is kernel flow, which was introduced in the context of classification in machine learning. It can be seen as a cross-validation procedure whereby a "best" kernel is selected such that the loss of accuracy incurred by removing some part of the dataset (typically half of it) is minimized. A second class of algorithms is called spectral kernel ridge regression, and aims at selecting a "best" kernel such that the norm of the function to be approximated is minimal in the associated RKHS. Within Mercer's theorem framework, we obtain an explicit construction of that "best" kernel in terms of the main features of the target function. Both approaches of learning kernels from data are illustrated by numerical examples on synthetic test functions, and on a classical test case in turbulence modeling validation for transonic flows about a two-dimensional airfoil.
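    The kernel-flow criterion lends itself to a compact sketch: draw a random half of the data and measure the relative loss of the RKHS norm of the kernel interpolant when restricted to that half; minimizing this `rho` over kernel parameters selects the kernel. `K_fn` is a caller-supplied kernel-matrix function, and the jitter term is a numerical convenience.

    ```python
    import numpy as np

    def kernel_flow_rho(K_fn, X, y, theta, rng=None):
        """rho = 1 - ||v_half||^2 / ||v_full||^2, with ||v||^2 = y^T K^{-1} y
        the squared RKHS norm of the interpolant; small rho means little
        accuracy is lost when half of the data is removed."""
        if rng is None:
            rng = np.random.default_rng()
        n = len(y)
        s = rng.choice(n, size=n // 2, replace=False)   # random half of data
        K = K_fn(X, X, theta) + 1e-8 * np.eye(n)        # jitter for stability
        full = y @ np.linalg.solve(K, y)
        half = y[s] @ np.linalg.solve(K[np.ix_(s, s)], y[s])
        return 1.0 - half / full
    ```

    Averaging `rho` over many random halves and descending it with respect to `theta` is the kernel-flow procedure in a nutshell.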
    Bias and Priors in Machine Learning Calibrations for High Energy Physics. (arXiv:2205.05084v2 [hep-ph] UPDATED)
    Machine learning offers an exciting opportunity to improve the calibration of nearly all reconstructed objects in high-energy physics detectors. However, machine learning approaches often depend on the spectra of examples used during training, an issue known as prior dependence. This is an undesirable property of a calibration, which needs to be applicable in a variety of environments. The purpose of this paper is to explicitly highlight the prior dependence of some machine learning-based calibration strategies. We demonstrate how some recent proposals for both simulation-based and data-based calibrations inherit properties of the sample used for training, which can result in biases for downstream analyses. In the case of simulation-based calibration, we argue that our recently proposed Gaussian Ansatz approach can avoid some of the pitfalls of prior dependence, whereas prior-independent data-based calibration remains an open problem.
    The Selectively Adaptive Lasso. (arXiv:2205.10697v2 [stat.ML] UPDATED)
    Machine learning regression methods allow estimation of functions without unrealistic parametric assumptions. Although they can perform exceptionally well in terms of prediction error, most lack the theoretical convergence rates necessary for semi-parametric efficient estimation (e.g. TMLE, AIPW) of parameters like average treatment effects. The Highly Adaptive Lasso (HAL) is the only regression method proven to converge quickly enough for a meaningfully large class of functions, independent of the dimensionality of the predictors. Unfortunately, HAL is not computationally scalable. In this paper we build upon the theory of HAL to construct the Selectively Adaptive Lasso (SAL), a new algorithm which retains HAL's dimension-free, nonparametric convergence rate but which also scales computationally to massive datasets. To accomplish this, we prove some general theoretical results pertaining to empirical loss minimization in nested Donsker classes. Our resulting algorithm is a form of gradient tree boosting with an adaptive learning rate, which makes it fast and trivial to implement with off-the-shelf software. Finally, we show that our algorithm retains the performance of standard gradient boosting on a diverse group of real-world datasets. SAL makes semi-parametric efficient estimators practically possible and theoretically justifiable in many big-data settings.
    Centroid Approximation for Bootstrap: Improving Particle Quality at Inference. (arXiv:2110.08720v2 [cs.LG] UPDATED)
    Bootstrap is a principled and powerful frequentist statistical tool for uncertainty quantification. Unfortunately, standard bootstrap methods are computationally intensive due to the need to draw a large i.i.d. bootstrap sample to approximate the ideal bootstrap distribution; this largely hinders their application in large-scale machine learning, especially deep learning problems. In this work, we propose an efficient method to explicitly \emph{optimize} a small set of high-quality ``centroid'' points to better approximate the ideal bootstrap distribution. We achieve this by minimizing a simple objective function that is asymptotically equivalent to the Wasserstein distance to the ideal bootstrap distribution. This allows us to provide an accurate estimation of uncertainty with a small number of bootstrap centroids, outperforming the naive i.i.d. sampling approach. Empirically, we show that our method can boost the performance of bootstrap in a variety of applications.
    Kernel Methods for Unobserved Confounding: Negative Controls, Proxies, and Instruments. (arXiv:2012.10315v3 [stat.ML] UPDATED)
    Negative control is a strategy for learning the causal relationship between treatment and outcome in the presence of unmeasured confounding. The treatment effect can nonetheless be identified if two auxiliary variables are available: a negative control treatment (which has no effect on the actual outcome), and a negative control outcome (which is not affected by the actual treatment). These auxiliary variables can also be viewed as proxies for a traditional set of control variables, and they bear resemblance to instrumental variables. I propose a family of algorithms based on kernel ridge regression for learning nonparametric treatment effects with negative controls. Examples include dose response curves, dose response curves with distribution shift, and heterogeneous treatment effects. Data may be discrete or continuous, and low, high, or infinite dimensional. I prove uniform consistency and provide finite sample rates of convergence. I estimate the dose response curve of cigarette smoking on infant birth weight adjusting for unobserved confounding due to household income, using a data set of singleton births in the state of Pennsylvania between 1989 and 1991.
    Geometry-informed irreversible perturbations for accelerated convergence of Langevin dynamics. (arXiv:2108.08247v2 [stat.ME] UPDATED)
    We introduce a novel geometry-informed irreversible perturbation that accelerates convergence of the Langevin algorithm for Bayesian computation. It is well documented that there exist perturbations to the Langevin dynamics that preserve its invariant measure while accelerating its convergence. Irreversible perturbations and reversible perturbations (such as Riemannian manifold Langevin dynamics (RMLD)) have separately been shown to improve the performance of Langevin samplers. We consider these two perturbations simultaneously by presenting a novel form of irreversible perturbation for RMLD that is informed by the underlying geometry. Through numerical examples, we show that this new irreversible perturbation can improve estimation performance over irreversible perturbations that do not take the geometry into account. Moreover, we demonstrate that irreversible perturbations generally can be implemented in conjunction with the stochastic gradient version of the Langevin algorithm. Lastly, while continuous-time irreversible perturbations cannot impair the performance of a Langevin estimator, the situation can sometimes be more complicated when discretization is considered. To this end, we describe a discrete-time example in which irreversibility increases both the bias and variance of the resulting estimator.
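    The underlying mechanism is easy to state: for any skew-symmetric $J$, the dynamics $dX_t = (I + J)\nabla \log \pi(X_t)\,dt + \sqrt{2}\,dW_t$ leave $\pi$ invariant while the irreversible drift speeds mixing. A constant-$J$ Euler-Maruyama sketch follows; the geometry-informed version of the paper makes the perturbation depend on the Riemannian metric, so the constant case here is only the simplest illustration.

    ```python
    import numpy as np

    def irreversible_langevin(grad_log_pi, x0, J, n_steps=10_000, dt=1e-3, seed=0):
        """Euler-Maruyama discretization of irreversibly perturbed Langevin
        dynamics; J must satisfy J = -J^T (skew-symmetric)."""
        rng = np.random.default_rng(seed)
        d = len(x0)
        A = np.eye(d) + J                 # identity plus irreversible drift
        x = np.array(x0, float)
        samples = np.empty((n_steps, d))
        for t in range(n_steps):
            x = x + dt * A @ grad_log_pi(x) + np.sqrt(2 * dt) * rng.standard_normal(d)
            samples[t] = x
        return samples
    ```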
    Asymptotic Normality of Log Likelihood Ratio and Fundamental Limit of the Weak Detection for Spiked Wigner Matrices. (arXiv:2203.00821v3 [math.ST] UPDATED)
    We consider the problem of detecting the presence of a signal in a rank-one spiked Wigner model. For general non-Gaussian noise, assuming that the signal is drawn from the Rademacher prior, we prove that the log likelihood ratio (LR) of the spiked model against the null model converges to a Gaussian when the signal-to-noise ratio is below a certain threshold. The threshold is optimal in the sense that the reliable detection is possible by a transformed principal component analysis (PCA) above it. From the mean and the variance of the limiting Gaussian for the log LR, we compute the limit of the sum of the Type-I error and the Type-II error of the likelihood ratio test. We also prove similar results for a rank-one spiked IID model where the noise is asymmetric but the signal is symmetric.
    Density Encoding Enables Resource-Efficient Randomly Connected Neural Networks. (arXiv:1909.09153v2 [cs.LG] UPDATED)
    The deployment of machine learning algorithms on resource-constrained edge devices is an important challenge from both theoretical and applied points of view. In this article, we focus on resource-efficient randomly connected neural networks known as Random Vector Functional Link (RVFL) networks, since their simple design and extremely fast training time make them very attractive for solving many applied classification tasks. We propose to represent input features via the density-based encoding known in the area of stochastic computing, and use the operations of binding and bundling from the area of hyperdimensional computing to obtain the activations of the hidden neurons. Using a collection of 121 real-world datasets from the UCI Machine Learning Repository, we empirically show that the proposed approach demonstrates higher average accuracy than the conventional RVFL. We also demonstrate that it is possible to represent the readout matrix using only integers in a limited range with minimal loss in accuracy. In this case, the proposed approach operates only on small n-bit integers, which results in a computationally efficient architecture. Finally, through hardware Field-Programmable Gate Array (FPGA) implementations, we show that such an approach consumes approximately eleven times less energy than the conventional RVFL.
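    A rough NumPy sketch of the encoding pipeline the abstract describes: each feature is density-encoded (a thermometer-style bipolar code), bound to a fixed random key vector by elementwise multiplication, bundled by summation over features, and clipped to small integers. Dimensions, the clipping bound `kappa`, and the key scheme are illustrative assumptions; the readout would then be trained by ridge regression on `H`.

    ```python
    import numpy as np

    def density_encode(x, n):
        """Thermometer/density code: scalar in [0, 1] -> length-n bipolar
        vector whose number of +1s is proportional to the value."""
        k = int(round(float(np.clip(x, 0, 1)) * n))
        return np.concatenate([np.ones(k), -np.ones(n - k)])

    def rvfl_hidden(X, n_hidden, kappa=3, seed=0):
        """Hidden activations via hyperdimensional ops: bind each feature's
        density code to a fixed random bipolar key (elementwise product),
        bundle by summation over features, and clip to [-kappa, kappa];
        all intermediate values stay integer-valued."""
        rng = np.random.default_rng(seed)
        keys = rng.choice([-1, 1], size=(X.shape[1], n_hidden))
        H = np.empty((X.shape[0], n_hidden))
        for i, x in enumerate(X):
            bundle = sum(keys[j] * density_encode(v, n_hidden)
                         for j, v in enumerate(x))
            H[i] = np.clip(bundle, -kappa, kappa)   # clipping activation
        return H
    ```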
    Non-trivial symmetries in quantum landscapes and their resilience to quantum noise. (arXiv:2011.08763v3 [quant-ph] UPDATED)
    Very little is known about the cost landscape for parametrized Quantum Circuits (PQCs). Nevertheless, PQCs are employed in Quantum Neural Networks and Variational Quantum Algorithms, which may allow for near-term quantum advantage. Such applications require good optimizers to train PQCs. Recent works have focused on quantum-aware optimizers specifically tailored for PQCs. However, ignorance of the cost landscape could hinder progress towards such optimizers. In this work, we analytically prove two results for PQCs: (1) We find an exponentially large symmetry in PQCs, yielding an exponentially large degeneracy of the minima in the cost landscape. Alternatively, this can be cast as an exponential reduction in the volume of relevant hyperparameter space. (2) We study the resilience of the symmetries under noise, and show that while they are conserved under unital noise, non-unital channels can break these symmetries and lift the degeneracy of minima, leading to multiple new local minima. Based on these results, we introduce an optimization method called Symmetry-based Minima Hopping (SYMH), which exploits the underlying symmetries in PQCs. Our numerical simulations show that SYMH improves the overall optimizer performance in the presence of non-unital noise at a level comparable to current hardware. Overall, this work derives large-scale circuit symmetries from local gate transformations, and uses them to construct a noise-aware optimization method.
    Deep Dimension Reduction for Supervised Representation Learning. (arXiv:2006.05865v3 [cs.LG] UPDATED)
    The goal of supervised representation learning is to construct effective data representations for prediction. Among all the characteristics of an ideal nonparametric representation of high-dimensional complex data, sufficiency, low dimensionality and disentanglement are some of the most essential ones. We propose a deep dimension reduction approach to learning representations with these characteristics. The proposed approach is a nonparametric generalization of the sufficient dimension reduction method. We formulate the ideal representation learning task as that of finding a nonparametric representation that minimizes an objective function characterizing conditional independence and promoting disentanglement at the population level. We then estimate the target representation at the sample level nonparametrically using deep neural networks. We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero. Our extensive numerical experiments using simulated and real benchmark data demonstrate that the proposed methods have better performance than several existing dimension reduction methods and the standard deep learning models in the context of classification and regression.
    Slowly Varying Regression under Sparsity. (arXiv:2102.10773v4 [cs.LG] UPDATED)
    We consider the problem of parameter estimation in slowly varying regression models with sparsity constraints. We formulate the problem as a mixed integer optimization problem and demonstrate that it can be reformulated exactly as a binary convex optimization problem through a novel exact relaxation. The relaxation utilizes a new equality on Moore-Penrose inverses that convexifies the non-convex objective function while coinciding with the original objective on all feasible binary points. This allows us to solve the problem significantly more efficiently and to provable optimality using a cutting plane-type algorithm. We develop a highly optimized implementation of this algorithm, which substantially improves upon the asymptotic computational complexity of a straightforward implementation. We further develop a heuristic method that is guaranteed to produce a feasible solution and, as we empirically illustrate, generates high quality warm-start solutions for the binary optimization problem. We show, on both synthetic and real-world datasets, that the resulting algorithm outperforms competing formulations in comparable times across a variety of metrics including out-of-sample predictive performance, support recovery accuracy, and false positive rate. The algorithm enables us to train models with 10,000s of parameters, is robust to noise, and is able to effectively capture the underlying slowly changing support of the data generating process.
    Asynchronous speedup in decentralized optimization. (arXiv:2106.03585v2 [math.OC] UPDATED)
    In decentralized optimization, nodes of a communication network each possess a local objective function, and communicate using gossip-based methods in order to minimize the average of these per-node functions. While synchronous algorithms are heavily impacted by a few slow nodes or edges in the graph (the \emph{straggler problem}), their asynchronous counterparts are notoriously harder to parametrize. Indeed, their convergence properties for networks with heterogeneous communication and computation delays have defied analysis so far. In this paper, we use a \emph{continuized} framework to analyze asynchronous algorithms in networks with delays. Our approach yields a precise characterization of convergence time and of its dependency on heterogeneous delays in the network. Our continuized framework benefits from the best of both continuous and discrete worlds: the algorithms it applies to are based on event-driven updates. They are thus essentially discrete and hence readily implementable. Yet their analysis is essentially in continuous time, relying in part on the theory of delayed ODEs. Our algorithms moreover achieve an \emph{asynchronous speedup}: their rate of convergence is controlled by the eigengap of the network graph weighted by local delays, instead of the network-wide worst-case delay as in previous analyses. Our methods thus enjoy improved robustness to stragglers.
    Unsupervised Learning of Lagrangian Dynamics from Images for Prediction and Control. (arXiv:2007.01926v3 [cs.LG] UPDATED)
    Recent approaches for modelling dynamics of physical systems with neural networks enforce Lagrangian or Hamiltonian structure to improve prediction and generalization. However, when coordinates are embedded in high-dimensional data such as images, these approaches either lose interpretability or can only be applied to one particular example. We introduce a new unsupervised neural network model that learns Lagrangian dynamics from images, with interpretability that benefits prediction and control. The model infers Lagrangian dynamics on generalized coordinates that are simultaneously learned with a coordinate-aware variational autoencoder (VAE). The VAE is designed to account for the geometry of physical systems composed of multiple rigid bodies in the plane. By inferring interpretable Lagrangian dynamics, the model learns physical system properties, such as kinetic and potential energy, which enables long-term prediction of dynamics in the image space and synthesis of energy-based controllers.
    Optimal Approximations Made Easy. (arXiv:2008.08970v3 [cs.LG] UPDATED)
    The fundamental result of Li, Long, and Srinivasan on approximations of set systems has become a key tool across several communities such as learning theory, algorithms, computational geometry, combinatorics and data analysis. The goal of this paper is to give a modular, self-contained, intuitive proof of this result for finite set systems. The only ingredient we assume is the standard Chernoff's concentration bound. This makes the proof accessible to a wider audience, including readers not familiar with techniques from statistical learning theory, and makes it possible to cover it in a single self-contained lecture in a geometry, algorithms or combinatorics course.
    Non Asymptotic Bounds for Optimization via Online Multiplicative Stochastic Gradient Descent. (arXiv:2112.07110v8 [stat.ML] UPDATED)
    The gradient noise of Stochastic Gradient Descent (SGD) is considered to play a key role in its properties (e.g., escaping low-potential points and regularization). Past research has indicated that the covariance of the minibatch SGD error plays a critical role in determining its regularization and escape from low-potential points. However, how the distribution of this error influences the behavior of the algorithm remains largely unexplored. Motivated by some new research in this area, we prove universality results by showing that noise classes that have the same mean and covariance structure as minibatch SGD have similar properties. We mainly consider the Multiplicative Stochastic Gradient Descent (M-SGD) algorithm introduced by Wu et al., which has a much more general noise class than minibatch SGD. We establish nonasymptotic bounds for the M-SGD algorithm, mainly with respect to the Stochastic Differential Equation corresponding to minibatch SGD. We also show that the M-SGD error is approximately a scaled Gaussian distribution with mean $0$ at any fixed point of the M-SGD algorithm. Finally, we establish bounds for the convergence of the M-SGD algorithm in the strongly convex regime.
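    A toy sketch of the multiplicative-noise idea (illustrative only, ignoring the cross-correlations of sampling without replacement; not the exact estimator of Wu et al.): replace the 0/1 minibatch weights by random weights with the same mean and per-coordinate variance.

        import numpy as np

        rng = np.random.default_rng(0)
        n, d, B, lr = 256, 10, 32, 0.05

        # Toy finite sum: f_i(x) = 0.5 * (a_i^T x - b_i)^2
        A = rng.standard_normal((n, d))
        b = rng.standard_normal(n)

        def per_example_grads(x):
            return A * (A @ x - b)[:, None]        # shape (n, d)

        # Minibatch weights w_i = 1{i in S}/B have mean 1/n and variance
        # 1/(B*n) - 1/n^2; M-SGD-style noise keeps those two moments but
        # draws the weights from another distribution, e.g. Gaussian.
        mean = 1.0 / n
        std = np.sqrt(1.0 / (B * n) - 1.0 / n**2)

        x = np.zeros(d)
        for _ in range(3000):
            w = mean + std * rng.standard_normal(n)   # multiplicative gradient noise
            x -= lr * (w @ per_example_grads(x))
        print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))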
    History Compression via Language Models in Reinforcement Learning. (arXiv:2205.12258v3 [cs.LG] UPDATED)
    In a partially observable Markov decision process (POMDP), an agent typically uses a representation of the past to approximate the underlying MDP. We propose to utilize a frozen Pretrained Language Transformer (PLT) for history representation and compression to improve sample efficiency. To avoid training of the Transformer, we introduce FrozenHopfield, which automatically associates observations with pretrained token embeddings. To form these associations, a modern Hopfield network stores these token embeddings, which are retrieved by queries that are obtained by a random but fixed projection of observations. Our new method, HELM, enables actor-critic network architectures that contain a pretrained language Transformer for history representation as a memory module. Since a representation of the past need not be learned, HELM is much more sample efficient than competitors. On Minigrid and Procgen environments HELM achieves new state-of-the-art results. Our code is available at https://github.com/ml-jku/helm.
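    A minimal sketch of the FrozenHopfield association step as described (hypothetical dimensions; random vectors stand in for pretrained token embeddings): project the observation with a fixed random matrix, then retrieve a convex combination of stored embeddings with a softmax, as in modern Hopfield networks.

        import numpy as np

        rng = np.random.default_rng(0)
        vocab, d_obs, d_emb, beta = 1000, 64, 128, 8.0

        E = rng.standard_normal((vocab, d_emb))                   # frozen "token embeddings"
        P = rng.standard_normal((d_emb, d_obs)) / np.sqrt(d_obs)  # random, fixed projection

        def frozen_hopfield(obs):
            q = P @ obs                      # map the observation into embedding space
            logits = beta * (E @ q)          # modern-Hopfield retrieval scores
            logits -= logits.max()           # numerically stable softmax
            attn = np.exp(logits)
            attn /= attn.sum()
            return attn @ E                  # token-like vector the frozen LM can consume

        h = frozen_hopfield(rng.standard_normal(d_obs))
        print(h.shape)                       # (128,)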
    Model Transparency and Interpretability : Survey and Application to the Insurance Industry. (arXiv:2209.00562v1 [stat.ML])
    The use of models, even if efficient, must be accompanied by an understanding at all levels of the process that transforms data (upstream and downstream). Thus, there is a growing need to define the relationships between individual data and the choices that an algorithm could make based on its analysis (e.g. the recommendation of one product or one promotional offer, or an insurance rate representative of the risk). Model users must ensure that models do not discriminate and that it is also possible to explain their results. This paper introduces the importance of model interpretation and tackles the notion of model transparency. Within an insurance context, it specifically illustrates how some tools can be used to enforce the control of actuarial models that can nowadays leverage machine learning. On a simple example of loss frequency estimation in car insurance, we show the value of some interpretability methods for adapting explanations to the target audience.  ( 2 min )
    A Unified Framework for Estimation of High-dimensional Conditional Factor Models. (arXiv:2209.00391v1 [econ.EM])
    This paper develops a general framework for estimation of high-dimensional conditional factor models via nuclear norm regularization. We establish large sample properties of the estimators, and provide an efficient computing algorithm for finding the estimators as well as a cross validation procedure for choosing the regularization parameter. The general framework allows us to estimate a variety of conditional factor models in a unified way and quickly deliver new asymptotic results. We apply the method to analyze the cross section of individual US stock returns, and find that imposing homogeneity may improve the model's out-of-sample predictability.  ( 2 min )
    Multimodal Machine Learning for Automated ICD Coding. (arXiv:1810.13348v4 [cs.LG] UPDATED)
    This study presents a multimodal machine learning model to predict ICD-10 diagnostic codes. We developed separate machine learning models that can handle data from different modalities, including unstructured text, semi-structured text and structured tabular data. We further employed an ensemble method to integrate all modality-specific models to generate ICD-10 codes. Key evidence was also extracted to make our prediction more convincing and explainable. We used the Medical Information Mart for Intensive Care III (MIMIC-III) dataset to validate our approach. For ICD code prediction, our best-performing model (micro-F1 = 0.7633, micro-AUC = 0.9541) significantly outperforms other baseline models including TF-IDF (micro-F1 = 0.6721, micro-AUC = 0.7879) and Text-CNN model (micro-F1 = 0.6569, micro-AUC = 0.9235). For interpretability, our approach achieves a Jaccard Similarity Coefficient (JSC) of 0.1806 on text data and 0.3105 on tabular data, where well-trained physicians achieve 0.2780 and 0.5002 respectively.  ( 3 min )
    Versatile Single-Loop Method for Gradient Estimator: First and Second Order Optimality, and its Application to Federated Learning. (arXiv:2209.00361v1 [cs.LG])
    While variance reduction methods have shown great success in solving large scale optimization problems, many of them suffer from accumulated errors and therefore periodically require full gradient computations. In this paper, we present a single-loop algorithm named SLEDGE (Single-Loop mEthoD for Gradient Estimator) for finite-sum nonconvex optimization, which does not require periodic refresh of the gradient estimator but achieves nearly optimal gradient complexity. Unlike existing methods, SLEDGE has the advantage of versatility: (i) second-order optimality, (ii) exponential convergence in the PL region, and (iii) smaller complexity under less heterogeneity of data. We build an efficient federated learning algorithm by exploiting these favorable properties. We show the first and second-order optimality of the output and also provide analysis under PL conditions. When the local budget is sufficiently large and clients are less (Hessian-)heterogeneous, the algorithm requires fewer communication rounds than existing methods such as FedAvg, SCAFFOLD, and Mime. The superiority of our method is verified in numerical experiments.  ( 2 min )
    Fair mapping. (arXiv:2209.00617v1 [cs.LG])
    To mitigate the effects of undesired biases in models, several approaches propose to pre-process the input dataset to reduce the risks of discrimination by preventing the inference of sensitive attributes. Unfortunately, most of these pre-processing methods lead to the generation of a new distribution that is very different from the original one, thus often producing unrealistic data. As a side effect, this new data distribution implies that existing models need to be re-trained to be able to make accurate predictions. To address this issue, we propose a novel pre-processing method, which we coin fair mapping, based on the transformation of the distribution of protected groups onto a chosen target one, with additional privacy constraints whose objective is to prevent the inference of sensitive attributes. More precisely, we leverage the recent works on the Wasserstein GAN and AttGAN frameworks to achieve the optimal transport of data points coupled with a discriminator enforcing the protection against attribute inference. Our proposed approach preserves the interpretability of data and can be used without defining exactly the sensitive groups. In addition, our approach can be specialized to model existing state-of-the-art approaches, thus proposing a unifying view of these methods. Finally, several experiments on real and synthetic datasets demonstrate that our approach is able to hide the sensitive attributes while limiting the distortion of the data and improving fairness in subsequent data analysis tasks.  ( 3 min )
    MSGNN: A Spectral Graph Neural Network Based on a Novel Magnetic Signed Laplacian. (arXiv:2209.00546v1 [stat.ML])
    Signed directed networks are ubiquitous in real-world applications. However, there has been relatively little work proposing spectral graph neural network (GNN) methods for analyzing such networks. Here we introduce a signed directed Laplacian matrix, which we call the magnetic signed Laplacian, as a natural generalization of both the signed Laplacian on signed graphs and the magnetic Laplacian on directed graphs. We then use this matrix to construct a novel spectral GNN architecture and conduct extensive experiments on both node clustering and link prediction tasks. In these experiments, we consider tasks related to signed information, tasks related to directional information, and tasks related to both signed and directional information. We demonstrate that our proposed spectral GNN is effective for incorporating both signed and directional information, and attains leading performance on a wide range of data sets. Additionally, we provide a novel synthetic network model, which we refer to as the signed directed stochastic block model, and a number of novel real-world data sets based on lead-lag relationships in financial time series.  ( 2 min )
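    A NumPy sketch of a Laplacian along these lines (an assumption-laden reconstruction from the abstract: signed symmetrization plus a directional phase; the paper's exact definition and normalization may differ):

        import numpy as np

        def magnetic_signed_laplacian(A, q=0.25):
            # A: signed, directed (weighted) adjacency matrix.
            A_sym = 0.5 * (A + A.T)                 # signed symmetrization
            theta = 2.0 * np.pi * q * (A - A.T)     # phase encodes edge direction
            H = A_sym * np.exp(1j * theta)          # Hermitian by construction
            D = np.diag(np.abs(A_sym).sum(axis=1))  # degrees from magnitudes
            return D - H

        A = np.array([[0, 1, 0],
                      [0, 0, -1],
                      [1, 0, 0]], dtype=float)
        L = magnetic_signed_laplacian(A)
        print(np.allclose(L, L.conj().T))           # True: Hermitian, real eigenvalues

    Because the symmetrized part carries the signs and the phase carries the direction, such a matrix reduces to the signed Laplacian on undirected signed graphs and to the magnetic Laplacian on unsigned directed graphs.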
    TokenCut: Segmenting Objects in Images and Videos with Self-supervised Transformer and Normalized Cut. (arXiv:2209.00383v1 [cs.CV])
    In this paper, we describe a graph-based algorithm that uses the features obtained by a self-supervised transformer to detect and segment salient objects in images and videos. With this approach, the image patches that compose an image or video are organised into a fully connected graph, where the edge between each pair of patches is labeled with a similarity score between patches using features learned by the transformer. Detection and segmentation of salient objects is then formulated as a graph-cut problem and solved using the classical Normalized Cut algorithm. Despite the simplicity of this approach, it achieves state-of-the-art results on several common image and video detection and segmentation tasks. For unsupervised object discovery, this approach outperforms the competing approaches by a margin of 6.1%, 5.7%, and 2.6%, respectively, when tested with the VOC07, VOC12, and COCO20K datasets. For the unsupervised saliency detection task in images, this method improves the Intersection over Union (IoU) score by 4.4%, 5.6% and 5.2% when tested with the ECSSD, DUTS, and DUT-OMRON datasets, respectively, compared to current state-of-the-art techniques. This method also achieves competitive results for unsupervised video object segmentation tasks with the DAVIS, SegTV2, and FBMS datasets.  ( 3 min )
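    The graph-cut step can be sketched in a few lines (a simplified sketch with an assumed similarity threshold, not the authors' exact graph construction): build a patch-affinity graph from transformer features and bipartition it with the classical Normalized Cut eigenproblem.

        import numpy as np
        from scipy.linalg import eigh

        def ncut_bipartition(feats, tau=0.2):
            f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
            W = f @ f.T                          # cosine similarity between patches
            W = np.where(W > tau, 1.0, 1e-5)     # fully connected, binarized weights
            D = np.diag(W.sum(axis=1))
            # Second-smallest generalized eigenvector of (D - W) v = lam * D v
            vals, vecs = eigh(D - W, D)
            fiedler = vecs[:, 1]
            return fiedler > np.median(fiedler)  # split patches into two groups

        # Random features stand in for real ViT patch features (e.g. 14x14 patches).
        patches = np.random.default_rng(0).standard_normal((196, 384))
        mask = ncut_bipartition(patches).reshape(14, 14)
        print(mask.astype(int))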
    Fair learning with Wasserstein barycenters for non-decomposable performance measures. (arXiv:2209.00427v1 [stat.ML])
    This work provides several fundamental characterizations of the optimal classification function under the demographic parity constraint. In the awareness framework, akin to the classical unconstrained classification case, we show that maximizing accuracy under this fairness constraint is equivalent to solving a corresponding regression problem followed by thresholding at level $1/2$. We extend this result to linear-fractional classification measures (e.g., ${\rm F}$-score, AM measure, balanced accuracy, etc.), highlighting the fundamental role played by the regression problem in this framework. Our results leverage the recently developed connection between the demographic parity constraint and the multi-marginal optimal transport formulation. Informally, our result shows that the transition between the unconstrained problem and the fair one is achieved by replacing the conditional expectation of the label by the solution of the fair regression problem. Finally, leveraging our analysis, we demonstrate an equivalence between the awareness and the unawareness setups in the case of two sensitive groups.  ( 2 min )
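    In the unconstrained special case, the characterization is just the familiar recipe below (a sketch; under demographic parity the plain regression estimate would be replaced by the fair regression solution before thresholding).

        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor

        rng = np.random.default_rng(0)
        X = rng.standard_normal((2000, 5))
        y = (X[:, 0] + 0.5 * rng.standard_normal(2000) > 0).astype(float)

        # Regress the binary label to estimate eta(x) = E[Y | X = x], then
        # threshold at 1/2 to get the accuracy-optimal classifier.
        eta_hat = GradientBoostingRegressor().fit(X, y).predict(X)
        y_pred = (eta_hat >= 0.5).astype(int)
        print("train accuracy:", (y_pred == y).mean())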
    Negation detection in Dutch clinical texts: an evaluation of rule-based and machine learning methods. (arXiv:2209.00470v1 [cs.CL])
    As structured data are often insufficient, labels need to be extracted from free text in electronic health records when developing models for clinical information retrieval and decision support systems. One of the most important contextual properties in clinical text is negation, which indicates the absence of findings. We aimed to improve large scale extraction of labels by comparing three methods for negation detection in Dutch clinical notes. We used the Erasmus Medical Center Dutch Clinical Corpus to compare a rule-based method based on ContextD, a biLSTM model using MedCAT and (finetuned) RoBERTa-based models. We found that both the biLSTM and RoBERTa models consistently outperform the rule-based model in terms of F1 score, precision and recall. In addition, we systematically categorized the classification errors for each model, which can be used to further improve model performance in particular applications. Combining the three models naively was not beneficial in terms of performance. We conclude that the biLSTM and RoBERTa-based models in particular are highly accurate in detecting clinical negations, but that ultimately all three approaches can be viable depending on the use case at hand.  ( 3 min )
    Proper-Composite Loss Functions in Arbitrary Dimensions. (arXiv:1902.06881v3 [cs.LG] UPDATED)
    The study of a machine learning problem is in many ways difficult to separate from the study of the loss function being used. One avenue of inquiry has been to look at these loss functions in terms of their properties as scoring rules, via the proper-composite representation, in which predictions are mapped to probability distributions which are then scored via a scoring rule. However, research so far has primarily been concerned with analysing the (typically) finite-dimensional conditional risk problem on the output space, leaving aside the larger total risk minimisation. We generalise a number of these results to an infinite dimensional setting and, in doing so, we are able to exploit the familial resemblance of density and conditional density estimation to provide a simple characterisation of the canonical link.  ( 2 min )
    Two-stage Hypothesis Tests for Variable Interactions with FDR Control. (arXiv:2209.00077v1 [stat.ME])
    In many scenarios such as genome-wide association studies where dependences between variables commonly exist, it is often of interest to infer the interaction effects in the model. However, testing pairwise interactions among millions of variables in complex and high-dimensional data suffers from low statistical power and huge computational cost. To address these challenges, we propose a two-stage testing procedure with false discovery rate (FDR) control, which is known as a less conservative multiple-testing correction. Theoretically, the difficulty in FDR control is due to the dependence among the test statistics in the two stages, and the fact that the number of hypothesis tests conducted in the second stage depends on the screening result in the first stage. By using the Cram\'er type moderate deviation technique, we show that our procedure controls FDR at the desired level asymptotically in the generalized linear model (GLM), where the model is allowed to be misspecified. In addition, the asymptotic power of the FDR control procedure is rigorously established. We demonstrate via comprehensive simulation studies that our two-stage procedure is computationally more efficient than the classical BH procedure, with a comparable or improved statistical power. Finally, we apply the proposed method to bladder cancer data from dbGaP where the scientific goal is to identify genetic susceptibility loci for bladder cancer.  ( 3 min )
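    A schematic of the two-stage idea (a simplified sketch using marginal-correlation screening and a plain Benjamini-Hochberg correction on synthetic data; the paper's procedure uses a tailored rule with proven FDR control under dependence):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        n, p, k, alpha = 500, 200, 20, 0.1
        X = rng.standard_normal((n, p))
        # Main effects plus one true interaction between variables 0 and 1.
        y = X[:, 0] + X[:, 1] + 2.0 * X[:, 0] * X[:, 1] + rng.standard_normal(n)

        # Stage 1: screen variables by marginal association with y.
        marginal = np.abs([stats.pearsonr(X[:, j], y)[0] for j in range(p)])
        keep = np.argsort(marginal)[-k:]

        # Stage 2: test pairwise interactions among screened variables only.
        pairs, pvals = [], []
        for a in range(k):
            for c in range(a + 1, k):
                i, j = keep[a], keep[c]
                pairs.append((i, j))
                pvals.append(stats.pearsonr(X[:, i] * X[:, j], y)[1])
        pvals = np.asarray(pvals)

        # Benjamini-Hochberg step-up at level alpha.
        m = len(pvals)
        order = np.argsort(pvals)
        below = np.nonzero(pvals[order] <= alpha * np.arange(1, m + 1) / m)[0]
        n_rej = below.max() + 1 if below.size else 0
        print([pairs[t] for t in order[:n_rej]])   # should recover (0, 1)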
    Unsupervised EHR-based Phenotyping via Matrix and Tensor Decompositions. (arXiv:2209.00322v1 [cs.LG])
    Computational phenotyping allows for unsupervised discovery of subgroups of patients as well as corresponding co-occurring medical conditions from electronic health records (EHR). Typically, EHR data contains demographic information, diagnoses and laboratory results. Discovering (novel) phenotypes has the potential to be of prognostic and therapeutic value. Providing medical practitioners with transparent and interpretable results is an important requirement and an essential part for advancing precision medicine. Low-rank data approximation methods such as matrix (e.g., non-negative matrix factorization) and tensor decompositions (e.g., CANDECOMP/PARAFAC) have demonstrated that they can provide such transparent and interpretable insights. Recent developments have adapted low-rank data approximation methods by incorporating different constraints and regularizations that facilitate interpretability further. In addition, they offer solutions for common challenges within EHR data such as high dimensionality, data sparsity and incompleteness. In particular, extracting temporal phenotypes from longitudinal EHRs has received much attention in recent years. In this paper, we provide a comprehensive review of low-rank approximation-based approaches for computational phenotyping. The existing literature is categorized into temporal vs. static phenotyping approaches based on matrix vs. tensor decompositions. Furthermore, we outline different approaches for the validation of phenotypes, i.e., the assessment of clinical significance.  ( 2 min )
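    As a concrete flavor of the tensor route (a toy sketch with synthetic counts; real EHR tensors are large, sparse, and need the constraints and regularizations surveyed here), a non-negative CP decomposition of a patients x diagnoses x time tensor yields candidate temporal phenotypes:

        import numpy as np
        import tensorly as tl
        from tensorly.decomposition import non_negative_parafac

        rng = np.random.default_rng(0)
        # Toy patients x diagnoses x time tensor of event counts.
        T = tl.tensor(rng.poisson(1.0, size=(50, 30, 12)).astype(float))

        # Each rank-1 component couples a patient loading, a diagnosis loading,
        # and a temporal signature -- one candidate (temporal) phenotype.
        weights, factors = non_negative_parafac(T, rank=4, n_iter_max=200)
        patients, diagnoses, time = factors
        print(diagnoses[:, 0].round(2))   # diagnosis profile of phenotype 1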
    B\'ezier Gaussian Processes for Tall and Wide Data. (arXiv:2209.00343v1 [stat.ML])
    Modern approximations to Gaussian processes are suitable for "tall data", with a cost that scales well in the number of observations, but under-perform on "wide data", scaling poorly in the number of input features. That is, as the number of input features grows, good predictive performance requires the number of summarising variables, and their associated cost, to grow rapidly. We introduce a kernel that allows the number of summarising variables to grow exponentially with the number of input features, but requires only linear cost in both number of observations and input features. This scaling is achieved through our introduction of the B\'ezier buttress, which allows approximate inference without computing matrix inverses or determinants. We show that our kernel has close similarities to some of the most used kernels in Gaussian process regression, and empirically demonstrate the kernel's ability to scale to both tall and wide datasets.  ( 2 min )
    An evaluation framework for comparing causal inference models. (arXiv:2209.00115v1 [stat.ML])
    Estimation of causal effects is the core objective of many scientific disciplines. However, it remains a challenging task, especially when the effects are estimated from observational data. Recently, several promising machine learning models have been proposed for causal effect estimation. The evaluation of these models has been based on the mean values of the error of the Average Treatment Effect (ATE) as well as of the Precision in Estimation of Heterogeneous Effect (PEHE). In this paper, we propose to complement the evaluation of causal inference models using concrete statistical evidence, including the performance profiles of Dolan and Mor{\'e}, as well as non-parametric and post-hoc statistical tests. The main motivation behind this approach is the elimination of the influence of a small number of instances or simulations on the benchmarking process, which in some cases dominate the results. We use the proposed evaluation methodology to compare several state-of-the-art causal effect estimation models.  ( 2 min )
    A general framework for the analysis of kernel-based tests. (arXiv:2209.00124v1 [math.ST])
    Kernel-based tests provide a simple yet effective framework that uses the theory of reproducing kernel Hilbert spaces to design non-parametric testing procedures. In this paper we propose new theoretical tools that can be used to study the asymptotic behaviour of kernel-based tests in several data scenarios, and in many different testing problems. Unlike current approaches, our methods avoid using lengthy $U$- and $V$-statistic expansions and limit theorems that commonly appear in the literature, and work directly with random functionals on Hilbert spaces. Therefore, our framework leads to a much simpler and cleaner analysis of kernel tests, only requiring mild regularity conditions. Furthermore, we show that, in general, our analysis cannot be improved, by proving that the regularity conditions required by our methods are both sufficient and necessary. To illustrate the effectiveness of our approach we present a new kernel test for the conditional independence testing problem, as well as new analyses for already known kernel-based tests.  ( 2 min )
    The Geometry and Calculus of Losses. (arXiv:2209.00238v1 [cs.LG])
    Statistical decision problems are the foundation of statistical machine learning. The simplest problems are binary and multiclass classification and class probability estimation. Central to their definition is the choice of loss function, which is the means by which the quality of a solution is evaluated. In this paper we systematically develop the theory of loss functions for such problems from a novel perspective whose basic ingredients are convex sets with a particular structure. The loss function is defined as the subgradient of the support function of the convex set. It is consequently automatically proper (calibrated for probability estimation). This perspective provides three novel opportunities. First, it enables the development of a fundamental relationship between losses and (anti)-norms that appears not to have been noticed before. Second, it enables the development of a calculus of losses, induced by the calculus of convex sets, which allows interpolation between different losses and is thus a potentially useful design tool for tailoring losses to particular problems. In doing this we build upon, and considerably extend, existing results on M-sums of convex sets. Third, the perspective leads to a natural theory of `polar' (or `inverse') loss functions, which are derived from the polar dual of the convex set defining the loss, and which form a natural universal substitution function for Vovk's aggregating algorithm.  ( 3 min )
    The Infinitesimal Jackknife and Combinations of Models. (arXiv:2209.00147v1 [stat.ML])
    The Infinitesimal Jackknife is a general method for estimating variances of parametric models, and more recently also for some ensemble methods. In this paper we extend the Infinitesimal Jackknife to estimate the covariance between any two models. This can be used to quantify uncertainty for combinations of models, or to construct test statistics for comparing different models or ensembles of models fitted using the same training dataset. Specific examples in this paper use boosted combinations of models like random forests and M-estimators. We also investigate its application on neural networks and ensembles of XGBoost models. We illustrate the efficacy of the variance estimates through extensive simulations and an application to the Beijing Housing data, and demonstrate the theoretical consistency of the Infinitesimal Jackknife covariance estimate.  ( 2 min )
    CPS Attack Detection under Limited Local Information in Cyber Security: A Multi-node Multi-class Classification Ensemble Approach. (arXiv:2209.00170v1 [cs.CR])
    Cybersecurity breaches are common anomalies in distributed cyber-physical systems (CPS). However, classifying cyber security breaches remains a difficult problem, even using cutting-edge artificial intelligence (AI) approaches. In this paper, we study the multi-class classification problem in cyber security for attack detection. A challenging multi-node data-censoring case is considered. In such a case, data within each data center/node cannot be shared while the local data is incomplete. In particular, local nodes contain only a part of the multiple classes. In order to train a global multi-class classifier without sharing the raw data across all nodes, the main result of our study is the design of a multi-node multi-class classification ensemble approach. By gathering the estimated parameters of the binary classifiers and data densities from each local node, the missing information for each local node is completed to build the global multi-class classifier. Numerical experiments are given to validate the effectiveness of the proposed approach under the multi-node data-censoring case. Under such a case, we even show that the proposed approach outperforms the full-data approach.  ( 3 min )
  • Open

    Square root mnemonics
    Here’s a cute little poem: "I wish I knew / The root of two. / Oh charmed was he / To know root three. / So we now strive / To find root five." The beginning of each stanza is a mnemonic for the number mentioned in the following line. √2 = 1.414, √3 = 1.732, √5 […] Square root mnemonics first appeared on John D. Cook.  ( 4 min )
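    Assuming the usual word-length convention for such mnemonics (each word's letter count gives one digit, as in the classic pi mnemonic "How I want a drink, alcoholic of course"), at least the √2 and √5 stanza openings check out in a couple of lines of Python:

        def decode(line):
            digits = [len(w.strip(".,!?'")) for w in line.split()]
            return f"{digits[0]}." + "".join(str(d) for d in digits[1:])

        print(decode("I wish I knew"))      # 1.414 -> root 2
        print(decode("So we now strive"))   # 2.236 -> root 5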

  • Open

    [N] Check out this talk by Nvidia's Self-Driving Architect Siddha Ganju: "How citizen scientists can tackle floods and disaster relief!"
    Hi, this talk looks like another good example of how we can help prevent climate change. It looks like there are 25 other amazing talks too at PyBay2022, happening in-person in San Francisco AND ONLINE, NEXT SATURDAY Sept 10. https://pybay.com/talklist/ submitted by /u/g_law [link] [comments]  ( 106 min )
    [D] Senior research scientist at GoogleAI, Negar Rostamzadeh: “Can't believe Stable Diffusion is out there for public use and that's considered as ‘ok’!!!”
    What do you all think? Is the solution of keeping it all for internal use, like Imagen, or having a controlled API like Dall-E 2 a better solution? Source: https://twitter.com/negar_rz/status/1565089741808500736 submitted by /u/wei_jok [link] [comments]  ( 94 min )
    [project] using Picsart and neural blender to make an edit. Results are below
    submitted by /u/Ok-Psychology-7318 [link] [comments]  ( 88 min )
    [D] What are your suggestions?
    Hello. I am new to this community. I have been learning Python for six months. I want to learn the basics of machine learning and work in this field. Which resources can I follow, such as books or courses? submitted by /u/Successful-Ad-1522 [link] [comments]  ( 88 min )
    [D] A/B Testing Classification Problems
    Hello, Currently working on a classification problem where we are predicting whether an event will happen for an imbalanced target variable. We are using the F1 score as the metric for evaluation. If I wanted to conduct an A/B test between two different models, how would we actually conduct this test? Are there any recommended resources I could read to understand this better? submitted by /u/martin1285 [link] [comments]  ( 89 min )
    [D] From which DL paper did you learn a lot?
    I started trying to implement and read in detail some DL papers. I was reading the paper "Deep Image Matting" and found it quite insightful. I was wondering if you guys know papers that are worth reading/re-implementing because of how well they are written and the information they contain? submitted by /u/jthat92 [link] [comments]  ( 89 min )
    [P] Hey folks! I am just overwhelmed that my AI project was selected by Arm for their upcoming AI Tech Talk.
    Hey folks! I am just overwhelmed that my AI project was selected by Arm for their upcoming AI Tech Talk. I will present my solution for farmers that helps them avoid fake poor-quality agrochemicals on Sep 20th, at 8:00 AM PT. This is my first webinar of such scale and I would appreciate if you support my project by joining me: https://armltd.zoom.us/webinar/register/4016582497950/WN_fJE_6UT_Q1GCfygq8Ll4tw Hope to see you! submitted by /u/wasteguru [link] [comments]  ( 89 min )
    [P] Deploying Transformers with Apple's Core ML
    Excited to share that my Hugging Face colleague Matthijs Hollemans has built a nifty library that lets you convert models like BERT, GPT-2 and ViT to Apple's Core ML format with a *single line of code* 🤯 Core ML is Apple's software library for fast on-device model inference with neural networks and other types of machine learning models. It can be used on macOS, iOS, tvOS, and watchOS, and is optimized for using the CPU, GPU, and Apple Neural Engine. Give it a try and leave a ⭐️ if you find it useful 👉: https://github.com/huggingface/exporters I leave you with Stable Diffusion's interpretation of what the end result looks like 😁 submitted by /u/lewtun [link] [comments]  ( 103 min )
    [P] Tutorial on using LLVM to JIT PyTorch graphs to native code (x86/arm/RISC-V)
    Just made a tutorial on how to create a simple JIT compiler for PyTorch graphs using LLVM for those who might be interested in compilers for ML. I did a small detour talking about PyTorch's NNC and how to generate native code for different architectures such as x86, ARM and RISC-V for those who are interested here is the link. submitted by /u/perone [link] [comments]  ( 105 min )
    [News] Bosch, Intel and Edgeless Systems have collaborated on AI scalable pipeline to process sensitive data from connected vehicles.
    Excellent case study regarding the collection and processing of large amounts of data while complying with GDPR in the autonomous driving industry: https://www.youtube.com/watch?v=DPeqEeSKTYA&t=22s submitted by /u/laramontoyalaske [link] [comments]  ( 103 min )
    [D] Diffusion models - why is the reverse process also Gaussian?
    Denoising Diffusion Probabilistic Models states that if the diffusion step sizes are small enough, the reverse process is also Gaussian. They cite Deep Unsupervised Learning using Nonequilibrium Thermodynamics, and they in turn cite ON THE THEORY OF STOCHASTIC PROCESSES, WITH PARTICULAR REFERENCE TO APPLICATIONS to support this statement. This paper is somewhat physics oriented so I don't quite get why the statement is true. Can someone help me with this? Why is it that for small enough diffusion step sizes, the reverse process is also Gaussian? How could I prove that this is true? submitted by /u/dineNshine [link] [comments]  ( 90 min )
    [D] Reading Group: NLP against a very large web corpus
    More info at https://outsystems-ai-reading-group.github.io/ submitted by /u/JClub [link] [comments]  ( 88 min )
    [D] The pains of a lengthy review process
    My colleagues and I submitted a manuscript to one of the Hindawi journals (Q2). The paper is about applying new machine learning approaches to file carving (a computer forensics field). We submitted the manuscript in January and got rejected recently (after 8 long months), mainly for not citing recent work, which was literally published in the same month we submitted our manuscript (or even after). I’m not writing this to rant about the journal or the unfairness of the long review time. Rather, I’m asking for help with the following issue I frequently face: I work hard to write up a manuscript that has competitive results, submit it to a decent journal, wait for months or sometimes over a year, just to get rejected. After that long wait, the manuscript obviously no longer reflects the latest state of the art and is useless - no other journal will publish it, since in the meantime new, better results surface. If I send the manuscript to multiple journals at the same time and they somehow find out, I get blacklisted and will probably never be able to publish again. When I look at some of the papers in decent journals (Q2 and Q1), I’m often astounded that someone actually managed to publish work of that low quality… What am I missing???? submitted by /u/sillyscienceguy [link] [comments]  ( 91 min )
    [D] Paper lost and found: Is there ever a paper about a method called "cultural evolution"?
    I vaguely remember once reading about a method roughly named cultural evolution. The method is a low-communication distributed training strategy, where you do distributed data parallelism without gradient all-reduce. Instead, a small proportion of the data in each batch is fed to every worker, and the outputs for these shared examples are synchronized across workers. (Not all-reduced, so each worker gets a copy of the outputs of every other worker.) Then, each worker is supervised by a consistency loss to predict outputs on those shared examples similar to the averaged outputs of the other workers. Is this a real method that someone actually proposed? Is there a paper about it? Or did I hallucinate it and should probably just try it myself? submitted by /u/CMS_Flash [link] [comments]  ( 90 min )
    [P]THUNET: A neural network framework supports multiple python versions.
    What is THUNET? A deep learning net/framework named "TsingHua University NET", short for "THUNET", intended for non-commercial, educational and scientific purposes in the deep learning community. How do you build a neural network with THUNET? Next, I will explain how to use THUNET to build a model that computes the XOR operator. Tutorial-1: The XOR operator The XOR operator tells you whether two operands are different or not. Specifically, 1 xor 0 makes 1, and 0 xor 0 makes 0. What You Can Learn? In this tutorial, I show how to build a model that acts as the 'XOR' operator, a common problem in Computer Science. After finishing the lesson, readers should be able to create Activation, Loss, Layer, Scheduler and Optimizer objects, which are key components of the deep learning network. Readers wi…  ( 92 min )
    [P] We built a tool to auto generate image & object-detection datasets
    Hello! We developed a tool called flockfysh, a command-line tool that utilizes AI + web-scraping to auto-generate or auto-annotate image / object detection datasets. We strived to make it as simple as possible for users to (1) express what kind of data they want, (2) be as seamless as possible, from the people who want to customize lower-level settings to those who prefer a hands-off approach, (3) write workflows that are as understandable as possible. You only need three things to run flockfysh - A small sample of images (recommended minimum of ~50 images per label for best results, but you can definitely go lower) annotated in YOLOv5 format, enough for flockfysh to understand the kinds of images desired, in the train-val-test folder format (place all images in train) - A YAML file specifying the workflow you'd like (see repo for sample, easily adaptable workflows) Here's the link to the tool: https://github.com/teamnebulaco/flockfysh We'd love to hear some feedback on how we can improve the tool! :)) If there are any issues, please, please, pleeease help us out by opening an issue so that we can fix it for everyone. We're also looking for some collaborators to further upgrade the tool - if you're interested, please DM me! Thanks, and have an awesome day! :D submitted by /u/WeAreNebula [link] [comments]  ( 89 min )
    [R] A new large-scale image-text pair dataset just released!
    A new large-scale image-text pair dataset (747M) just released! Kakaobrain just released COYO-700M on GitHub and as a Hugging Face dataset: Github: https://github.com/kakaobrain/coyo-dataset Huggingface Dataset: https://huggingface.co/datasets/kakaobrain/coyo-700m COYO-700M is a large-scale dataset that contains 747M image-text pairs as well as many other meta-attributes to increase the usability for training various models. To evaluate the quality of the dataset, three popular models (ALIGN, unCLIP, and ViT) were trained on COYO-700M or its subsets from scratch. This dataset is freely available for research projects. More details on the dataset and models can be found in the upcoming technical report! submitted by /u/Automatic-Car-285 [link] [comments]  ( 88 min )
    Fitting training data perfectly [D]
    Current wisdom is that, in practice, deep over-parametrised networks can converge to zero loss on the training data. However, on side projects I often don’t reach zero training loss. My nets often converge at some loss above zero. Given I only have limited compute, I usually only run Adam with a learning rate of 0.001 as I’ve had good experiences with that. Am I just not running enough experiments with sufficiently many different optimisation strategies and hyperparameter settings or has anyone else faced trouble reaching zero training loss reliably, easily and consistently across projects? submitted by /u/lemlo100 [link] [comments]  ( 106 min )
  • Open

    AI Dream 73 - Best Trip of your Life
    submitted by /u/LordPewPew777 [link] [comments]  ( 91 min )
    Recommendations for a crash course in data structures in preparation for a class on AI
    I have an introductory class on AI starting in a few days. The class is an upper division elective and is very project-heavy. The official prerequisite is just a year of discrete structures (so logic and probability). However, I took their equivalent courses from the math department and so my CS skills are a bit lacking. I’ve only taken intro to CS, which used Java. Most of my seniors recommend taking data structures and, if possible, algorithms before taking the class on AI. The professor said I definitely don’t need to have taken algorithms but I do need to be comfortable at programming and have basic knowledge of data structures. He also said that the class mostly builds off of discrete structures and said that me having taken their equivalents from the math department probably puts me in a good place in preparation for the mathematical/theory side of AI. However, I still wanna take the next few days to bridge the gap and learn a bit about data structures before the class starts. Hence, does anyone have any recommendations for a crash course in data structures, or something that's doable in a few days? I’m not looking for anything too in-depth, just the basics so that I have a working knowledge. There is no specific language requirement for the course but I hear Python will be the most useful. I have some experience with Python but not a lot. I’d appreciate any input or suggestions though! submitted by /u/mowa0199 [link] [comments]  ( 88 min )
    AI-Generated Bible Art - 1k + images generated by DALL·E 2
    submitted by /u/magenta_placenta [link] [comments]  ( 86 min )
    AI And The Limits Of Language | NOEMA
    submitted by /u/estasfuera [link] [comments]  ( 86 min )
    Why DeepMind’s breakthrough ML algorithm is so tricky to train
    submitted by /u/Ok-Craft-9908 [link] [comments]  ( 86 min )
    Model explainability for 🤗 Diffusers. Get explanations for your generated images with the latest stable diffusion model!
    https://github.com/JoaoLages/diffusers-interpret All images generated during the diffusion process for the phrase "A cute corgi with the Eiffel Tower in the background" Generated image for the phrase "A cute corgi with the Eiffel Tower in the background" Word importances for the selected region in the image above submitted by /u/JClub [link] [comments]  ( 87 min )
    Steamed Hams but every line is an AI Generated Image...
    submitted by /u/Wingman143 [link] [comments]  ( 87 min )
    I animated a video of my favorite contemporary drummer Greyson Nekrutman with AI.
    submitted by /u/nalr00n [link] [comments]  ( 87 min )
    AI Assistant to Reduce Decision Fatigue: +1 (213) 577-2447
    submitted by /u/SudoSharma [link] [comments]  ( 87 min )
    New Graphene Synapses Advance Future Technology | New Google Deepmind AI Makes Videos From A Photo
    submitted by /u/kenickh [link] [comments]  ( 93 min )
    Training Physics In 10 Seconds... (My first machine learner)
    submitted by /u/TheRPGGamerMan [link] [comments]  ( 93 min )
    Announcement of Fellows selected for Univ.AI’s MLOps Fellowship!!
    Greetings on behalf of team Univ.AI!! We are excited to have gotten a huge number of responses for our first public MLOps Fellowship offer. We are happy to announce that we have made our first round of selections. Please note that we have tended to choose those with intermediate ML skills or higher for this round of selections. For those of you whom we have not been able to select this time around, we will have future editions of this program, and we will check with you at the time if you still remain interested. If your name is on the list below, Congratulations! Dr. Pavlos Protopapus and Dr. Rahul Dave are pleased to welcome you to the GHF Fellows cohort of September 2022 for MLOps. The aggregate selection rate of the program is under 5% - among an applicant pool that comprises extremely talented graduates and professionals from all around the world. This is a proud accomplishment and we hope that you’ll make the best of this opportunity. We have sent an email to all the selected candidates with the formal acceptance letter and further details about the fellowship. We wish you a great learning experience. And Congratulations again. Selected candidates: Brian Goss Johnson Mabgwe Nandakumar Raghu Nick Liu Chris Knorowski Manu Mukerji Joren Gijsbrechts Abhishek Maiti Khor Jia Wei Steven McFarland Barnard Hansel Guzman-Soto Igor Wolford Will Evonosky Colin Wood Gaurav Kumar Jain Alan Oursland submitted by /u/sudhanshu28 [link] [comments]  ( 88 min )
    AI generated trippy music video | "my search led me to VQGAN + CLIP (Vector Quantized Generative Adversarial Network and Contrastive Language–Image Pre-training), before settling more specifically on Disco Diffusion V5.2 Turbo"
    submitted by /u/Tao_Dragon [link] [comments]  ( 90 min )
    It is so cool! Only 3 lines of code can generate imaginative visuals and arts from texts
    Only 3 lines of code! That's all it takes to generate imaginative visuals and art from text. The newest PaddleHub supports ERNIE-ViLG (a SOTA text-to-image model with Chinese text prompts), Disco Diffusion and Stable Diffusion. GitHub Code: https://github.com/PaddlePaddle/PaddleHub Huggingface Space: https://huggingface.co/spaces/PaddlePaddle/ERNIE-ViLG

        import paddlehub as hub
        module = hub.Module(name="ernie_vilg")
        results = module.generate_image(text_prompts=["巨大的白色城堡"])  # "Huge white castles"

        import paddlehub as hub
        module = hub.Module(name="stable_diffusion")
        results = module.generate_image(text_prompts=["close-up maximalist illustration of panther, by makoto shinkai, akihiko yoshida, yoshitaka amano, super detailed, hd wallpaper, digital art"])

    submitted by /u/Aha_IamDaniel [link] [comments]  ( 90 min )
    DALL-E 2 has one million users, new feature rolls out
    submitted by /u/henlo_there_fren [link] [comments]  ( 87 min )
    A picture drawn by the image-generation AI "Midjourney" took first place at an art competition, and human artists were furious
    submitted by /u/Sorin61 [link] [comments]  ( 90 min )
    Why do most AI image-generation models have such a similar distinctive look to them?
    I've been using AI models to generate images from text prompts for about a year or so now, and while some models like DALL·E 2 are able to make some really amazing images, like the Van Gogh or Henri Matisse imitations that look like real paintings, the older models have an extremely specific look to them that I have no clue how to describe or replicate outside of using AI models. It's a kind of warbly artifact that happens, most often when you try to get them to generate a face or try to morph between two completely different images. Some examples are here, here, and here. I understand that the AI has limitations and isn't perfect, but why do the models generate such a distinctive looking artifact? Is there a way to recreate it without using AI models? submitted by /u/False-Fisherman [link] [comments]  ( 90 min )
    Someone should make AI that makes storyboards for movies with characters that do not change.
    I mean, the AI can do a marvelous job of dreaming things up, but the problem is that you cannot have consistent characters. Ideally, for a movie storyboard, you could first develop a character: "Woman, brown-haired, thirties, Caucasian, some freckles, small nose." You know, a little bit like character development in a game. But then you could have this character stay the same as you put her in new situations with other characters. So you could first flesh out the characters, then use the AI to dream up the storyboard for them. submitted by /u/aluode [link] [comments]  ( 87 min )
    What do you need for a job working on AI?
    Title, basically. AI is fascinating to me and I'd like to be involved with it in some way submitted by /u/CharmedBySnakes [link] [comments]  ( 88 min )
    The Ultimate List of AI/ML Startups
    submitted by /u/originalrumham [link] [comments]  ( 99 min )
    Panoptic scene graph generation (PSG) Explained - A New Challenging Task for AI
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 87 min )
    Spellbook: GPT-3 for Legal Contract Drafting
    submitted by /u/subsun [link] [comments]  ( 87 min )
    Using AIs to create pics and backstory for a Dyslexic Bard
    I've been trying out DALL-E 2 (I got an invite within a few hours of asking) for character pics. I had an idea for a half-orc bard with the trait that gives you a bite attack, and imagined he'd replaced his fangs with metal teeth. I was trying to get a pic of him singing in a tavern, but the pics all came out with the bard holding paper like he was writing the song. Turns out my dyslexia got the better of me and I typed 'signing', not 'singing'. Example 1 Example 2 Example 3 Example 4 I think I'll call him the Silver-Toothed Brad :) Then I asked GPT3 to give him a backstory: Prompt: Write a backstory for a dyslexic half-orc bard called The Silver-Toothed Brad Result: The Silver-Toothed Brad was born to a poor family of half-orcs in the slums of a large city. He was always different from the other children - he was smaller, weaker, and had trouble reading and writing. When he was old enough, he was sold into slavery and put to work in the city's mines. One day, while he was working, Brad heard a group of miners singing. He was transfixed by their voices and the way they brought the words to life. He begged the miners to teach him how to sing, and they took him under their wing. Brad quickly learned that he had a natural talent for singing. He was soon able to entertain the other miners with his stories and songs. When he was not working in the mines, he was performing in the city's taverns and inns. Despite his love for singing, Brad always felt like an outsider. He was never quite able to fit in with the other half-orcs. They teased him for his dyslexia and for his love of music. But Brad didn't let their teasing get to him. He continued to sing and entertain people with his stories. And he eventually found a group of friends who appreciated him for who he was. It's actually pretty stunning how good that is. submitted by /u/GershBinglander [link] [comments]  ( 89 min )
  • Open

    Should the reward be one?
    I want to clarify: the main reward function is of the form R_total = w1*r1 + w2*r2 + … So should R_total be 1 as the sum of all rewards, or should each term w1*r1, w2*r2, … be 1? submitted by /u/IndependenceCivil576 [link] [comments]  ( 87 min )
    Why DeepMind’s breakthrough ML algorithm is so tricky to train
    submitted by /u/Ok-Craft-9908 [link] [comments]  ( 111 min )
    What are the dimensions of Q table if you are using Tile Coding ?
    submitted by /u/InformationOrnery194 [link] [comments]  ( 87 min )
  • Open

    Using machine learning to identify undiagnosable cancers
    A new model that maps developmental pathways to tumor cells may unlock the identity of cancers of unknown primary.  ( 7 min )
  • Open

    NVIDIA GTC Dives Into the Industrial Metaverse, Digital Twins
This month’s NVIDIA GTC provides the best opportunity yet to learn how leading companies and their designers, planners and operators are using the industrial metaverse to create physically accurate, perfectly synchronized, AI-enabled digital twins. The global conference, which runs online Sept. 19-22, will focus in part on how NVIDIA Omniverse Enterprise enables companies to design…  ( 5 min )
    GFN Thursday Slides Into September With 22 New Games
We’d wake you up when September ends, but then you’d miss out on a whole new set of games coming to GeForce NOW. Gear up for 22 games joining the GeForce NOW library, with 19 day-and-date releases including action role-playing game Steelrising. Playing them all will take some serious strategy. And build the perfect Minifigure…  ( 6 min )
  • Open

    VM backup with Hyper-V Changed Block Tracking (CBT)
When it comes to backup efficiency and copying only the needed data between a full backup and any incremental backups, changed block tracking technology is critical. We are very familiar with this in the VMware backup world, as we have had changed block tracking there for a while now. However, with Hyper-V, changed block tracking is a technology that…  ( 21 min )
    15 Data Issues and How to Fix Them – Second Part
In the first part here, I discussed missing, outdated and unobserved data, data that is costly to produce, as well as dirty, unbalanced and unstructured data. This second part deals with biased, inconsistent, siloed, too big or fast-flowing data. I also cover security/privacy, data leakage and precision issues. Then, I address the case when…  ( 22 min )
  • Open

    Learn How Amazon SageMaker Clarify Helps Detect Bias
    Bias detection in data and model outcomes is a fundamental requirement for building responsible artificial intelligence (AI) and machine learning (ML) models. Unfortunately, detecting bias isn’t an easy task for the vast majority of practitioners due to the large number of ways in which it can be measured and different factors that can contribute to […]  ( 14 min )
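For readers who want the arithmetic behind two of the pre-training bias metrics Clarify reports, class imbalance (CI) and difference in positive proportions in labels (DPL), here is a hand-rolled sketch in plain numpy. This is not the Clarify API; the facet assignment and labels below are made up:

```python
import numpy as np

# CI and DPL computed by hand on toy data.
facet  = np.array([0, 0, 0, 1, 1, 0, 1, 0, 0, 1])  # 1 = disadvantaged group
labels = np.array([1, 0, 1, 0, 0, 1, 1, 1, 0, 0])  # 1 = favorable outcome

n_a, n_d = (facet == 0).sum(), (facet == 1).sum()
ci = (n_a - n_d) / (n_a + n_d)      # class imbalance, in [-1, 1]; 0 = balanced

q_a = labels[facet == 0].mean()     # favorable-label rate, advantaged group
q_d = labels[facet == 1].mean()     # favorable-label rate, disadvantaged group
dpl = q_a - q_d                     # 0 = parity in label proportions

print(f"CI = {ci:+.2f}, DPL = {dpl:+.2f}")
```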
  • Open

    New Graphene Synapses Advance Future Technology | New Google Deepmind AI Makes Videos From A Photo
    submitted by /u/kenickh [link] [comments]  ( 100 min )
    Announcement of Fellows selected for Univ.AI’s MLOps Fellowship!!
Greetings on behalf of team Univ.AI!! We are excited to have gotten a huge number of responses for our first public MLOps Fellowship offer. We are happy to announce that we have made our first round of selections. Please note that we have tended to choose those with intermediate ML skills or higher for this round of selections. For those of you whom we have not been able to select this time around, we will have future editions of this program, and we will check with you at the time if you still remain interested. If your name is on the list below, Congratulations! Dr. Pavlos Protopapus and Dr. Rahul Dave are pleased to welcome you to the GHF Fellows cohort of September 2022 for MLOps. The aggregate selection rate of the program is under 5% - among an applicant pool that comprises extremely talented graduates and professionals from all around the world. This is a proud accomplishment and we hope that you’ll make the best of this opportunity. We have sent an email to all the selected candidates with the formal acceptance letter and further details about the fellowship. We wish you a great learning experience. And Congratulations again. Selected candidates: Brian Goss Johnson Mabgwe Nandakumar Raghu Nick Liu Chris Knorowski Manu Mukerji Joren Gijsbrechts Abhishek Maiti Khor Jia Wei Steven McFarland Barnard Hansel Guzman-Soto Igor Wolford Will Evonosky Colin Wood Gaurav Kumar Jain Alan Oursland submitted by /u/sudhanshu28 [link] [comments]  ( 88 min )
    First try with instance segmentation using lightning-flash
Dataset: I have an issue with the number of instances (only one mask). https://dip4fish.blogspot.com/2022/09/train-and-prediction-with-lightning.html submitted by /u/jean-pat [link] [comments]  ( 97 min )
  • Open

Don’t Just Create Plots in Python, Go One Step Beyond!
Show your visualizations in a web-based layout using Plotly Dash  ( 12 min )
  • Open

    Runs of letters and digits in passwords
If a password of length n is made up of lower case letters and digits, with each symbol appearing with equal probability, how many times would you expect to shift between letters and digits when typing the password? If you’re typing the password on a mobile device, how many times would you change between alphabetic […]  ( 5 min )
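The adjacent-pair argument gives the expected count in closed form: each of the n-1 adjacent pairs switches type with probability 2pq, where p = 26/36 and q = 10/36 under uniform symbols. A short sketch with a Monte Carlo check (length n = 12 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12                           # password length (arbitrary)
p, q = 26/36, 10/36              # P(letter), P(digit) for uniform symbols

expected = (n - 1) * 2 * p * q   # each adjacent pair switches with prob 2pq
print(expected)                  # about 4.41 for n = 12

# Monte Carlo check: True = letter, False = digit
sims = rng.random((100_000, n)) < p
switches = (sims[:, 1:] != sims[:, :-1]).sum(axis=1)
print(switches.mean())           # should agree with the formula
```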
  • Open

    Efficient CDF Approximations for Normalizing Flows. (arXiv:2202.11322v2 [cs.LG] UPDATED)
    Normalizing flows model a complex target distribution in terms of a bijective transform operating on a simple base distribution. As such, they enable tractable computation of a number of important statistical quantities, particularly likelihoods and samples. Despite these appealing properties, the computation of more complex inference tasks, such as the cumulative distribution function (CDF) over a complex region (e.g., a polytope) remains challenging. Traditional CDF approximations using Monte-Carlo techniques are unbiased but have unbounded variance and low sample efficiency. Instead, we build upon the diffeomorphic properties of normalizing flows and leverage the divergence theorem to estimate the CDF over a closed region in target space in terms of the flux across its \emph{boundary}, as induced by the normalizing flow. We describe both deterministic and stochastic instances of this estimator: while the deterministic variant iteratively improves the estimate by strategically subdividing the boundary, the stochastic variant provides unbiased estimates. Our experiments on popular flow architectures and UCI benchmark datasets show a marked improvement in sample efficiency as compared to traditional estimators.  ( 2 min )
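As a point of reference, the "traditional" Monte-Carlo CDF estimator the abstract benchmarks against is a few lines. The sketch below stands in an affine bijection for the flow (an assumption purely for brevity; any invertible transform slots in) and estimates the mass of a box region:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "flow": an affine bijection of a standard normal base.
# A real normalizing flow would replace `transform`; the estimator is the same.
A = np.array([[1.0, 0.5], [0.0, 1.2]])
b = np.array([0.3, -0.1])
def transform(z):                # z in base space -> x in target space
    return z @ A.T + b

def mc_cdf(lo, hi, n=200_000):
    """Unbiased Monte Carlo estimate of P(lo <= X <= hi) for a box region.
    Simple and unbiased, but the relative variance blows up for small
    regions; that weakness is what the paper's flux estimator targets."""
    x = transform(rng.standard_normal((n, 2)))
    inside = np.all((x >= lo) & (x <= hi), axis=1)
    return inside.mean()

print(mc_cdf(np.array([-1.0, -1.0]), np.array([1.0, 1.0])))
```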
    A Prescriptive Learning Analytics Framework: Beyond Predictive Modelling and onto Explainable AI with Prescriptive Analytics. (arXiv:2208.14582v1 [cs.LG])
A significant body of recent research in the field of Learning Analytics has focused on leveraging machine learning approaches for predicting at-risk students in order to initiate timely interventions and thereby elevate retention and completion rates. The overarching focus of the majority of these research studies has been on the science of prediction only. The component of predictive analytics concerned with interpreting the internals of the models and explaining their predictions for individual cases to stakeholders has largely been neglected. Additionally, works that attempt to employ data-driven prescriptive analytics to automatically generate evidence-based remedial advice for at-risk learners are in their infancy. eXplainable AI is a field that has recently emerged, providing cutting-edge tools that support transparent predictive analytics and techniques for generating tailored advice for at-risk students. This study proposes a novel framework that unifies both transparent machine learning as well as techniques for enabling prescriptive analytics. This work practically demonstrates the proposed framework using predictive models for identifying at-risk learners of programme non-completion. The study then further demonstrates how predictive modelling can be augmented with prescriptive analytics on two case studies in order to generate human-readable prescriptive feedback for those who are at risk.
    Constraining Representations Yields Models That Know What They Don't Know. (arXiv:2208.14488v1 [cs.LG])
    A well-known failure mode of neural networks corresponds to high confidence erroneous predictions, especially for data that somehow differs from the training distribution. Such an unsafe behaviour limits their applicability. To counter that, we show that models offering accurate confidence levels can be defined via adding constraints in their internal representations. That is, we encode class labels as fixed unique binary vectors, or class codes, and use those to enforce class-dependent activation patterns throughout the model. Resulting predictors are dubbed Total Activation Classifiers (TAC), and TAC is used as an additional component to a base classifier to indicate how reliable a prediction is. Given a data instance, TAC slices intermediate representations into disjoint sets and reduces such slices into scalars, yielding activation profiles. During training, activation profiles are pushed towards the code assigned to a given training instance. At testing time, one can predict the class corresponding to the code that best matches the activation profile of an example. Empirically, we observe that the resemblance between activation patterns and their corresponding codes results in an inexpensive unsupervised approach for inducing discriminative confidence scores. Namely, we show that TAC is at least as good as state-of-the-art confidence scores extracted from existing models, while strictly improving the model's value on the rejection setting. TAC was also observed to work well on multiple types of architectures and data modalities.
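A toy rendering of the activation-profile mechanics described above, not the paper's trained model: slice a hidden vector into disjoint chunks, reduce each chunk to a scalar, and score classes by proximity to their fixed binary codes. With an untrained random activation the prediction is arbitrary; the sketch only shows the plumbing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed unique binary codes, one per class (random here for illustration).
n_classes, code_len, d = 4, 8, 64
codes = rng.integers(0, 2, size=(n_classes, code_len)).astype(float)

h = rng.normal(size=d)                           # an intermediate activation
profile = h.reshape(code_len, -1).mean(axis=1)   # slice + reduce to scalars

# Predict the class whose code best matches the activation profile.
scores = -np.abs(profile - codes).sum(axis=1)
print(scores.argmax())
```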
    MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids. (arXiv:2204.03305v2 [eess.AS] UPDATED)
    Improving the user's hearing ability to understand speech in noisy environments is critical to the development of hearing aid (HA) devices. For this, it is important to derive a metric that can fairly predict speech intelligibility for HA users. A straightforward approach is to conduct a subjective listening test and use the test results as an evaluation metric. However, conducting large-scale listening tests is time-consuming and expensive. Therefore, several evaluation metrics were derived as surrogates for subjective listening test results. In this study, we propose a multi-branched speech intelligibility prediction model (MBI-Net), for predicting the subjective intelligibility scores of HA users. MBI-Net consists of two branches of models, with each branch consisting of a hearing loss model, a cross-domain feature extraction module, and a speech intelligibility prediction model, to process speech signals from one channel. The outputs of the two branches are fused through a linear layer to obtain predicted speech intelligibility scores. Experimental results confirm the effectiveness of MBI-Net, which produces higher prediction scores than the baseline system in Track 1 and Track 2 on the Clarity Prediction Challenge 2022 dataset.
    Annotated Dataset Creation through General Purpose Language Models for non-English Medical NLP. (arXiv:2208.14493v1 [cs.CL])
Obtaining text datasets with semantic annotations is an effortful process, yet crucial for supervised training in natural language processing (NLP). In general, developing and applying new NLP pipelines in domain-specific contexts for tasks often requires custom-designed datasets to address NLP tasks in a supervised machine learning fashion. When operating in non-English languages for medical data processing, this exposes several minor and major, interconnected problems such as a lack of task-matching datasets as well as task-specific pre-trained models. In our work we suggest leveraging pretrained language models for training data acquisition in order to retrieve sufficiently large datasets for training smaller and more efficient models for use-case-specific tasks. To demonstrate the effectiveness of our approach, we create a custom dataset which we use to train a medical NER model for German texts, GPTNERMED, yet our method remains language-independent in principle. Our obtained dataset as well as our pre-trained models are publicly available at: https://github.com/frankkramer-lab/GPTNERMED
    Truncated Matrix Power Iteration for Differentiable DAG Learning. (arXiv:2208.14571v1 [cs.LG])
Recovering underlying Directed Acyclic Graph structures (DAG) from observational data is highly challenging due to the combinatorial nature of the DAG-constrained optimization problem. Recently, DAG learning has been cast as a continuous optimization problem by characterizing the DAG constraint as a smooth equality one, generally based on polynomials over adjacency matrices. Existing methods place very small coefficients on high-order polynomial terms for stabilization, since they argue that large coefficients on the higher-order terms are harmful due to numerical explosion. On the contrary, we discover that large coefficients on higher-order terms are beneficial for DAG learning when the spectral radiuses of the adjacency matrices are small, and that larger coefficients for higher-order terms can approximate the DAG constraints much better than the small counterparts. Based on this, we propose a novel DAG learning method with efficient truncated matrix power iteration to approximate geometric series-based DAG constraints. Empirically, our DAG learning method outperforms the previous state-of-the-art in various settings, often by a factor of 3 or more in terms of structural Hamming distance.
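The geometric-series constraint being truncated can be written down directly: since tr((A∘A)^k) counts weighted closed walks of length k, the partial sum is zero exactly when the graph is acyclic (taking K at least the number of nodes). A naive sketch, not the paper's optimized power-iteration routine:

```python
import numpy as np

def dag_penalty_truncated(A, K=25):
    """Truncated geometric-series DAG penalty: sum_{k=1..K} tr((A*A)^k).
    tr(M^k) counts weighted closed walks of length k, so the sum is zero
    iff the graph is acyclic (for K >= number of nodes); unlike the
    matrix-exponential penalty, higher-order terms keep coefficient 1."""
    M = A * A                       # nonnegative surrogate adjacency
    P = np.eye(A.shape[0])
    total = 0.0
    for _ in range(K):
        P = P @ M                   # P = M^k after k iterations
        total += np.trace(P)
    return total

A_dag = np.array([[0.0, 0.7], [0.0, 0.0]])   # acyclic: 1 -> 2
A_cyc = np.array([[0.0, 0.7], [0.4, 0.0]])   # contains a 2-cycle
print(dag_penalty_truncated(A_dag))          # 0.0
print(dag_penalty_truncated(A_cyc))          # > 0
```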
    Modeling Soft-Failure Evolution for Triggering Timely Repair with Low QoT Margins. (arXiv:2208.14535v1 [cs.LG])
In this work, the capabilities of an encoder-decoder learning framework are leveraged to predict soft-failure evolution over a long future horizon. This enables the triggering of timely repair actions with low quality-of-transmission (QoT) margins before a costly hard-failure occurs, ultimately reducing the frequency of repair actions and associated operational expenses. Specifically, it is shown that the proposed scheme is capable of triggering a repair action several days prior to the expected day of a hard-failure, contrary to soft-failure detection schemes utilizing rule-based fixed QoT margins, which may lead either to premature repair actions (i.e., several months before the event of a hard-failure) or to repair actions that are taken too late (i.e., after the hard failure has occurred). Both frameworks are evaluated and compared for a lightpath established in an elastic optical network, where soft-failure evolution can be modeled by analyzing bit-error-rate information monitored at the coherent receivers.
    Do language models make human-like predictions about the coreferents of Italian anaphoric zero pronouns?. (arXiv:2208.14554v1 [cs.CL])
    Some languages allow arguments to be omitted in certain contexts. Yet human language comprehenders reliably infer the intended referents of these zero pronouns, in part because they construct expectations about which referents are more likely. We ask whether Neural Language Models also extract the same expectations. We test whether 12 contemporary language models display expectations that reflect human behavior when exposed to sentences with zero pronouns from five behavioral experiments conducted in Italian by Carminati (2005). We find that three models - XGLM 2.9B, 4.5B, and 7.5B - capture the human behavior from all the experiments, with others successfully modeling some of the results. This result suggests that human expectations about coreference can be derived from exposure to language, and also indicates features of language models that allow them to better reflect human behavior.
    Lifelong Learning for Question Answering with Hierarchical Prompts. (arXiv:2208.14602v1 [cs.CL])
    QA models with lifelong learning (LL) abilities are important for practical QA applications, and architecture-based LL methods are reported to be an effective implementation for these models. However, it is non-trivial to extend previous approaches to QA tasks since they either require access to task identities in the testing phase or do not explicitly model samples from unseen tasks. In this paper, we propose Diana: a dynamic architecture-based lifelong QA model that tries to learn a sequence of QA tasks with a prompt enhanced language model. Four types of hierarchically organized prompts are used in Diana to capture QA knowledge from different granularities. Specifically, we dedicate task-level prompts to capture task-specific knowledge to retain high LL performances and maintain instance-level prompts to learn knowledge shared across different input samples to improve the model's generalization performance. Moreover, we dedicate separate prompts to explicitly model unseen tasks and introduce a set of prompt key vectors to facilitate knowledge sharing between tasks. Extensive experiments demonstrate that Diana outperforms state-of-the-art lifelong QA models, especially in handling unseen tasks.
    Nonparametric and Online Change Detection in Multivariate Datastreams using QuantTree. (arXiv:2208.14801v1 [cs.LG])
    We address the problem of online change detection in multivariate datastreams, and we introduce QuantTree Exponentially Weighted Moving Average (QT-EWMA), a nonparametric change-detection algorithm that can control the expected time before a false alarm, yielding a desired Average Run Length (ARL$_0$). Controlling false alarms is crucial in many applications and is rarely guaranteed by online change-detection algorithms that can monitor multivariate datastreams without knowing the data distribution. Like many change-detection algorithms, QT-EWMA builds a model of the data distribution, in our case a QuantTree histogram, from a stationary training set. To monitor datastreams even when the training set is extremely small, we propose QT-EWMA-update, which incrementally updates the QuantTree histogram during monitoring, always keeping the ARL$_0$ under control. Our experiments, performed on synthetic and real-world datastreams, demonstrate that QT-EWMA and QT-EWMA-update control the ARL$_0$ and the false alarm rate better than state-of-the-art methods operating in similar conditions, achieving lower or comparable detection delays.  ( 2 min )
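Stripped of the QuantTree histogram, the EWMA monitoring statistic itself is a classic control chart. The sketch below runs it on a raw univariate stream with a mean shift injected at t = 300; the smoothing weight and limit width are conventional choices, and an occasional false alarm before the change is exactly the event whose rate (ARL_0) QT-EWMA is designed to control:

```python
import numpy as np

rng = np.random.default_rng(1)

# Generic one-dimensional EWMA control chart; QT-EWMA applies this kind of
# statistic to QuantTree bin frequencies, shown here on raw values instead.
lam, L = 0.1, 3.0                        # smoothing weight, limit width
train = rng.normal(0, 1, 500)            # stationary training set
mu, sigma = train.mean(), train.std()

stream = np.concatenate([rng.normal(0, 1, 300), rng.normal(1.5, 1, 100)])
z, t_alarm = mu, None
for t, x in enumerate(stream):
    z = (1 - lam) * z + lam * x
    # steady-state limits; time-varying limits exist but are omitted here
    if abs(z - mu) > L * sigma * np.sqrt(lam / (2 - lam)):
        t_alarm = t
        break
print(f"alarm at t = {t_alarm} (change injected at t = 300)")
```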
    Dual Representation Learning for One-Step Clustering of Multi-View Data. (arXiv:2208.14450v1 [cs.LG])
Multi-view data are commonly encountered in data mining applications. Effective extraction of information from multi-view data requires specific design of clustering methods to cater for data with multiple views, which is non-trivial and challenging. In this paper, we propose a novel one-step multi-view clustering method by exploiting the dual representation of both the common and specific information of different views. The motivation originates from the rationale that multi-view data contain not only the consistent knowledge between views but also the unique knowledge of each view. Meanwhile, to make the representation learning more specific to the clustering task, a one-step learning framework is proposed to integrate representation learning and clustering partition as a whole. With this framework, the representation learning and clustering partition mutually benefit each other, which effectively improves the clustering performance. Results from extensive experiments conducted on benchmark multi-view datasets clearly demonstrate the superiority of the proposed method.
    Model-Based Reinforcement Learning with SINDy. (arXiv:2208.14501v1 [cs.LG])
We draw on the latest advancements in the physics community to propose a novel method for discovering the governing non-linear dynamics of physical systems in reinforcement learning (RL). We establish that this method is capable of discovering the underlying dynamics using significantly fewer trajectories (as little as one rollout with $\leq 30$ time steps) than state-of-the-art model learning algorithms. Further, the technique learns a model that is accurate enough to induce near-optimal policies given significantly fewer trajectories than those required by model-free algorithms. It brings the benefits of model-based RL without requiring a model to be developed in advance, for systems that have physics-based dynamics. To establish the validity and applicability of this algorithm, we conduct experiments on four classic control tasks. We found that an optimal policy trained on the discovered dynamics of the underlying system can generalize well. Further, the learned policy performs well when deployed on the actual physical system, thus bridging the model-to-real-system gap. We further compare our method to state-of-the-art model-based and model-free approaches, and show that our method requires fewer trajectories sampled on the true physical system compared to other methods. Additionally, we explored approximate dynamics models and found that they can also perform well.
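The sparse-regression core that SINDy contributes, sequentially thresholded least squares over a library of candidate terms, fits in a short script. The sketch below recovers a hand-picked linear system; the dynamics, threshold, and library are illustrative choices, not the paper's setup:

```python
import numpy as np

# Sequentially thresholded least squares (the SINDy regression), recovering
# dx/dt = -0.5*x + 2*y, dy/dt = -2*x - 0.5*y from simulated data.
dt, n = 0.01, 5000
A_true = np.array([[-0.5, 2.0], [-2.0, -0.5]])
X = np.zeros((n, 2))
X[0] = [2.0, 0.0]
for k in range(n - 1):                        # forward-Euler simulation
    X[k + 1] = X[k] + dt * (X[k] @ A_true.T)

dX = np.gradient(X, dt, axis=0)               # numerical derivatives

# Candidate library: [1, x, y, x^2, x*y, y^2]
x, y = X[:, 0], X[:, 1]
Theta = np.column_stack([np.ones(n), x, y, x**2, x * y, y**2])

Xi = np.linalg.lstsq(Theta, dX, rcond=None)[0]
for _ in range(10):                           # threshold, then refit survivors
    Xi[np.abs(Xi) < 0.1] = 0.0
    for j in range(2):
        big = np.abs(Xi[:, j]) >= 0.1
        Xi[big, j] = np.linalg.lstsq(Theta[:, big], dX[:, j], rcond=None)[0]
print(np.round(Xi.T, 2))  # rows ~ [0, -0.5, 2, 0, 0, 0] and [0, -2, -0.5, 0, 0, 0]
```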
    Non-readily identifiable data collaboration analysis for multiple datasets including personal information. (arXiv:2208.14611v1 [cs.LG])
Multi-source data fusion, in which multiple data sources are jointly analyzed to obtain improved information, has attracted considerable research attention. For the datasets of multiple medical institutions, data confidentiality and cross-institutional communication are critical. In such cases, data collaboration (DC) analysis by sharing dimensionality-reduced intermediate representations without iterative cross-institutional communications may be appropriate. Identifiability of the shared data is essential when analyzing data including personal information. In this study, the identifiability of the DC analysis is investigated. The results reveal that the shared intermediate representations are readily identifiable to the original data for supervised learning. This study then proposes a non-readily identifiable DC analysis only sharing non-readily identifiable data for multiple medical datasets including personal information. The proposed method solves identifiability concerns based on a random sample permutation, the concept of interpretable DC analysis, and usage of functions that cannot be reconstructed. In numerical experiments on medical datasets, the proposed method remains non-readily identifiable while maintaining the high recognition performance of the conventional DC analysis. For a hospital dataset, the proposed method exhibits a nine-percentage-point improvement in recognition performance over the local analysis that uses only the local dataset.
    Listen to your heart: A self-supervised approach for detecting murmur in heart-beat sounds for the Physionet 2022 challenge. (arXiv:2208.14845v1 [cs.LG])
    Heart murmurs are abnormal sounds present in heartbeats, caused by turbulent blood flow through the heart. The PhysioNet 2022 challenge targets automatic detection of murmur from audio recordings of the heart and automatic detection of normal vs. abnormal clinical outcome. The recordings are captured from multiple locations around the heart. Our participation investigates the effectiveness of self-supervised learning for murmur detection. We evaluate the use of a backbone CNN, whose layers are trained in a self-supervised way with data from both this year's and the 2016 challenge. We use two different augmentations on each training sample, and normalized temperature-scaled cross-entropy loss. We experiment with different augmentations to learn effective phonocardiogram representations. To build the final detectors we train two classification heads, one for each challenge task. We present evaluation results for all combinations of the available augmentations, and for our multiple-augmentation approach.  ( 2 min )
    LINKS: A dataset of a hundred million planar linkage mechanisms for data-driven kinematic design. (arXiv:2208.14567v1 [cs.LG])
In this paper, we introduce LINKS, a dataset of 100 million one-degree-of-freedom planar linkage mechanisms and 1.1 billion coupler curves, which is more than 1000 times larger than any existing database of planar mechanisms and is not limited to specific kinds of mechanisms such as four-bars, six-bars, etc., which are typically what most databases include. LINKS is made up of various components including 100 million mechanisms, the simulation data for each mechanism, normalized paths generated by each mechanism, a curated set of paths, the code used to generate the data and simulate mechanisms, and a live web demo for interactive design of linkage mechanisms. The curated paths are provided as a measure for removing biases in the paths generated by the mechanisms, enabling a more even design-space representation. In this paper, we discuss the details of how we can generate such a large dataset and how we can overcome major issues with such scales. To be able to generate such a large dataset we introduce a new operator to generate 1-DOF mechanism topologies; furthermore, we take many steps to speed up slow simulations of mechanisms by vectorizing our simulations and parallelizing our simulator on a large number of threads, which leads to a simulation 800 times faster than the simple simulation algorithm. This is necessary given that, on average, only 1 out of 500 generated candidates is valid (and all must be simulated to determine their validity), which means billions of simulations must be performed for the generation of this dataset. Then we demonstrate the depth of our dataset through a bi-directional chamfer distance-based shape retrieval study where we show how our dataset can be used directly to find mechanisms that can trace paths very close to desired target paths.
    Protea: Client Profiling within Federated Systems using Flower. (arXiv:2207.01053v2 [cs.LG] UPDATED)
    Federated Learning (FL) has emerged as a prospective solution that facilitates the training of a high-performing centralised model without compromising the privacy of users. While successful, research is currently limited by the possibility of establishing a realistic large-scale FL system at the early stages of experimentation. Simulation can help accelerate this process. To facilitate efficient scalable FL simulation of heterogeneous clients, we design and implement Protea, a flexible and lightweight client profiling component within federated systems using the FL framework Flower. It allows automatically collecting system-level statistics and estimating the resources needed for each client, thus running the simulation in a resource-aware fashion. The results show that our design successfully increases parallelism for 1.66 $\times$ faster wall-clock time and 2.6$\times$ better GPU utilisation, which enables large-scale experiments on heterogeneous clients.
    Embedding Functional Data: Multidimensional Scaling and Manifold Learning. (arXiv:2208.14540v1 [math.ST])
We adapt concepts, methodology, and theory originally developed in the areas of multidimensional scaling and dimensionality reduction for multivariate data to the functional setting. We focus on classical scaling and Isomap -- prototypical methods that have played important roles in these areas -- and showcase their use in the context of functional data analysis. In the process, we highlight the crucial role that the ambient metric plays.
    MTI-Net: A Multi-Target Speech Intelligibility Prediction Model. (arXiv:2204.03310v2 [eess.AS] UPDATED)
    Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention. Many studies report that these DL-based models yield satisfactory assessment performance and good flexibility, but their performance in unseen environments remains a challenge. Furthermore, compared to quality scores, fewer studies elaborate deep learning models to estimate intelligibility scores. This study proposes a multi-task speech intelligibility prediction model, called MTI-Net, for simultaneously predicting human and machine intelligibility measures. Specifically, given a speech utterance, MTI-Net is designed to predict human subjective listening test results and word error rate (WER) scores. We also investigate several methods that can improve the prediction performance of MTI-Net. First, we compare different features (including low-level features and embeddings from self-supervised learning (SSL) models) and prediction targets of MTI-Net. Second, we explore the effect of transfer learning and multi-tasking learning on training MTI-Net. Finally, we examine the potential advantages of fine-tuning SSL embeddings. Experimental results demonstrate the effectiveness of using cross-domain features, multi-task learning, and fine-tuning SSL embeddings. Furthermore, it is confirmed that the intelligibility and WER scores predicted by MTI-Net are highly correlated with the ground-truth scores.
    Exploration in Linear Bandits with Rich Action Sets and its Implications for Inference. (arXiv:2207.11597v2 [cs.LG] UPDATED)
We present a non-asymptotic lower bound on the eigenspectrum of the design matrix generated by any linear bandit algorithm with sub-linear regret when the action set has well-behaved curvature. Specifically, we show that the minimum eigenvalue of the expected design matrix grows as $\Omega(\sqrt{n})$ whenever the expected cumulative regret of the algorithm is $O(\sqrt{n})$, where $n$ is the learning horizon, and the action-space has a constant Hessian around the optimal arm. This shows that such action-spaces force a polynomial lower bound rather than a logarithmic lower bound, as shown by \cite{lattimore2017end}, in discrete (i.e., well-separated) action spaces. Furthermore, while the previous result is shown to hold only in the asymptotic regime (as $n \to \infty$), our result for these "locally rich" action spaces is any-time. Additionally, under a mild technical assumption, we obtain a similar lower bound on the minimum eigenvalue holding with high probability. We apply our result to two practical scenarios -- \emph{model selection} and \emph{clustering} in linear bandits. For model selection, we show that an epoch-based linear bandit algorithm adapts to the true model complexity at a rate exponential in the number of epochs, by virtue of our novel spectral bound. For clustering, we consider a multi-agent framework where we show, by leveraging the spectral result, that no forced exploration is necessary -- the agents can run a linear bandit algorithm and estimate their underlying parameters at once, and hence incur a low regret.
    A Learning-Based 3D EIT Image Reconstruction Method. (arXiv:2208.14449v1 [eess.IV])
    Deep learning has been widely employed to solve the Electrical Impedance Tomography (EIT) image reconstruction problem. Most existing physical model-based and learning-based approaches focus on 2D EIT image reconstruction. However, when they are directly extended to the 3D domain, the reconstruction performance in terms of image quality and noise robustness is hardly guaranteed mainly due to the significant increase in dimensionality. This paper presents a learning-based approach for 3D EIT image reconstruction, which is named Transposed convolution with Neurons Network (TN-Net). Simulation and experimental results show the superior performance and generalization ability of TN-Net compared with prevailing 3D EIT image reconstruction algorithms.
    Membership Inference Attacks by Exploiting Loss Trajectory. (arXiv:2208.14933v1 [cs.CR])
Machine learning models are vulnerable to membership inference attacks in which an adversary aims to predict whether or not a particular sample was contained in the target model's training dataset. Existing attack methods have commonly exploited the output information (mostly, losses) solely from the given target model. As a result, in practical scenarios where both the member and non-member samples yield similarly small losses, these methods are naturally unable to differentiate between them. To address this limitation, in this paper, we propose a new attack method that can exploit the membership information from the whole training process of the target model for improving the attack performance. To mount the attack in the common black-box setting, we leverage knowledge distillation, and represent the membership information by the losses evaluated on a sequence of intermediate models at different distillation epochs, namely \emph{distilled loss trajectory}, together with the loss from the given target model. Experimental results over different datasets and model architectures demonstrate the great advantage of our attack in terms of different metrics. For example, on CINIC-10, our attack achieves at least 6$\times$ higher true-positive rate at a low false-positive rate of 0.1\% than existing methods. Further analysis demonstrates the general effectiveness of our attack in more strict scenarios.  ( 2 min )
    Uncertainty Quantification and Experimental Design for Large-Scale Linear Inverse Problems under Gaussian Process Priors. (arXiv:2109.03457v4 [stat.ML] UPDATED)
    We consider the use of Gaussian process (GP) priors for solving inverse problems in a Bayesian framework. As is well known, the computational complexity of GPs scales cubically in the number of datapoints. We here show that in the context of inverse problems involving integral operators, one faces additional difficulties that hinder inversion on large grids. Furthermore, in that context, covariance matrices can become too large to be stored. By leveraging results about sequential disintegrations of Gaussian measures, we are able to introduce an implicit representation of posterior covariance matrices that reduces the memory footprint by only storing low rank intermediate matrices, while allowing individual elements to be accessed on-the-fly without needing to build full posterior covariance matrices. Moreover, it allows for fast sequential inclusion of new observations. These features are crucial when considering sequential experimental design tasks. We demonstrate our approach by computing sequential data collection plans for excursion set recovery for a gravimetric inverse problem, where the goal is to provide fine resolution estimates of high density regions inside the Stromboli volcano, Italy. Sequential data collection plans are computed by extending the weighted integrated variance reduction (wIVR) criterion to inverse problems. Our results show that this criterion is able to significantly reduce the uncertainty on the excursion volume, reaching close to minimal levels of residual uncertainty. Overall, our techniques allow the advantages of probabilistic models to be brought to bear on large-scale inverse problems arising in the natural sciences.  ( 3 min )
    Efficient Sparsely Activated Transformers. (arXiv:2208.14580v1 [cs.LG])
    Transformer-based neural networks have achieved state-of-the-art task performance in a number of machine learning domains including natural language processing and computer vision. To further improve their accuracy, recent work has explored the integration of dynamic behavior into these networks in the form of mixture-of-expert (MoE) layers. In this paper, we explore the introduction of MoE layers to optimize a different metric: inference latency. We introduce a novel system named PLANER that takes an existing Transformer-based network and a user-defined latency target and produces an optimized, sparsely-activated version of the original network that tries to meet the latency target while maintaining baseline accuracy. We evaluate PLANER on two real-world language modeling tasks using the Transformer-XL network and achieve inference latency reductions of over 2x at iso-accuracy.
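For readers unfamiliar with MoE layers, the routing mechanics that make sparse activation pay off at inference time look roughly like the top-1 toy below; the dimensions, gating rule, and ReLU experts are all simplifications, not PLANER's actual design:

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal top-1 mixture-of-experts layer: a router picks one expert per
# token, so only a fraction of the weights is touched per input.
d, n_experts, n_tokens = 16, 4, 8
W_router  = rng.normal(size=(d, n_experts)) * 0.1
W_experts = rng.normal(size=(n_experts, d, d)) * 0.1

tokens = rng.normal(size=(n_tokens, d))
logits = tokens @ W_router                        # (n_tokens, n_experts)
probs  = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
choice = probs.argmax(-1)                         # top-1 expert per token

out = np.empty_like(tokens)
for e in range(n_experts):                        # each expert sees only its tokens
    mask = choice == e
    out[mask] = np.maximum(tokens[mask] @ W_experts[e], 0.0)  # ReLU expert
out *= probs[np.arange(n_tokens), choice][:, None]            # scale by gate
print(out.shape)
```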
    AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation. (arXiv:2206.08023v2 [eess.IV] UPDATED)
Despite the considerable progress in automatic abdominal multi-organ segmentation from CT/MRI scans in recent years, a comprehensive evaluation of the models' capabilities is hampered by the lack of a large-scale benchmark from diverse clinical scenarios. Constrained by the high cost of collecting and labeling 3D medical data, most of the deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which still limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods. To mitigate the limitations, we present AMOS, a large-scale, diverse, clinical dataset for abdominal organ segmentation. AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and test-bed for studying robust segmentation algorithms under diverse targets and scenarios. We further benchmark several state-of-the-art medical segmentation models to evaluate the status of the existing methods on this new challenging dataset. We have made our datasets, benchmark servers, and baselines publicly available, and hope to inspire future research. Information can be found at https://amos22.grand-challenge.org.
    BioSLAM: A Bio-inspired Lifelong Memory System for General Place Recognition. (arXiv:2208.14543v1 [cs.RO])
We present BioSLAM, a lifelong SLAM framework for learning various new appearances incrementally and maintaining accurate place recognition for previously visited areas. Unlike humans, artificial neural networks suffer from catastrophic forgetting and may forget the previously visited areas when trained with new arrivals. For humans, researchers have discovered that there exists a memory replay mechanism in the brain to keep the neurons active for previous events. Inspired by this discovery, BioSLAM designs a gated generative replay to control the robot's learning behavior based on feedback rewards. Specifically, BioSLAM provides a novel dual-memory mechanism for maintenance: 1) a dynamic memory to efficiently learn new observations and 2) a static memory to balance new-old knowledge. When combined with a visual-/LiDAR-based SLAM system, the complete processing pipeline can help the agent incrementally update its place recognition ability, robust to the increasing complexity of long-term place recognition. We demonstrate BioSLAM in two incremental SLAM scenarios. In the first scenario, a LiDAR-based agent continuously travels through a city-scale environment with a 120km trajectory and encounters different types of 3D geometries (open streets, residential areas, commercial buildings). We show that BioSLAM can incrementally update the agent's place recognition ability and outperform the state-of-the-art incremental approach, Generative Replay, by 24%. In the second scenario, a LiDAR-vision-based agent repeatedly travels through a campus-scale area on a 4.5km trajectory. BioSLAM keeps place recognition accuracy 15% above the state-of-the-art approaches under different appearances. To our knowledge, BioSLAM is the first memory-enhanced lifelong SLAM system to help incremental place recognition in long-term navigation tasks.
    Orthonormal Expansions for Translation-Invariant Kernels. (arXiv:2206.08648v2 [math.CA] UPDATED)
    We present a general Fourier analytic technique for constructing orthonormal basis expansions of translation-invariant kernels from orthonormal bases of $\mathscr{L}_2(\mathbb{R})$. This allows us to derive explicit expansions on the real line for (i) Mat\'ern kernels of all half-integer orders in terms of associated Laguerre functions, (ii) the Cauchy kernel in terms of rational functions, and (iii) the Gaussian kernel in terms of Hermite functions.
    Foundation Posteriors for Approximate Probabilistic Inference. (arXiv:2205.09735v2 [cs.LG] UPDATED)
    Probabilistic programs provide an expressive representation language for generative models. Given a probabilistic program, we are interested in the task of posterior inference: estimating a latent variable given a set of observed variables. Existing techniques for inference in probabilistic programs often require choosing many hyper-parameters, are computationally expensive, and/or only work for restricted classes of programs. Here we formulate inference as masked language modeling: given a program, we generate a supervised dataset of variables and assignments, and randomly mask a subset of the assignments. We then train a neural network to unmask the random values, defining an approximate posterior distribution. By optimizing a single neural network across a range of programs we amortize the cost of training, yielding a "foundation" posterior able to do zero-shot inference for new programs. The foundation posterior can also be fine-tuned for a particular program and dataset by optimizing a variational inference objective. We show the efficacy of the approach, zero-shot and fine-tuned, on a benchmark of STAN programs.
    Karaoker: Alignment-free singing voice synthesis with speech training data. (arXiv:2204.04127v2 [eess.AS] UPDATED)
    Existing singing voice synthesis models (SVS) are usually trained on singing data and depend on either error-prone time-alignment and duration features or explicit music score information. In this paper, we propose Karaoker, a multispeaker Tacotron-based model conditioned on voice characteristic features that is trained exclusively on spoken data without requiring time-alignments. Karaoker synthesizes singing voice and transfers style following a multi-dimensional template extracted from a source waveform of an unseen singer/speaker. The model is jointly conditioned with a single deep convolutional encoder on continuous data including pitch, intensity, harmonicity, formants, cepstral peak prominence and octaves. We extend the text-to-speech training objective with feature reconstruction, classification and speaker identification tasks that guide the model to an accurate result. In addition to multitasking, we also employ a Wasserstein GAN training scheme as well as new losses on the acoustic model's output to further refine the quality of the model.
    Efficient Learning of Interpretable Classification Rules. (arXiv:2205.06936v2 [cs.LG] UPDATED)
Machine learning has become omnipresent with applications in various safety-critical domains such as medicine, law, and transportation. In these domains, high-stake decisions provided by machine learning necessitate researchers to design interpretable models, where the prediction is understandable to a human. In interpretable machine learning, rule-based classifiers are particularly effective in representing the decision boundary through a set of rules comprising input features. The interpretability of rule-based classifiers is in general related to the size of the rules, where smaller rules are considered more interpretable. To learn such a classifier, the brute-force direct approach is to consider an optimization problem that tries to learn the smallest classification rule that has close to maximum accuracy. This optimization problem is computationally intractable due to its combinatorial nature and thus, the problem is not scalable in large datasets. To this end, in this paper we study the triangular relationship among the accuracy, interpretability, and scalability of learning rule-based classifiers. The contribution of this paper is an interpretable learning framework IMLI, that is based on maximum satisfiability (MaxSAT) for synthesizing classification rules expressible in propositional logic. Despite the progress of MaxSAT solving in the last decade, the straightforward MaxSAT-based solution cannot scale. Therefore, we incorporate an efficient incremental learning technique inside the MaxSAT formulation by integrating mini-batch learning and iterative rule-learning. In our experiments, IMLI achieves the best balance among prediction accuracy, interpretability, and scalability. As an application, we deploy IMLI in learning popular interpretable classifiers such as decision lists and decision sets.
    Deep Quality Estimation: Creating Surrogate Models for Human Quality Ratings. (arXiv:2205.10355v2 [cs.CV] UPDATED)
    Human ratings are abstract representations of segmentation quality. To approximate human quality ratings on scarce expert data, we train surrogate quality estimation models. We evaluate on a complex multi-class segmentation problem, specifically glioma segmentation, following the BraTS annotation protocol. The training data features quality ratings from 15 expert neuroradiologists on a scale ranging from 1 to 6 stars for various computer-generated and manual 3D annotations. Even though the networks operate on 2D images and with scarce training data, we can approximate segmentation quality within a margin of error comparable to human intra-rater reliability. Segmentation quality prediction has broad applications. While an understanding of segmentation quality is imperative for successful clinical translation of automatic segmentation quality algorithms, it can play an essential role in training new segmentation models. Due to the split-second inference times, it can be directly applied within a loss function or as a fully-automatic dataset curation mechanism in a federated learning setting.
    Hybrid Artifact Detection System for Minute Resolution Blood Pressure Signals from ICU. (arXiv:2203.05947v2 [eess.SP] UPDATED)
    Physiological monitoring in intensive care units (ICU) generates data that can be used in clinical research. However, the recording conditions in clinical settings limit the automated extraction of relevant information from physiological signals due to noise and artifacts. Therefore, removing artifacts before clinical research is essential. Manual annotation by experienced researchers, which is the gold standard for removing artifacts, is time-consuming and costly due to the volume of the data generated in the ICU. In this study, we propose a hybrid artifact detection system that combines a Variational Autoencoder with a statistical detection component for the labeling of artifactual samples to automate the costly process of cleaning physiological recordings. The system is applied to minute-by-minute mean blood pressure signals from an intensive care unit dataset. Its performance is verified by manual annotations made by an expert. We benchmark the performance of our system with two other systems that combine an ARIMA or an autoencoder-based model with our statistical detection component. Our results indicate that the system consistently achieves sensitivity and specificity levels of over 90%. Thus, it provides an initial foundation to automate data cleaning in recordings from ICU.
    Spatial-Temporal Interactive Dynamic Graph Convolution Network for Traffic Forecasting. (arXiv:2205.08689v4 [cs.LG] UPDATED)
    Accurate traffic forecasting is essential for smart cities to achieve traffic control, route planning, and flow detection. Although many spatial-temporal methods are currently proposed, these methods are deficient in capturing the spatial-temporal dependence of traffic data synchronously. In addition, most of the methods ignore the dynamically changing correlations between road network nodes that arise as traffic data changes. We propose a neural network-based Spatial-Temporal Interactive Dynamic Graph Convolutional Network (STIDGCN) to address the above challenges for traffic forecasting. Specifically, we propose an interactive dynamic graph convolution structure, which divides the sequences at intervals and synchronously captures the traffic data's spatial-temporal dependence through an interactive learning strategy. The interactive learning strategy makes STIDGCN effective for long-term prediction. We also propose a novel dynamic graph convolution module to capture the dynamically changing correlations in the traffic network, consisting of a graph generator and fusion graph convolution. The dynamic graph convolution module can use the input traffic data and pre-defined graph structure to generate a graph structure. It is then fused with the defined adaptive adjacency matrix to generate a dynamic adjacency matrix, which fills the pre-defined graph structure and simulates the generation of dynamic associations between nodes in the road network. Extensive experiments on four real-world traffic flow datasets demonstrate that STIDGCN outperforms the state-of-the-art baseline.
    Input Invex Neural Network. (arXiv:2106.08748v2 [cs.LG] UPDATED)
Connected decision boundaries are used in different areas like image segmentation, clustering, alpha-shape or defining a region in nD-space. However, methods for generating such connected decision boundaries using neural networks are lacking in the machine learning literature. While exploring such methods, we found that such decision boundaries can be generated by thresholding a special kind of function called an invex function. We find a connection between invex functions and the connectedness of regions and manifolds, and we apply connectedness and locality as a foundation for interpreting the nD-data-space. In this paper, we present two methods for constructing invex functions using neural networks. The first one is based on intuitions developed visually and constraining the function using our method (Gradient Clipped-Gradient Penalty). The second one is based on later findings on the relationship of invex functions to the composition of invertible and convex functions. Using connectedness as a basic interpretation method, we create connected-region-based classifiers. We show that multiple connected-set-based classifiers can approximate any classification function. In the experiments section, we first use the invex function for regression and classification tasks to visualize the global optimality and connected sets in 2D toy datasets. Furthermore, we use our methods for classification tasks using an ensemble of models as well as using a single model on larger-scale datasets. The experiments show that connected-set-based classifiers do not have a significant disadvantage over ordinary neural network classifiers. We also evaluate various properties of invex functions and connected sets. The overall exploration of this work suggests that invex functions are fundamental to understanding and applying the locality and connectedness of the input space, which is useful for multiple tasks.  ( 3 min )
    Positive-Unlabeled Learning with Uncertainty-aware Pseudo-label Selection. (arXiv:2201.13192v2 [stat.ML] UPDATED)
    Positive-unlabeled (PU) learning aims at learning a binary classifier from only positive and unlabeled training data. Recent approaches addressed this problem via cost-sensitive learning by developing unbiased loss functions, and their performance was later improved by iterative pseudo-labeling solutions. However, such two-step procedures are vulnerable to incorrectly estimated pseudo-labels, as errors are propagated in later iterations when a new model is trained on erroneous predictions. To prevent such confirmation bias, we propose PUUPL, a novel loss-agnostic training procedure for PU learning that incorporates epistemic uncertainty in pseudo-label selection. By using an ensemble of neural networks and assigning pseudo-labels based on low-uncertainty predictions, we show that PUUPL improves the reliability of pseudo-labels, increasing the predictive performance of our method and leading to new state-of-the-art results in self-training for PU learning. With extensive experiments, we show the effectiveness of our method over different datasets, modalities, and learning tasks, as well as improved calibration, robustness over prior misspecifications, biased positive data, and imbalanced datasets.
    Cooperative Online Learning in Stochastic and Adversarial MDPs. (arXiv:2201.13170v2 [cs.LG] UPDATED)
    We study cooperative online learning in stochastic and adversarial Markov decision process (MDP). That is, in each episode, $m$ agents interact with an MDP simultaneously and share information in order to minimize their individual regret. We consider environments with two types of randomness: \emph{fresh} -- where each agent's trajectory is sampled i.i.d, and \emph{non-fresh} -- where the realization is shared by all agents (but each agent's trajectory is also affected by its own actions). More precisely, with non-fresh randomness the realization of every cost and transition is fixed at the start of each episode, and agents that take the same action in the same state at the same time observe the same cost and next state. We thoroughly analyze all relevant settings, highlight the challenges and differences between the models, and prove nearly-matching regret lower and upper bounds. To our knowledge, we are the first to consider cooperative reinforcement learning (RL) with either non-fresh randomness or in adversarial MDPs.
    Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms. (arXiv:2202.11277v2 [cs.IT] UPDATED)
    High-dimensional models often have a large memory footprint and must be quantized after training before being deployed on resource-constrained edge devices for inference tasks. In this work, we develop an information-theoretic framework for the problem of quantizing a linear regressor learned from training data $(\mathbf{X}, \mathbf{y})$, for some underlying statistical relationship $\mathbf{y} = \mathbf{X}\boldsymbol{\theta} + \mathbf{v}$. The learned model, which is an estimate of the latent parameter $\boldsymbol{\theta} \in \mathbb{R}^d$, is constrained to be representable using only $Bd$ bits, where $B \in (0, \infty)$ is a pre-specified budget and $d$ is the dimension. We derive an information-theoretic lower bound for the minimax risk under this setting and propose a matching upper bound using randomized embedding-based algorithms which is tight up to constant factors. The lower and upper bounds together characterize the minimum threshold bit-budget required to achieve a performance risk comparable to the unquantized setting. We also propose randomized Hadamard embeddings that are computationally efficient and are optimal up to a mild logarithmic factor of the lower bound. Our model quantization strategy can be generalized and we show its efficacy by extending the method and upper-bounds to two-layer ReLU neural networks for non-linear regression. Numerical simulations show the improved performance of our proposed scheme as well as its closeness to the lower bound.
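To make the bit-budget concrete, here is the naive per-coordinate uniform quantizer such schemes are compared against; the paper's randomized-embedding approach is smarter than this, so treat the sketch only as the baseline end of the trade-off (all sizes and the noise level are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fit a linear regressor, then round each coordinate to a B-bit grid.
d, n, B = 32, 1000, 4
theta = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ theta + 0.1 * rng.normal(size=n)
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

levels = 2**B
lo, hi = theta_hat.min(), theta_hat.max()
step = (hi - lo) / (levels - 1)
theta_q = lo + np.round((theta_hat - lo) / step) * step   # B bits per coordinate

risk = lambda th: np.mean((X @ th - y) ** 2)
print(f"risk unquantized {risk(theta_hat):.4f}, {B}-bit {risk(theta_q):.4f}")
```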
    Multimodal Learning on Graphs for Disease Relation Extraction. (arXiv:2203.08893v2 [cs.LG] UPDATED)
Objective: Disease knowledge graphs are a way to connect, organize, and access disparate information about diseases with numerous benefits for artificial intelligence (AI). To create knowledge graphs, it is necessary to extract knowledge from multimodal datasets in the form of relationships between disease concepts and normalize both concepts and relationship types. Methods: We introduce REMAP, a multimodal approach for disease relation extraction and classification. The REMAP machine learning approach jointly embeds a partial, incomplete knowledge graph and a medical language dataset into a compact latent vector space, followed by aligning the multimodal embeddings for optimal disease relation extraction. Results: We apply the REMAP approach to a disease knowledge graph with 96,913 relations and a text dataset of 1.24 million sentences. On a dataset annotated by human experts, REMAP improves text-based disease relation extraction by 10.0% (accuracy) and 17.2% (F1-score) by fusing disease knowledge graphs with text information. Further, REMAP leverages text information to recommend new relationships in the knowledge graph, outperforming graph-based methods by 8.4% (accuracy) and 10.4% (F1-score). Conclusion: REMAP is a multimodal approach for extracting and classifying disease relationships by fusing structured knowledge and text information. REMAP provides a flexible neural architecture to easily find, access, and validate AI-driven relationships between disease concepts.
    Automatic event detection in football using tracking data. (arXiv:2202.00804v3 [cs.LG] UPDATED)
    One of the main shortcomings of event data in football, which has been extensively used for analytics in recent years, is that it still requires manual collection, thus limiting its availability to a reduced number of tournaments. In this work, we propose a deterministic decision tree-based algorithm to automatically extract football events using tracking data, which consists of two steps: (1) a possession step that evaluates which player was in possession of the ball at each frame in the tracking data, as well as the distinct player configurations during the time intervals where the ball is not in play, to inform set piece detection; (2) an event detection step that combines the changes in ball possession computed in the first step with the laws of football to determine in-game events and set pieces. The automatically generated events are benchmarked against manually annotated events, and we show that in most event categories the proposed methodology achieves a detection rate above $90\%$ across different tournaments and tracking data providers. Finally, we demonstrate how the contextual information offered by tracking data can be leveraged to increase the granularity of auto-detected events, and show how the proposed framework may be used to conduct a myriad of data analyses in football.
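    In its simplest form, the possession step reduces to a nearest-player rule with a distance threshold; the event detection step then diffs the resulting possession sequence against the laws of the game. The following sketch illustrates the first step only, with a 2 m threshold as an assumed value.

        import numpy as np

        def possession(ball_xy, players_xy, max_dist=2.0):
            """Possession-step sketch: the player nearest the ball is deemed in
            possession when within max_dist metres (an assumed threshold); frames
            with no player close enough return None (ball in transit or not in play).
            """
            d = np.linalg.norm(players_xy - ball_xy, axis=1)
            i = int(np.argmin(d))
            return i if d[i] <= max_dist else None

        frame_ball = np.array([30.0, 40.0])
        frame_players = np.array([[29.0, 40.5], [10.0, 10.0], [55.0, 60.0]])
        print(possession(frame_ball, frame_players))  # -> 0 (closest player)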
    CryoAI: Amortized Inference of Poses for Ab Initio Reconstruction of 3D Molecular Volumes from Real Cryo-EM Images. (arXiv:2203.08138v4 [cs.CV] UPDATED)
    Cryo-electron microscopy (cryo-EM) has become a tool of fundamental importance in structural biology, helping us understand the basic building blocks of life. The algorithmic challenge of cryo-EM is to jointly estimate the unknown 3D poses and the 3D electron scattering potential of a biomolecule from millions of extremely noisy 2D images. Existing reconstruction algorithms, however, cannot easily keep pace with the rapidly growing size of cryo-EM datasets due to their high computational and memory cost. We introduce cryoAI, an ab initio reconstruction algorithm for homogeneous conformations that uses direct gradient-based optimization of particle poses and the electron scattering potential from single-particle cryo-EM data. CryoAI combines a learned encoder that predicts the poses of each particle image with a physics-based decoder to aggregate each particle image into an implicit representation of the scattering potential volume. This volume is stored in the Fourier domain for computational efficiency and leverages a modern coordinate network architecture for memory efficiency. Combined with a symmetrized loss function, this framework achieves results of a quality on par with state-of-the-art cryo-EM solvers for both simulated and experimental data, one order of magnitude faster for large datasets and with significantly lower memory requirements than existing methods.
    GCNET: graph-based prediction of stock price movement using graph convolutional network. (arXiv:2203.11091v2 [q-fin.TR] UPDATED)
    The importance of considering data from related stocks for the prediction of stock price movement has been shown in many studies; however, advanced graphical techniques for modeling, embedding and analyzing the behavior of interrelated stocks have not yet been widely exploited for predicting stock price movements. The main challenges in this domain are to find a way of modeling the existing relations among an arbitrary set of stocks and to exploit such a model to improve the prediction performance for those stocks. Most of the existing methods in this domain rely on basic graph-analysis techniques with limited prediction power, and suffer from a lack of generality and flexibility. In this paper, we introduce a novel framework, called GCNET, that models the relations among an arbitrary set of stocks as a graph structure called the influence network and uses a set of history-based prediction models to infer plausible initial labels for a subset of the stock nodes in the graph. Finally, GCNET uses the Graph Convolutional Network algorithm to analyze this partially labeled graph and predicts the next direction of price movement for each stock in the graph. GCNET is a general prediction framework that can be applied to predict the price fluctuations of interacting stocks based on their historical data. Our experiments and evaluations on a set of stocks from the NASDAQ index demonstrate that GCNET significantly outperforms state-of-the-art methods in terms of accuracy and MCC measures.
    ANCER: Anisotropic Certification via Sample-wise Volume Maximization. (arXiv:2107.04570v4 [cs.LG] UPDATED)
    Randomized smoothing has recently emerged as an effective tool that enables certification of deep neural network classifiers at scale. All prior art on randomized smoothing has focused on isotropic $\ell_p$ certification, which has the advantage of yielding certificates that can be easily compared among isotropic methods via the $\ell_p$-norm radius. However, isotropic certification limits the region that can be certified around an input to worst-case adversaries, i.e., it cannot reason about other "close", potentially large, constant-prediction safe regions. To alleviate this issue, (i) we theoretically extend the isotropic randomized smoothing $\ell_1$ and $\ell_2$ certificates to their generalized anisotropic counterparts following a simplified analysis. Moreover, (ii) we propose evaluation metrics allowing for the comparison of general certificates - a certificate is superior to another if it certifies a superset region - with the quantification of each certificate through the volume of the certified region. We introduce ANCER, a framework for obtaining anisotropic certificates for a given test set sample via volume maximization. We achieve this by generalizing memory-based certification of data-dependent classifiers. Our empirical results demonstrate that ANCER achieves state-of-the-art $\ell_1$ and $\ell_2$ certified accuracy on CIFAR-10 and ImageNet in the data-dependent setting, while certifying larger regions in terms of volume, highlighting the benefits of moving away from isotropic analysis. Our code is available at https://github.com/MotasemAlfarra/ANCER.
    Multidimensional Persistence Module Classification via Lattice-Theoretic Convolutions. (arXiv:2011.14057v2 [math.AT] UPDATED)
    Multiparameter persistent homology has been largely neglected as an input to machine learning algorithms. We consider the use of lattice-based convolutional neural network layers as a tool for the analysis of features arising from multiparameter persistence modules. We find that these show promise as an alternative to convolutions for the classification of multidimensional persistence modules.
    Robust Stability of Neural Network-controlled Nonlinear Systems with Parametric Variability. (arXiv:2109.05710v4 [cs.LG] UPDATED)
    Stability certification and identifying a safe and stabilizing initial set are two important concerns in ensuring operational safety, stability, and robustness of dynamical systems. With the advent of machine-learning tools, these issues need to be addressed for systems with machine-learned components in the feedback loop. To develop a general theory for the stability and stabilizability of a neural network (NN)-controlled nonlinear system subject to bounded parametric variation, a Lyapunov-based stability certificate is proposed and is further used to devise a maximal Lipschitz bound for the NN controller, as well as a corresponding maximal region-of-attraction (RoA) inside a given safe operating domain. To compute such a robustly stabilizing NN controller that also maximizes the system's long-run utility, a stability-guaranteed training (SGT) algorithm is proposed. The effectiveness of the proposed framework is validated through an illustrative example.
    The Complexity of Learning Approval-Based Multiwinner Voting Rules. (arXiv:2110.00254v2 [cs.GT] UPDATED)
    We study the {PAC} learnability of multiwinner voting, focusing on the class of approval-based committee scoring (ABCS) rules. These are voting rules applied on profiles with approval ballots, where each voter approves some of the candidates. According to ABCS rules, each committee of $k$ candidates collects from each voter a score that depends on the size of the voter's ballot and on the size of its intersection with the committee. Then, committees of maximum score are the winning ones. Our goal is to learn a target rule (i.e., to learn the corresponding scoring function) using information about the winning committees of a small number of sampled profiles. Despite the existence of exponentially many outcomes compared to single-winner elections, we show that the sample complexity is still low: a polynomial number of samples carries enough information for learning the target rule with high confidence and accuracy. Unfortunately, even simple tasks that need to be solved for learning from these samples are intractable. We prove that deciding whether there exists some ABCS rule that makes a given committee winning in a given profile is a computationally hard problem. Our results extend to the class of sequential Thiele rules, which have received attention recently due to their simplicity.
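    For concreteness, an ABCS rule is fully specified by a scoring function of two integers: the ballot size and its intersection size with the committee. A brute-force sketch follows (fine for tiny elections, exponential in general, which is exactly the point of the hardness results):

        import itertools

        def abcs_winners(profile, k, score_fn):
            """Evaluate an ABCS rule by enumeration.

            profile: list of approval ballots (sets of candidates).
            score_fn(ballot_size, intersection_size) -> per-voter score, which
            is exactly the information an ABCS rule is allowed to use.
            """
            candidates = set().union(*profile)
            best, best_score = [], float("-inf")
            for committee in itertools.combinations(sorted(candidates), k):
                s = sum(score_fn(len(b), len(b & set(committee))) for b in profile)
                if s > best_score:
                    best, best_score = [committee], s
                elif s == best_score:
                    best.append(committee)
            return best, best_score

        # Chamberlin-Courant-style scoring: a voter is satisfied if the
        # committee contains at least one approved candidate.
        profile = [{"a", "b"}, {"a"}, {"c"}, {"b", "c"}]
        print(abcs_winners(profile, k=2, score_fn=lambda nb, ni: min(ni, 1)))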
    Multivariate Analysis for Multiple Network Data via Semi-Symmetric Tensor PCA. (arXiv:2202.04719v2 [stat.ML] UPDATED)
    Network data are commonly collected in a variety of applications, representing either directly measured or statistically inferred connections between features of interest. In an increasing number of domains, these networks are collected over time, such as interactions between users of a social media platform on different days, or across multiple subjects, such as in multi-subject studies of brain connectivity. When analyzing multiple large networks, dimensionality reduction techniques are often used to embed networks in a more tractable low-dimensional space. To this end, we develop a framework for principal components analysis (PCA) on collections of networks via a specialized tensor decomposition we term Semi-Symmetric Tensor PCA or SS-TPCA. We derive computationally efficient algorithms for computing our proposed SS-TPCA decomposition and establish statistical efficiency of our approach under a standard low-rank signal plus noise model. Remarkably, we show that SS-TPCA achieves the same estimation accuracy as classical matrix PCA, with error proportional to the square root of the number of vertices in the network and not the number of edges as might be expected. Our framework inherits many of the strengths of classical PCA and is suitable for a wide range of unsupervised learning tasks, including identifying principal networks, isolating meaningful changepoints or outlying observations, and for characterizing the "variability network" of the most varying edges. Finally, we demonstrate the effectiveness of our proposal on simulated data and on an example from empirical legal studies. The techniques used to establish our main consistency results are surprisingly straightforward and may find use in a variety of other network analysis problems.
    Online POI Recommendation: Learning Dynamic Geo-Human Interactions in Streams. (arXiv:2201.10983v2 [cs.IR] UPDATED)
    In this paper, we focus on the problem of modeling dynamic geo-human interactions in streams for online POI recommendation. Specifically, we formulate the in-stream geo-human interaction modeling problem as a novel deep interactive reinforcement learning framework, where an agent is a recommender and an action is a next POI to visit. We uniquely model the reinforcement learning environment as a joint and connected composition of users and geospatial contexts (POIs, POI categories, functional zones). An event in which a user visits a POI in the stream updates the states of both users and geospatial contexts; the agent perceives the updated environment state to make online recommendations. Specifically, we model a mixed-user event stream by unifying all users, visits, and geospatial contexts as a dynamic knowledge graph stream, in order to model human-human, geo-human, and geo-geo interactions. We design an exit mechanism to address the expired-information challenge, devise a meta-path method to address the recommendation candidate generation challenge, develop a new deep policy network structure to address the varying action space challenge, and propose an effective adversarial training method for optimization. Finally, we present extensive experiments to demonstrate the enhanced performance of our method.
    Learning Stochastic Graph Neural Networks with Constrained Variance. (arXiv:2201.12611v2 [eess.SP] UPDATED)
    Stochastic graph neural networks (SGNNs) are information processing architectures that learn representations from data over random graphs. SGNNs are trained with respect to the expected performance, which comes with no guarantee about deviations of particular output realizations around the optimal expectation. To overcome this issue, we propose a variance-constrained optimization problem for SGNNs, balancing the expected performance and the stochastic deviation. An alternating primal-dual learning procedure is undertaken that solves the problem by updating the SGNN parameters with gradient descent and the dual variable with gradient ascent. To characterize the explicit effect of the variance-constrained learning, we conduct a theoretical analysis on the variance of the SGNN output and identify a trade-off between the stochastic robustness and the discrimination power. We further analyze the duality gap of the variance-constrained optimization problem and the converging behavior of the primal-dual learning procedure. The former indicates the optimality loss induced by the dual transformation and the latter characterizes the limiting error of the iterative algorithm, both of which guarantee the performance of the variance-constrained learning. Through numerical simulations, we corroborate our theoretical findings and observe a strong expected performance with a controllable standard deviation.
    Learning Tree Structures from Leaves For Particle Decay Reconstruction. (arXiv:2208.14924v1 [physics.comp-ph])
    In this work, we present a neural approach to reconstructing rooted tree graphs describing hierarchical interactions, using a novel representation we term the Lowest Common Ancestor Generations (LCAG) matrix. This compact formulation is equivalent to the adjacency matrix, but enables learning a tree's structure from its leaves alone, without the prior assumptions required if using the adjacency matrix directly. Employing the LCAG therefore enables the first end-to-end trainable solution which learns the hierarchical structure of varying tree sizes directly, using only the terminal tree leaves to do so. In the case of high-energy particle physics, a particle decay forms a hierarchical tree structure of which only the final products can be observed experimentally, and the large combinatorial space of possible trees makes an analytic solution intractable. We demonstrate the use of the LCAG as a target in the task of predicting simulated particle physics decay structures using both a Transformer encoder and a Neural Relational Inference encoder Graph Neural Network. With this approach, we are able to correctly predict the LCAG purely from leaf features for a maximum tree depth of $8$ in $92.5\%$ of cases for trees with up to $6$ leaves (inclusive) and in $59.7\%$ of cases for trees with up to $10$ leaves in our simulated dataset.  ( 3 min )
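    A minimal sketch of an LCAG-style matrix under our reading of the construction (entry (i, j) counts the generations climbed from the leaves to their lowest common ancestor, assuming leaves at equal depth; exact conventions may differ from the paper):

        import itertools

        def lcag_matrix(parent, leaves):
            """LCAG-style matrix for a rooted tree given as a child -> parent map."""
            def ancestors(node):
                chain = [node]
                while node in parent:
                    node = parent[node]
                    chain.append(node)
                return chain

            n = len(leaves)
            M = [[0] * n for _ in range(n)]
            for i, j in itertools.combinations(range(n), 2):
                anc_i = ancestors(leaves[i])
                anc_j = set(ancestors(leaves[j]))
                # The first shared ancestor on leaf i's chain is the LCA; its
                # index equals the number of generations climbed from leaf i.
                depth = next(d for d, a in enumerate(anc_i) if a in anc_j)
                M[i][j] = M[j][i] = depth
            return M

        # Toy decay tree: root -> (a, b); a -> (l1, l2); b -> (l3, l4).
        parent = {"a": "root", "b": "root",
                  "l1": "a", "l2": "a", "l3": "b", "l4": "b"}
        print(lcag_matrix(parent, ["l1", "l2", "l3", "l4"]))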
    Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms. (arXiv:2208.14837v1 [cs.LG])
    In this paper, we study combinatorial semi-bandits (CMAB) and focus on reducing the dependency on the batch size $K$ in the regret bound, where $K$ is the total number of arms that can be pulled or triggered in each round. First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we discover a novel (directional) triggering probability and variance modulated (TPVM) condition that can replace the previously-used smoothness condition for various applications, such as cascading bandits, online network exploration and online influence maximization. Under this new condition, we propose a BCUCB-T algorithm with variance-aware confidence intervals and conduct a regret analysis which reduces the $O(K)$ factor to $O(\log K)$ or $O(\log^2 K)$ in the regret bound, significantly improving the regret bounds for the above applications. Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages the non-triggering version of the TPVM condition and completely removes the dependency on $K$ in the leading regret. As a valuable by-product, the regret analysis used in this paper can improve several existing results by a factor of $O(\log K)$. Finally, experimental evaluations show our superior performance compared with benchmark algorithms in different applications.  ( 3 min )
    Cell-Free Latent Go-Explore. (arXiv:2208.14928v1 [cs.LG])
    In this paper, we introduce Latent Go-Explore (LGE), a simple and general approach based on the Go-Explore paradigm for exploration in reinforcement learning (RL). Go-Explore was initially introduced with a strong domain knowledge constraint for partitioning the state space into cells. However, in most real-world scenarios, drawing domain knowledge from raw observations is complex and tedious. If the cell partitioning is not informative enough, Go-Explore can completely fail to explore the environment. We argue that the Go-Explore approach can be generalized to any environment without domain knowledge and without cells by exploiting a learned latent representation. Thus, we show that LGE can be flexibly combined with any strategy for learning a latent representation. We show that LGE, although simpler than Go-Explore, is more robust and outperforms all state-of-the-art algorithms in terms of pure exploration on multiple hard-exploration environments. The LGE implementation is available as open-source at https://github.com/qgallouedec/lge.  ( 2 min )
    Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions. (arXiv:2202.01331v3 [cs.LG] UPDATED)
    We develop fast algorithms and robust software for convex optimization of two-layer neural networks with ReLU activation functions. Our work leverages a convex reformulation of the standard weight-decay penalized training problem as a set of group-$\ell_1$-regularized data-local models, where locality is enforced by polyhedral cone constraints. In the special case of zero-regularization, we show that this problem is exactly equivalent to unconstrained optimization of a convex "gated ReLU" network. For problems with non-zero regularization, we show that convex gated ReLU models obtain data-dependent approximation bounds for the ReLU training problem. To optimize the convex reformulations, we develop an accelerated proximal gradient method and a practical augmented Lagrangian solver. We show that these approaches are faster than standard training heuristics for the non-convex problem, such as SGD, and outperform commercial interior-point solvers. Experimentally, we verify our theoretical results, explore the group-$\ell_1$ regularization path, and scale convex optimization for neural networks to image classification on MNIST and CIFAR-10.
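    The gated ReLU view is easy to demonstrate: fix a set of gate vectors, form the data-dependent activation masks, and the training problem in the remaining weights is convex. A toy least-squares version with randomly sampled gates (our simplification; the paper optimizes group-$\ell_1$-regularized objectives with cone constraints):

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 5))
        y = np.maximum(X @ rng.normal(size=5), 0)      # toy ReLU target

        gates = rng.normal(size=(8, 5))                # sampled gate vectors g_k
        masks = (X @ gates.T > 0).astype(float)        # (n, K) fixed activations
        # Features: one masked copy of X per gate pattern, stacked -> (n, K*d).
        Phi = (masks[:, :, None] * X[:, None, :]).reshape(len(X), -1)
        w, *_ = np.linalg.lstsq(Phi, y, rcond=None)    # convex (least-squares) fit
        print(np.mean((Phi @ w - y) ** 2))             # training MSE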
    Discovering Conservation Laws using Optimal Transport and Manifold Learning. (arXiv:2208.14995v1 [physics.comp-ph])
    Conservation laws are key theoretical and practical tools for understanding, characterizing, and modeling nonlinear dynamical systems. However, for many complex dynamical systems, the corresponding conserved quantities are difficult to identify, making it hard to analyze their dynamics and build efficient, stable predictive models. Current approaches for discovering conservation laws often depend on detailed dynamical information, such as the equation of motion or fine-grained time measurements, with many recent proposals also relying on black box parametric deep learning methods. We instead reformulate this task as a manifold learning problem and propose a non-parametric approach, combining the Wasserstein metric from optimal transport with diffusion maps, to discover conserved quantities that vary across trajectories sampled from a dynamical system. We test this new approach on a variety of physical systems, including conservative Hamiltonian systems, dissipative systems, and spatiotemporal systems, and demonstrate that our manifold learning method is able to both identify the number of conserved quantities and extract their values. Using tools from optimal transport theory and manifold learning, our proposed method provides a direct geometric approach to identifying conservation laws that is both robust and interpretable without requiring an explicit model of the system nor accurate time information.  ( 3 min )
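    A hedged sketch of the pipeline for 1-D observables: pairwise Wasserstein distances between trajectories feed a diffusion map, and the leading nontrivial coordinate varies with the conserved quantity (here the energy of a harmonic oscillator). The real method handles multivariate state and principled kernel bandwidth selection.

        import numpy as np
        from scipy.stats import wasserstein_distance

        def conserved_coords(trajectories, eps, k=1):
            """Diffusion map over pairwise Wasserstein distances (toy version)."""
            n = len(trajectories)
            D = np.zeros((n, n))
            for i in range(n):
                for j in range(i + 1, n):
                    D[i, j] = D[j, i] = wasserstein_distance(trajectories[i],
                                                             trajectories[j])
            K = np.exp(-(D / eps) ** 2)
            P = K / K.sum(axis=1, keepdims=True)   # row-stochastic diffusion operator
            vals, vecs = np.linalg.eig(P)
            order = np.argsort(-vals.real)
            return vecs[:, order[1:k + 1]].real    # skip the trivial constant mode

        # Harmonic-oscillator positions at energy E trace x(t) = sqrt(2E) sin(t).
        ts = np.linspace(0, 20, 400)
        trajs = [np.sqrt(2 * E) * np.sin(ts) for E in np.linspace(0.5, 3.0, 12)]
        coords = conserved_coords(trajs, eps=1.0)
        print(coords.ravel())  # should vary monotonically with E (up to sign)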
    Sparsification of the regularized magnetic Laplacian with multi-type spanning forests. (arXiv:2208.14797v1 [cs.SI])
    In this paper, we consider a ${\rm U}(1)$-connection graph, that is, a graph where each oriented edge is endowed with a unit modulus complex number which is simply conjugated under orientation flip. A natural replacement for the combinatorial Laplacian is then the so-called magnetic Laplacian, a Hermitian matrix that includes information about the graph's connection. Connection graphs and magnetic Laplacians appear, e.g., in the problem of angular synchronization. In the context of large and dense graphs, we study here sparsifiers of the magnetic Laplacian, i.e., spectral approximations based on subgraphs with few edges. Our approach relies on sampling multi-type spanning forests (MTSFs) using a custom determinantal point process, a distribution over edges that favours diversity. In a word, an MTSF is a spanning subgraph whose connected components are either trees or cycle-rooted trees. The latter partially capture the angular inconsistencies of the connection graph, and thus provide a way to compress information contained in the connection. Interestingly, when this connection graph has weakly inconsistent cycles, samples of this distribution can be obtained by using a random walk with cycle popping. We provide statistical guarantees for a choice of natural estimators of the connection Laplacian, and investigate the practical performance of our sparsifiers in two applications.  ( 3 min )
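    For reference, the magnetic Laplacian itself is simple to assemble: it is the combinatorial Laplacian with each off-diagonal entry twisted by the edge's connection phase, which makes it Hermitian. A small sketch with unit edge weights (an assumption):

        import numpy as np

        def magnetic_laplacian(edges, angles, n):
            """Magnetic Laplacian of a U(1)-connection graph on n vertices.

            edges: list of oriented edges (u, v); angles: per-edge connection
            angle theta, so (u, v) carries exp(i*theta) and the flipped
            orientation carries its complex conjugate.
            """
            L = np.zeros((n, n), dtype=complex)
            for (u, v), th in zip(edges, angles):
                L[u, u] += 1.0
                L[v, v] += 1.0
                L[u, v] -= np.exp(1j * th)
                L[v, u] -= np.exp(-1j * th)
            return L

        L = magnetic_laplacian([(0, 1), (1, 2), (2, 0)], [0.3, 0.1, -0.2], n=3)
        print(np.allclose(L, L.conj().T))  # Hermitian by construction: True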
    A Realism Metric for Generated LiDAR Point Clouds. (arXiv:2208.14958v1 [cs.CV])
    A considerable amount of research is concerned with the generation of realistic sensor data. LiDAR point clouds are generated by complex simulations or learned generative models. The generated data is usually exploited to enable or improve downstream perception algorithms. Two major questions arise from these procedures: First, how to evaluate the realism of the generated data? Second, does more realistic data also lead to better perception performance? This paper addresses both questions and presents a novel metric to quantify the realism of LiDAR point clouds. Relevant features are learned from real-world and synthetic point clouds by training on a proxy classification task. In a series of experiments, we demonstrate the application of our metric to determine the realism of generated LiDAR data and compare the realism estimation of our metric to the performance of a segmentation model. We confirm that our metric provides an indication for the downstream segmentation performance.  ( 2 min )
    Formalising the Robustness of Counterfactual Explanations for Neural Networks. (arXiv:2208.14878v1 [cs.LG])
    The use of counterfactual explanations (CFXs) is an increasingly popular explanation strategy for machine learning models. However, recent studies have shown that these explanations may not be robust to changes in the underlying model (e.g., following retraining), which raises questions about their reliability in real-world applications. Existing attempts towards solving this problem are heuristic, and the robustness to model changes of the resulting CFXs is evaluated with only a small number of retrained models, failing to provide exhaustive guarantees. To remedy this, we propose the first notion to formally and deterministically assess the robustness (to model changes) of CFXs for neural networks, which we call $\Delta$-robustness. We introduce an abstraction framework based on interval neural networks to verify the $\Delta$-robustness of CFXs against a possibly infinite set of changes to the model parameters, i.e., weights and biases. We then demonstrate the utility of this approach in two distinct ways. First, we analyse the $\Delta$-robustness of a number of CFX generation methods from the literature and show that they all exhibit significant deficiencies in this regard. Second, we demonstrate how embedding $\Delta$-robustness within existing methods can provide CFXs which are provably robust.  ( 2 min )
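    The verification engine rests on interval arithmetic: propagating a fixed input through a whole interval of weights and biases yields output bounds that hold for every model in the set. A one-layer sketch of this propagation (our simplification of the interval neural network idea):

        import numpy as np

        def interval_forward(W_lo, W_hi, b_lo, b_hi, x):
            """One ReLU layer over interval weights/biases for a concrete input x.

            Returns elementwise output bounds valid for every network whose
            weights lie in [W_lo, W_hi] and biases in [b_lo, b_hi]. If a
            counterfactual's predicted class holds over the whole output
            interval, it is robust to that set of model changes.
            """
            W_mid, W_rad = (W_hi + W_lo) / 2, (W_hi - W_lo) / 2
            center = W_mid @ x + (b_lo + b_hi) / 2
            radius = W_rad @ np.abs(x) + (b_hi - b_lo) / 2
            lo, hi = center - radius, center + radius
            return np.maximum(lo, 0), np.maximum(hi, 0)

        W_lo = np.array([[0.9, -1.1], [0.4, 0.9]])
        W_hi = W_lo + 0.2
        b_lo = np.zeros(2)
        b_hi = np.zeros(2) + 0.1
        print(interval_forward(W_lo, W_hi, b_lo, b_hi, np.array([1.0, -2.0])))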
    Predicting spatial distribution of Palmer Drought Severity Index. (arXiv:2208.14833v1 [cs.LG])
    The probability of a drought for a particular region is crucial when making decisions related to agriculture, and forecasting this probability is both critical for management and challenging. The prediction model should consider multiple factors with complex relationships across the region of interest and neighbouring regions. We approach this problem by presenting an end-to-end solution based on a spatio-temporal neural network. The model predicts the Palmer Drought Severity Index (PDSI) for subregions of interest. Predictions from climate models provide an additional source of knowledge for our model, leading to more accurate drought forecasts. Our model is more accurate than baseline gradient boosting solutions, achieving an $R^2$ score of $0.90$ compared to $0.85$ for gradient boosting. Specific attention is paid to the range of applicability of the model: we examine various regions across the globe to validate it under different conditions. We complement the results with an analysis of how future climate change under different scenarios affects the PDSI, and how our model can help make better decisions and support more sustainable economies.  ( 2 min )
    Statistical embedding: Beyond principal components. (arXiv:2106.01858v2 [stat.ML] UPDATED)
    There has been an intense recent activity in embedding of very high dimensional and nonlinear data structures, much of it in the data science and machine learning literature. We survey this activity in four parts. In the first part we cover nonlinear methods such as principal curves, multidimensional scaling, local linear methods, ISOMAP, graph based methods and diffusion mapping, kernel based methods and random projections. The second part is concerned with topological embedding methods, in particular mapping topological properties into persistence diagrams and the Mapper algorithm. Another type of data sets with a tremendous growth is very high-dimensional network data. The task considered in part three is how to embed such data in a vector space of moderate dimension to make the data amenable to traditional techniques such as cluster and classification techniques. Arguably this is the part where the contrast between algorithmic machine learning methods and statistical modeling, the so-called stochastic block modeling, is at its greatest. In the paper, we discuss the pros and cons for the two approaches. The final part of the survey deals with embedding in $\mathbb{R}^2$, i.e. visualization. Three methods are presented: $t$-SNE, UMAP and LargeVis based on methods in parts one, two and three, respectively. The methods are illustrated and compared on two simulated data sets; one consisting of a triplet of noisy Ranunculoid curves, and one consisting of networks of increasing complexity generated with stochastic block models and with two types of nodes.
    A Fair Experimental Comparison of Neural Network Architectures for Latent Representations of Multi-Omics for Drug Response Prediction. (arXiv:2208.14822v1 [cs.LG])
    Recent years have seen a surge of novel neural network architectures for the integration of multi-omics data for prediction. Most of the architectures include either encoders alone or encoders and decoders, i.e., autoencoders of various sorts, to transform multi-omics data into latent representations. One important parameter is the depth of integration: the point at which the latent representations are computed or merged, which can be early, intermediate, or late. The literature on integration methods is growing steadily; however, close to nothing is known about the relative performance of these methods under fair experimental conditions and under consideration of different use cases. We developed a comparison framework that trains and optimizes multi-omics integration methods under equal conditions. We incorporated early integration and four recently published deep learning methods: MOLI, Super.FELT, OmiEmbed, and MOMA. Further, we devised a novel method, Omics Stacking, that combines the advantages of intermediate and late integration. Experiments were conducted on a public drug response data set with multiple omics data (somatic point mutations, somatic copy number profiles and gene expression profiles) obtained from cell lines, patient-derived xenografts, and patient samples. Our experiments confirmed that early integration has the lowest predictive performance. Overall, architectures that integrate a triplet loss achieved the best results. Statistically significant differences can rarely be observed; however, in terms of average ranks, Super.FELT consistently performs best in a cross-validation setting and Omics Stacking best in an external test set setting. The source code of all experiments is available at \url{https://github.com/kramerlab/Multi-Omics_analysis}.  ( 3 min )
    Intelligent Closed-loop RAN Control with xApps in OpenRAN Gym. (arXiv:2208.14877v1 [cs.NI])
    Softwarization, programmable network control and the use of all-encompassing controllers acting at different timescales are heralded as the key drivers for the evolution to next-generation cellular networks. These technologies have fostered newly designed intelligent data-driven solutions for managing large sets of diverse cellular functionalities, basically impossible to implement in traditionally closed cellular architectures. Despite the evident interest of industry in Artificial Intelligence (AI) and Machine Learning (ML) solutions for closed-loop control of the Radio Access Network (RAN), and several research works in the field, their design is far from mainstream, and it is still a sophisticated and often overlooked operation. In this paper, we discuss how to design AI/ML solutions for the intelligent closed-loop control of the Open RAN, providing guidelines and insights based on exemplary solutions with a high-performance record. We then show how to embed these solutions into xApps instantiated on the O-RAN near-real-time RAN Intelligent Controller (RIC) through OpenRAN Gym, the first publicly available toolbox for data-driven O-RAN experimentation at scale. We showcase a use case of an xApp developed with OpenRAN Gym and tested on a cellular network with 7 base stations and 42 users deployed on the Colosseum wireless network emulator. Our demonstration shows the high degree of flexibility of the OpenRAN Gym-based xApp development environment, which is independent of deployment scenarios and traffic demand.  ( 3 min )
    Data-Driven Chance Constrained AC-OPF using Hybrid Sparse Gaussian Processes. (arXiv:2208.14814v1 [eess.SY])
    The alternating current (AC) chance-constrained optimal power flow (CC-OPF) problem addresses the economic efficiency of electricity generation and delivery under generation uncertainty. The latter is intrinsic to modern power grids because of the high amount of renewables. Despite its academic success, the AC CC-OPF problem is highly nonlinear and computationally demanding, which limits its practical impact. For improving the AC-OPF problem complexity/accuracy trade-off, the paper proposes a fast data-driven setup that uses the sparse and hybrid Gaussian processes (GP) framework to model the power flow equations with input uncertainty. We advocate the efficiency of the proposed approach by a numerical study over multiple IEEE test cases showing up to two times faster and more accurate solutions compared to the state-of-the-art methods.  ( 2 min )
    Improving Operational Efficiency In EV Ridepooling Fleets By Predictive Exploitation of Idle Times. (arXiv:2208.14852v1 [cs.LG])
    In ridepooling systems with electric fleets, charging is a complex decision-making process. Most electric vehicle (EV) taxi services require drivers to make egoistic decisions, leading to decentralized ad-hoc charging strategies. Information about the current state of the mobility system is often lacking or not shared between vehicles, making it impossible to make a system-optimal decision. Most existing approaches do not combine time, location and duration into a comprehensive control algorithm or are unsuitable for real-time operation. We therefore present a real-time predictive charging method for ridepooling services with a single operator, called Idle Time Exploitation (ITX), which predicts the periods where vehicles are idle and exploits these periods to harvest energy. It relies on Graph Convolutional Networks and a linear assignment algorithm to devise an optimal pairing of vehicles and charging stations, with the aim of maximizing the exploited idle time. We evaluated our approach through extensive simulation studies on real-world datasets from New York City. The results demonstrate that ITX outperforms all baseline methods by at least 5% (equivalent to $70,000 for a 6,000 vehicle operation) per week in terms of a monetary reward function which was modeled to replicate the profitability of a real-world ridepooling system. Moreover, ITX can reduce delays by at least 4.68% in comparison with baseline methods and generally increase passenger comfort by facilitating a better spread of customers across the fleet. Our results also demonstrate that ITX enables vehicles to harvest energy during the day, stabilizing battery levels and increasing resilience to unexpected surges in demand. Lastly, compared to the best-performing baseline strategy, peak loads are reduced by 17.39%, which benefits grid operators and paves the way for more sustainable use of the electrical grid.  ( 3 min )
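    The pairing step at the heart of ITX is a standard linear assignment problem. The sketch below uses a made-up idle-time score matrix in place of the paper's GCN predictions:

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        rng = np.random.default_rng(0)
        # vehicles x stations: predicted exploitable idle minutes (illustrative).
        idle_score = rng.uniform(0, 60, size=(5, 3))

        rows, cols = linear_sum_assignment(-idle_score)  # maximize by negating
        for v, s in zip(rows, cols):
            print(f"vehicle {v} -> station {s} ({idle_score[v, s]:.1f} min idle)")
        # With fewer stations than vehicles, only three of the five vehicles charge.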
    Dynamic Global Sensitivity for Differentially Private Contextual Bandits. (arXiv:2208.14555v1 [cs.LG])
    Bandit algorithms have become a reference solution for interactive recommendation. However, as such algorithms directly interact with users for improved recommendations, serious privacy concerns have been raised regarding their practical use. In this work, we propose a differentially private linear contextual bandit algorithm, via a tree-based mechanism to add Laplace or Gaussian noise to model parameters. Our key insight is that as the model converges during online update, the global sensitivity of its parameters shrinks over time (hence the name dynamic global sensitivity). Compared with existing solutions, our dynamic global sensitivity analysis allows us to inject less noise to obtain $(\epsilon, \delta)$-differential privacy, with added regret caused by noise injection of $\tilde O(\log{T}\sqrt{T}/\epsilon)$. We provide a rigorous theoretical analysis over the amount of noise added via dynamic global sensitivity and the corresponding upper regret bound of our proposed algorithm. Experimental results on both synthetic and real-world datasets confirmed the algorithm's advantage against existing solutions.  ( 2 min )
    Style-Agnostic Reinforcement Learning. (arXiv:2208.14863v1 [cs.CV])
    We present a novel method of learning style-agnostic representation using both style transfer and adversarial learning in the reinforcement learning framework. The style, here, refers to task-irrelevant details such as the color of the background in the images, where generalizing the learned policy across environments with different styles is still a challenge. Focusing on learning style-agnostic representations, our method trains the actor with diverse image styles generated from an inherent adversarial style perturbation generator, which plays a min-max game between the actor and the generator, without demanding expert knowledge for data augmentation or additional class labels for adversarial training. We verify that our method achieves competitive or better performances than the state-of-the-art approaches on Procgen and Distracting Control Suite benchmarks, and further investigate the features extracted from our model, showing that the model better captures the invariants and is less distracted by the shifted style. The code is available at https://github.com/POSTECH-CVLab/style-agnostic-RL.  ( 2 min )
    ARMA Cell: A Modular and Effective Approach for Neural Autoregressive Modeling. (arXiv:2208.14919v1 [cs.LG])
    The autoregressive moving average (ARMA) model is a classical, and arguably one of the most studied, approaches to model time series data. It has compelling theoretical properties and is widely used among practitioners. More recent deep learning approaches have popularized recurrent neural networks (RNNs) and, in particular, long short-term memory (LSTM) cells, which have become one of the best performing and most common building blocks in neural time series modeling. While advantageous for time series data or sequences with long-term effects, complex RNN cells are not always a must and can sometimes even be inferior to simpler recurrent approaches. In this work, we introduce the ARMA cell, a simpler, modular, and effective approach for time series modeling in neural networks. This cell can be used in any neural network architecture where recurrent structures are present and naturally handles multivariate time series using vector autoregression. We also introduce the ConvARMA cell as a natural successor for spatially-correlated time series. Our experiments show that the proposed methodology is competitive with popular alternatives in terms of performance while being more robust and compelling due to its simplicity.  ( 2 min )
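    A minimal reading of the idea as runnable code, for a univariate series (the parameterization and exact recurrence are our assumptions, not the authors' reference implementation):

        import numpy as np

        class ARMACell:
            """Univariate ARMA(p, q)-style recurrent cell sketch: the new output
            is an activation of a linear combination of the last p inputs (AR
            part) and the last q cell outputs (MA-like part)."""

            def __init__(self, p, q, seed=0):
                rng = np.random.default_rng(seed)
                self.phi = rng.normal(0.0, 0.1, p)    # weights on lagged inputs
                self.theta = rng.normal(0.0, 0.1, q)  # weights on lagged outputs
                self.bias = 0.0

            def run(self, series):
                x_hist = np.zeros(len(self.phi))
                y_hist = np.zeros(len(self.theta))
                out = []
                for x in series:
                    x_hist = np.roll(x_hist, 1)
                    x_hist[0] = x
                    y = np.tanh(self.phi @ x_hist + self.theta @ y_hist + self.bias)
                    y_hist = np.roll(y_hist, 1)
                    y_hist[0] = y
                    out.append(y)
                return np.array(out)

        cell = ARMACell(p=3, q=2)
        print(cell.run(np.sin(np.linspace(0, 6, 50)))[:5])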
    Lazy Online Gradient Descent is Universal on Polytopes. (arXiv:2004.01739v2 [cs.LG] UPDATED)
    We prove the familiar Lazy Online Gradient Descent algorithm is universal on polytope domains. That means it gets $O(1)$ pseudo-regret against i.i.d. opponents, while simultaneously achieving the well-known $O(\sqrt N)$ worst-case regret bound. For comparison, the bulk of the literature focuses on variants of the Hedge (exponential weights) algorithm on the simplex. These can in principle be lifted to general polytopes; however, the process is computationally unfeasible for many important classes where the number of vertices grows quickly with the dimension. The lifting procedure also ignores any Euclidean bounds on the cost vectors, and can create extra factors of dimension in the pseudo-regret bound. Gradient Descent is simpler than the handful of purpose-built algorithms for polytopes in the literature, and works in a broader setting. In particular, existing algorithms assume the optimiser is unique, while our bound allows for several optimal vertices.
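    For readers unfamiliar with the lazy variant: rather than stepping from the previous iterate, each round plays the projection of the scaled running gradient sum onto the polytope. A sketch on the simplex, one of the simplest polytopes:

        import numpy as np

        def project_simplex(v):
            """Euclidean projection onto the probability simplex."""
            u = np.sort(v)[::-1]
            css = np.cumsum(u) - 1.0
            rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
            theta = css[rho] / (rho + 1.0)
            return np.maximum(v - theta, 0.0)

        def lazy_ogd(grad_fn, T, dim, eta):
            """Lazy OGD / dual-averaging sketch: play the projection of the
            (negatively scaled) cumulative gradient, not a step from the
            previous iterate."""
            g_sum = np.zeros(dim)
            xs = []
            for t in range(T):
                x = project_simplex(-eta * g_sum)
                xs.append(x)
                g_sum += grad_fn(x, t)  # observe the round-t cost gradient
            return xs

        # Toy i.i.d. opponent: noisy linear costs with mean [1, 2].
        rng = np.random.default_rng(0)
        xs = lazy_ogd(lambda x, t: np.array([1.0, 2.0]) + rng.normal(0, 0.1, 2),
                      T=500, dim=2, eta=0.05)
        print(xs[-1])  # concentrates on the lower-cost coordinate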
    Multiscale Non-stationary Causal Structure Learning from Time Series Data. (arXiv:2208.14989v1 [cs.LG])
    This paper introduces a new type of causal structure, namely the multiscale non-stationary directed acyclic graph (MN-DAG), which generalizes DAGs to the time-frequency domain. Our contribution is twofold. First, by leveraging results from spectral and causality theories, we present a novel probabilistic generative model, which allows sampling an MN-DAG according to user-specified priors concerning the time-dependence and multiscale properties of the causal graph. Second, we devise a Bayesian method for the estimation of MN-DAGs by means of stochastic variational inference (SVI), called the Multiscale Non-Stationary Causal Structure Learner (MN-CASTLE). In addition to direct observations, MN-CASTLE exploits information from the decomposition of the total power spectrum of time series over different time resolutions. In our experiments, we first use the proposed model to generate synthetic data according to a latent MN-DAG, showing that the generated data reproduce well-known features of time series in different domains. Then we compare our learning method MN-CASTLE against baseline models on synthetic data generated with different multiscale and non-stationary settings, confirming the good performance of MN-CASTLE. Finally, we show some insights derived from the application of MN-CASTLE to study the causal structure of 7 global equity markets during the Covid-19 pandemic.
    Inverse Propensity Score based offline estimator for deterministic ranking lists using position bias. (arXiv:2208.14980v1 [cs.IR])
    In this work, we present a novel way of computing inverse propensity scores (IPS) using a position-bias model for deterministic logging policies. This technique significantly widens the set of policies on which off-policy evaluation (OPE) can be used. We validate this technique using two different experiments on industry-scale data. The OPE results are clearly strongly correlated with the online results, up to some constant bias. The estimator requires the examination model to be a reasonably accurate approximation of real user behaviour.
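    A hedged sketch of such an estimator under a position-bias click model: a click's propensity is the examination probability of its logged position, and its value under the target ranking is reweighted by the examination probability of the item's new position. The per-impression averaging and the example bias curve are our simplifications.

        def ips_estimate(logged, target_rank, examine_prob):
            """Position-bias IPS sketch for a deterministic logging ranker.

            logged: (item, logged_position, click) tuples;
            target_rank: item -> position under the ranking being evaluated;
            examine_prob: position -> examination probability from the
            position-bias model.
            """
            total = 0.0
            for item, pos, click in logged:
                if click and item in target_rank:
                    total += examine_prob[target_rank[item]] / examine_prob[pos]
            return total / max(1, len(logged))

        examine_prob = {1: 0.9, 2: 0.5, 3: 0.2}           # assumed bias curve
        logged = [("a", 1, 1), ("b", 2, 0), ("c", 3, 1)]
        print(ips_estimate(logged, {"a": 3, "b": 2, "c": 1}, examine_prob))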
    NeurIPS'22 Cross-Domain MetaDL competition: Design and baseline results. (arXiv:2208.14686v1 [cs.LG])
    We present the design and baseline results for a new challenge in the ChaLearn meta-learning series, accepted at NeurIPS'22, focusing on "cross-domain" meta-learning. Meta-learning aims to leverage experience gained from previous tasks to solve new tasks efficiently (i.e., with better performance, little training data, and/or modest computational resources). While previous challenges in the series focused on within-domain few-shot learning problems, with the aim of learning efficiently N-way k-shot tasks (i.e., N class classification problems with k training examples), this competition challenges the participants to solve "any-way" and "any-shot" problems drawn from various domains (healthcare, ecology, biology, manufacturing, and others), chosen for their humanitarian and societal impact. To that end, we created Meta-Album, a meta-dataset of 40 image classification datasets from 10 domains, from which we carve out tasks with any number of "ways" (within the range 2-20) and any number of "shots" (within the range 1-20). The competition is with code submission, fully blind-tested on the CodaLab challenge platform. The code of the winners will be open-sourced, enabling the deployment of automated machine learning solutions for few-shot image classification across several domains.
    Reducing Impacts of System Heterogeneity in Federated Learning using Weight Update Magnitudes. (arXiv:2208.14808v1 [cs.LG])
    The widespread adoption of handheld devices has fueled rapid growth in new applications. Several of these new applications employ machine learning models to train on user data that is typically private and sensitive. Federated Learning enables machine learning models to train locally on each handheld device while only synchronizing their neuron updates with a server. While this enables user privacy, technology scaling and software advancements have resulted in handheld devices with varying performance capabilities. This results in the training time of federated learning tasks being dictated by a few low-performance straggler devices, which essentially become a bottleneck to the entire training process. In this work, we aim to mitigate the performance bottleneck of federated learning by dynamically forming sub-models for stragglers based on their performance and accuracy feedback. To this end, we offer Invariant Dropout, a dynamic technique that forms a sub-model based on a neuron-update threshold. Invariant Dropout uses neuron updates from the non-straggler clients to develop a tailored sub-model for each straggler during each training iteration. All corresponding weights with a magnitude less than the threshold are dropped for the iteration. We evaluate Invariant Dropout using five real-world mobile clients. Our evaluations show that Invariant Dropout obtains a maximum accuracy gain of 1.4 percentage points over state-of-the-art Ordered Dropout while mitigating the performance bottlenecks of stragglers.
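    The core masking step can be sketched in a few lines: drop the weights whose update magnitude falls below a threshold, keeping the rest for the straggler's sub-model. How the threshold is derived from performance feedback is the paper's contribution; here it is set by a quantile as an assumption.

        import numpy as np

        def invariant_dropout_mask(prev_weights, new_weights, keep_fraction):
            """Keep only the weights whose update |w_new - w_old| exceeds a
            threshold chosen so roughly keep_fraction of weights survive; the
            rest are dropped from the straggler's sub-model for this round."""
            delta = np.abs(new_weights - prev_weights)
            thresh = np.quantile(delta, 1.0 - keep_fraction)
            return delta >= thresh

        w_old = np.random.default_rng(0).standard_normal(1000)
        w_new = w_old + np.random.default_rng(1).standard_normal(1000) * 0.01
        mask = invariant_dropout_mask(w_old, w_new, keep_fraction=0.5)
        print(mask.mean())  # ~0.5 of weights kept for the straggler sub-model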
    Bayesian Optimization-based Combinatorial Assignment. (arXiv:2208.14698v1 [cs.LG])
    We study the combinatorial assignment domain, which includes combinatorial auctions and course allocation. The main challenge in this domain is that the bundle space grows exponentially in the number of items. To address this, several papers have recently proposed machine learning-based preference elicitation algorithms that aim to elicit only the most important information from agents. However, the main shortcoming of this prior work is that it does not model a mechanism's uncertainty over values for not yet elicited bundles. In this paper, we address this shortcoming by presenting a Bayesian Optimization-based Combinatorial Assignment (BOCA) mechanism. Our key technical contribution is to integrate a method for capturing model uncertainty into an iterative combinatorial auction mechanism. Concretely, we design a new method for estimating an upper uncertainty bound that can be used as an acquisition function to determine the next query to the agents. This enables the mechanism to properly explore (and not just exploit) the bundle space during its preference elicitation phase. We run computational experiments in several spectrum auction domains to evaluate BOCA's performance. Our results show that BOCA achieves higher allocative efficiency than state-of-the-art approaches.
    Let us Build Bridges: Understanding and Extending Diffusion Generative Models. (arXiv:2208.14699v1 [cs.LG])
    Diffusion-based generative models have achieved promising results recently, but raise an array of open questions in terms of conceptual understanding, theoretical analysis, algorithm improvement and extensions to discrete, structured, non-Euclidean domains. This work re-examines the overall framework in order to gain better theoretical understanding and develop algorithmic extensions for data from arbitrary domains. By viewing diffusion models as latent variable models with unobserved diffusion trajectories and applying maximum likelihood estimation (MLE) with latent trajectories imputed from an auxiliary distribution, we show that both the model construction and the imputation of latent trajectories amount to constructing diffusion bridge processes that achieve deterministic values and constraints at the end point, for which we provide a systematic study and a suite of tools. Leveraging our framework, we present 1) a first theoretical error analysis for learning diffusion generation models, and 2) a simple and unified approach to learning on data from different discrete and constrained domains. Experiments show that our methods perform strongly on generating images, semantic segments and 3D point clouds.
    Graph Distance Neural Networks for Predicting Multiple Drug Interactions. (arXiv:2208.14810v1 [cs.LG])
    Since multidrug combinations are widely applied, the accurate prediction of drug-drug interactions (DDI) is becoming more and more critical. In our method, we use a graph to represent drug-drug interactions: nodes represent drugs and edges represent drug-drug interactions. Under this formulation, we cast DDI prediction as a link prediction problem, utilizing known drug node characteristics and DDI types to predict unknown DDI types. This work proposes a Graph Distance Neural Network (GDNN) to predict drug-drug interactions. Firstly, GDNN generates initial features for nodes via the target point method, fully encoding the distance information in the graph. Secondly, GDNN adopts an improved message passing framework to better generate each drug node's embedded expression, comprehensively considering node and edge characteristics synchronously. Thirdly, GDNN aggregates the embedded expressions and passes them through an MLP to produce the final predicted drug interaction type. GDNN achieved Test Hits@20=0.9037 on the ogb-ddi dataset, showing that GDNN can predict DDI efficiently.
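    The target point features are, in essence, shortest-path distances to a handful of anchor nodes, so each node's initial feature vector encodes where it sits in the graph. A plain BFS sketch of this construction (our reading of it):

        from collections import deque

        def target_point_features(adj, targets):
            """For each target (anchor) node, BFS shortest-path distances give
            one feature column per node; -1 marks unreachable nodes."""
            def bfs(src):
                dist = {src: 0}
                q = deque([src])
                while q:
                    u = q.popleft()
                    for v in adj[u]:
                        if v not in dist:
                            dist[v] = dist[u] + 1
                            q.append(v)
                return dist

            n = len(adj)
            feats = [[0] * len(targets) for _ in range(n)]
            for j, t in enumerate(targets):
                d = bfs(t)
                for u in range(n):
                    feats[u][j] = d.get(u, -1)
            return feats

        adj = [[1, 2], [0, 3], [0, 3], [1, 2]]   # a 4-cycle
        print(target_point_features(adj, targets=[0, 3]))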
    Federated Online Clustering of Bandits. (arXiv:2208.14865v1 [cs.LG])
    Contextual multi-armed bandit (MAB) is an important sequential decision-making problem in recommendation systems. A line of works, called the clustering of bandits (CLUB), utilizes the collaborative effect over users and dramatically improves the recommendation quality. Owing to the increasing application scale and public concerns about privacy, there is a growing demand to keep user data decentralized and push bandit learning to the local server side. Existing CLUB algorithms, however, are designed under the centralized setting where data are available at a central server. We focus on studying the federated online clustering of bandits (FCLUB) problem, which aims to minimize the total regret while satisfying privacy and communication considerations. We design a new phase-based scheme for cluster detection and a novel asynchronous communication protocol for cooperative bandit learning for this problem. To protect users' privacy, previous differential privacy (DP) definitions are not very suitable, and we propose a new DP notion that acts on the user cluster level. We provide rigorous proofs to show that our algorithm simultaneously achieves (clustered) DP, sublinear communication complexity and sublinear regret. Finally, experimental evaluations show our superior performance compared with benchmark algorithms.
    Segmentation-guided Domain Adaptation and Data Harmonization of Multi-device Retinal Optical Coherence Tomography using Cycle-Consistent Generative Adversarial Networks. (arXiv:2208.14635v1 [eess.IV])
    Optical Coherence Tomography (OCT) is a non-invasive technique capturing cross-sectional images of the retina at micrometer resolution. It has been widely used as an auxiliary imaging reference to detect eye-related pathology and predict the longitudinal progression of disease characteristics. Retinal layer segmentation is one of the crucial feature extraction techniques, as variations of retinal layer thickness and retinal layer deformation due to the presence of fluid are highly correlated with multiple epidemic eye diseases such as Diabetic Retinopathy (DR) and Age-related Macular Degeneration (AMD). However, these images are acquired by different devices with different intensity distributions or, in other words, belong to different imaging domains. This paper proposes a segmentation-guided domain-adaptation method to adapt images from multiple devices into a single image domain, for which a state-of-the-art pre-trained segmentation model is available. It avoids the time consumption of manual labelling for upcoming new datasets and the re-training of existing networks. The semantic consistency and global feature consistency of the network minimize the hallucination effect that many researchers have reported for the Cycle-Consistent Generative Adversarial Network (CycleGAN) architecture.
    Classical-to-quantum convolutional neural network transfer learning. (arXiv:2208.14708v1 [quant-ph])
    Machine learning using quantum convolutional neural networks (QCNNs) has demonstrated success in both quantum and classical data classification. In previous studies, QCNNs attained a higher classification accuracy than their classical counterparts under the same training conditions in the few-parameter regime. However, the general performance of large-scale quantum models is difficult to examine because of the limited size of quantum circuits, which can be reliably implemented in the near future. We propose transfer learning as an effective strategy for utilizing small QCNNs in the noisy intermediate-scale quantum era to the full extent. In the classical-to-quantum transfer learning framework, a QCNN can solve complex classification problems without requiring a large-scale quantum circuit by utilizing a pre-trained classical convolutional neural network (CNN). We perform numerical simulations of QCNN models with various sets of quantum convolution and pooling operations for MNIST data classification under transfer learning, in which a classical CNN is trained with Fashion-MNIST data. The results show that transfer learning from classical to quantum CNN performs considerably better than purely classical transfer learning models under similar training conditions.
    Fine-Grained Distribution-Dependent Learning Curves. (arXiv:2208.14615v1 [cs.LG])
    Learning curves plot the expected error of a learning algorithm as a function of the number of labeled input samples. They are widely used by machine learning practitioners as a measure of an algorithm's performance, but classic PAC learning theory cannot explain their behavior. In this paper we introduce a new combinatorial characterization called the VCL dimension that improves and refines the recent results of Bousquet et al. (2021). Our characterization sheds new light on the structure of learning curves by providing fine-grained bounds, and showing that for classes with finite VCL, the rate of decay can be decomposed into a linear component that depends only on the hypothesis class and an exponential component that depends also on the target distribution. In particular, the finer nuance of the VCL dimension implies lower bounds that are quantitatively stronger than the bounds of Bousquet et al. (2021) and qualitatively stronger than classic 'no free lunch' lower bounds. The VCL characterization solves an open problem studied by Antos and Lugosi (1998), who asked in what cases such lower bounds exist. As a corollary, we recover their lower bound for half-spaces in $\mathbb{R}^d$, and we do so in a principled way that should be applicable to other cases as well. Finally, to provide another viewpoint on our work and how it compares to traditional PAC learning bounds, we also present an alternative formulation of our results in a language that is closer to the PAC setting.
    A stabilizing reinforcement learning approach for sampled systems with partially unknown models. (arXiv:2208.14714v1 [eess.SY])
    Reinforcement learning is commonly associated with training of reward-maximizing (or cost-minimizing) agents, in other words, controllers. It can be applied in model-free or model-based fashion, using a priori or online collected system data to train involved parametric architectures. In general, online reinforcement learning does not guarantee closed-loop stability unless special measures are taken, for instance, through learning constraints or tailored training rules. Particularly promising are hybrids of reinforcement learning with "classical" control approaches. In this work, we suggest a method to guarantee practical stability of the system-controller closed loop in a purely online learning setting, i.e., without offline training. Moreover, we assume only partial knowledge of the system model. To achieve the claimed results, we employ techniques of classical adaptive control. The implementation of the overall control scheme is provided explicitly in a digital, sampled setting. That is, the controller receives the state of the system and computes the control action at discrete, specifically, equidistant moments in time. The method is tested in adaptive traction control and cruise control, where it proved to significantly reduce the cost.
    Model-Based Imitation Learning Using Entropy Regularization of Model and Policy. (arXiv:2206.10101v2 [cs.LG] UPDATED)
    Approaches based on generative adversarial networks for imitation learning are promising because they are sample efficient in terms of expert demonstrations. However, training a generator requires many interactions with the actual environment because model-free reinforcement learning is adopted to update a policy. To improve the sample efficiency using model-based reinforcement learning, we propose model-based Entropy-Regularized Imitation Learning (MB-ERIL) under the entropy-regularized Markov decision process to reduce the number of interactions with the actual environment. MB-ERIL uses two discriminators. A policy discriminator distinguishes the actions generated by a robot from expert ones, and a model discriminator distinguishes the counterfactual state transitions generated by the model from the actual ones. We derive structured discriminators so that the learning of the policy and the model is efficient. Computer simulations and real robot experiments show that MB-ERIL achieves a competitive performance and significantly improves the sample efficiency compared to baseline methods.
    Unifying Evaluation of Machine Learning Safety Monitors. (arXiv:2208.14660v1 [cs.LG])
    With the increasing use of Machine Learning (ML) in critical autonomous systems, runtime monitors have been developed to detect prediction errors and keep the system in a safe state during operations. Monitors have been proposed for different applications involving diverse perception tasks and ML models, and specific evaluation procedures and metrics are used for different contexts. This paper introduces three unified safety-oriented metrics, representing the safety benefits of the monitor (Safety Gain), the remaining safety gaps after using it (Residual Hazard), and its negative impact on the system's performance (Availability Cost). To compute these metrics, one must define two return functions, representing how a given ML prediction will impact expected future rewards and hazards. Three use-cases (classification, drone landing, and autonomous driving) are used to demonstrate how metrics from the literature can be expressed in terms of the proposed metrics. Experimental results on these examples show how different evaluation choices impact the perceived performance of a monitor. As our formalism requires us to formulate explicit safety assumptions, it allows us to ensure that the evaluation conducted matches the high-level system requirements.
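    The three metrics admit a compact illustration. The sketch below is a minimal, hypothetical rendering in Python: it assumes the two return functions are available as callables and that the monitor emits binary accept/reject decisions; the paper's exact aggregation (e.g., expectations over an operational profile) may differ.

```python
import numpy as np

def monitor_metrics(reward_return, hazard_return, states, monitor_accepts):
    """Illustrative computation of the three unified metrics.

    reward_return / hazard_return: callables mapping a state to expected
    future reward / hazard (the paper's two return functions).
    monitor_accepts: boolean array, True where the monitor lets the
    system proceed, False where it triggers the safe fallback.
    """
    r = np.array([reward_return(s) for s in states])
    h = np.array([hazard_return(s) for s in states])
    accept = np.asarray(monitor_accepts)
    safety_gain = h[~accept].sum()        # hazard avoided by intervening
    residual_hazard = h[accept].sum()     # hazard remaining despite the monitor
    availability_cost = r[~accept].sum()  # reward forfeited by intervening
    return safety_gain, residual_hazard, availability_cost
```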
    Accelerating Deep Unrolling Networks via Dimensionality Reduction. (arXiv:2208.14784v1 [eess.IV])
    In this work we propose a new paradigm for designing efficient deep unrolling networks using dimensionality reduction schemes, including minibatch gradient approximation and operator sketching. The deep unrolling networks are currently the state-of-the-art solutions for imaging inverse problems. However, for high-dimensional imaging tasks, especially X-ray CT and MRI imaging, the deep unrolling schemes typically become inefficient both in terms of memory and computation, due to the need of computing multiple times the high-dimensional forward and adjoint operators. Recently researchers have found that such limitations can be partially addressed by unrolling the stochastic gradient descent (SGD), inspired by the success of stochastic first-order optimization. In this work, we explore further this direction and propose first a more expressive and practical stochastic primal-dual unrolling, based on the state-of-the-art Learned Primal-Dual (LPD) network, and also a further acceleration upon stochastic primal-dual unrolling, using sketching techniques to approximate products in the high-dimensional image space. The operator sketching can be jointly applied with stochastic unrolling for the best acceleration and compression performance. Our numerical experiments on X-ray CT image reconstruction demonstrate the remarkable effectiveness of our accelerated unrolling schemes.
    Cadence Detection in Symbolic Classical Music using Graph Neural Networks. (arXiv:2208.14819v1 [cs.SD])
    Cadences are complex structures that have been driving music from the beginning of contrapuntal polyphony until today. Detecting such structures is vital for numerous MIR tasks such as musicological analysis, key detection, or music segmentation. However, automatic cadence detection remains challenging mainly because it involves a combination of high-level musical elements like harmony, voice leading, and rhythm. In this work, we present a graph representation of symbolic scores as an intermediate means to solve the cadence detection task. We approach cadence detection as an imbalanced node classification problem using a Graph Convolutional Network. We obtain results that are roughly on par with the state of the art, and we present a model capable of making predictions at multiple levels of granularity, from individual notes to beats, thanks to the fine-grained, note-by-note representation. Moreover, our experiments suggest that graph convolution can learn non-local features that assist in cadence detection, freeing us from the need of having to devise specialized features that encode non-local context. We argue that this general approach to modeling musical scores and classification tasks has a number of potential advantages, beyond the specific recognition task presented here.
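    As a rough illustration of the node-classification setup, the sketch below implements the standard graph-convolution propagation rule in plain numpy on a toy chain of notes; the adjacency structure, feature width, and two-layer depth are illustrative assumptions, not the paper's architecture (which would also weight classes to handle the imbalance).

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)

rng = np.random.default_rng(0)
A = np.eye(5, k=1) + np.eye(5, k=-1)               # 5 notes in a chain
H = rng.standard_normal((5, 4))                    # per-note input features
W1, W2 = rng.standard_normal((4, 8)), rng.standard_normal((8, 2))
logits = gcn_layer(A, H, W1) @ W2                  # per-note cadence scores
```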
    Deep Anomaly Detection and Search via Reinforcement Learning. (arXiv:2208.14834v1 [cs.LG])
    Semi-supervised Anomaly Detection (AD) is a data mining task that aims to learn features from partially-labeled datasets to help detect outliers. In this paper, we classify existing semi-supervised AD methods into two categories: unsupervised-based and supervised-based, and point out that most of them suffer from insufficient exploitation of labeled data and under-exploration of unlabeled data. To tackle these problems, we propose Deep Anomaly Detection and Search (DADS), which applies Reinforcement Learning (RL) to balance exploitation and exploration. During the training process, the agent searches for possible anomalies in hierarchically-structured datasets and uses the searched anomalies to enhance performance, which in essence draws on the idea of ensemble learning. Experimentally, we compare DADS with several state-of-the-art methods in the setting of leveraging labeled known anomalies to detect both other known anomalies and unknown anomalies. Results show that DADS can efficiently and precisely search for anomalies in unlabeled data and learn from them, thus achieving good performance.
    Mathematical Models of Human Drivers Using Artificial Risk Fields. (arXiv:2205.12722v2 [cs.LG] UPDATED)
    In this paper, we use the concept of artificial risk fields to predict how human operators control a vehicle in response to upcoming road situations. A risk field assigns a non-negative risk measure to the state of the system in order to model how close that state is to violating a safety property, such as hitting an obstacle or exiting the road. Using risk fields, we construct a stochastic model of the operator that maps from states to likely actions. We demonstrate our approach on a driving task wherein human subjects are asked to drive a car inside a realistic driving simulator while avoiding obstacles placed on the road. We show that the most likely risk field given the driving data is obtained by solving a convex optimization problem. Next, we apply the inferred risk fields to generate distinct driving behaviors while comparing predicted trajectories against ground truth measurements. We observe that the risk fields are excellent at predicting future trajectory distributions, with high prediction accuracy for horizons of up to twenty seconds. At the same time, we observe some challenges, such as the inability to account for how drivers choose to accelerate or decelerate based on road conditions.
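    One natural way to read "a stochastic model of the operator that maps from states to likely actions" is a Boltzmann distribution over the risk field; the sketch below is exactly that and is an assumption on our part, since the paper's likelihood and its convex estimation problem are not reproduced here. The `step` and `risk_field` callables are hypothetical.

```python
import numpy as np

def action_likelihood(state, actions, step, risk_field, beta=1.0):
    """Hypothetical operator model: actions leading to low-risk successor
    states are exponentially more likely (softmax over negative risk)."""
    risks = np.array([risk_field(step(state, a)) for a in actions])
    weights = np.exp(-beta * risks)
    return weights / weights.sum()
```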
    Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality. (arXiv:2202.06450v3 [cs.LG] UPDATED)
    Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL). Despite the community's increasing interest, there lacks a formal theoretical formulation for the problem. In this paper, we propose such a formulation for deployment-efficient RL (DE-RL) from an "optimization with constraints" perspective: we are interested in exploring an MDP and obtaining a near-optimal policy within minimal \emph{deployment complexity}, whereas in each deployment the policy can sample a large batch of data. Using finite-horizon linear MDPs as a concrete structural model, we reveal the fundamental limit in achieving deployment efficiency by establishing information-theoretic lower bounds, and provide algorithms that achieve the optimal deployment efficiency. Moreover, our formulation for DE-RL is flexible and can serve as a building block for other practically relevant settings; we give "Safe DE-RL" and "Sample-Efficient DE-RL" as two examples, which may be worth future investigation.
    Joint Learning of Localized Representations from Medical Images and Reports. (arXiv:2112.02889v2 [cs.CV] UPDATED)
    Contrastive learning has proven effective for pre-training image models on unlabeled data with promising results for tasks such as medical image classification. Using paired text (like radiological reports) during pre-training improves the results even further. Still, most existing methods target image classification downstream tasks and may not be optimal for localized tasks like semantic segmentation or object detection. We therefore propose Localized representation learning from Vision and Text (LoVT), to our best knowledge, the first text-supervised pre-training method that targets localized medical imaging tasks. Our method combines instance-level image-report contrastive learning with local contrastive learning on image region and report sentence representations. We evaluate LoVT and commonly used pre-training methods on an evaluation framework of 18 localized tasks on chest X-rays from five public datasets. LoVT performs best on 10 of the 18 studied tasks, making it the method of choice for localized tasks.
    Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces I: the Compact Case. (arXiv:2208.14960v1 [stat.ME])
    Gaussian processes are arguably the most important model class in spatial statistics. They encode prior information about the modeled function and can be used for exact or approximate Bayesian inference. In many applications, particularly in physical sciences and engineering, but also in areas such as geostatistics and neuroscience, invariance to symmetries is one of the most fundamental forms of prior information one can consider. The invariance of a Gaussian process' covariance to such symmetries gives rise to the most natural generalization of the concept of stationarity to such spaces. In this work, we develop constructive and practical techniques for building stationary Gaussian processes on a very large class of non-Euclidean spaces arising in the context of symmetries. Our techniques make it possible to (i) calculate covariance kernels and (ii) sample from prior and posterior Gaussian processes defined on such spaces, both in a practical manner. This work is split into two parts, each involving different technical considerations: part I studies compact spaces, while part II studies non-compact spaces possessing certain structure. Our contributions make the non-Euclidean Gaussian process models we study compatible with well-understood computational techniques available in standard Gaussian process software packages, thereby making them accessible to practitioners.
    Projection Pursuit Gaussian Process Regression. (arXiv:2004.00667v2 [stat.ML] UPDATED)
    A primary goal of computer experiments is to reconstruct the function given by the computer code via scattered evaluations. Traditional isotropic Gaussian process models suffer from the curse of dimensionality, when the input dimension is relatively high given limited data points. Gaussian process models with additive correlation functions are scalable to dimensionality, but they are more restrictive as they only work for additive functions. In this work, we consider a projection pursuit model, in which the nonparametric part is driven by an additive Gaussian process regression. We choose the dimension of the additive function higher than the original input dimension, and call this strategy "dimension expansion". We show that dimension expansion can help approximate more complex functions. A gradient descent algorithm is proposed for model training based on the maximum likelihood estimation. Simulation studies show that the proposed method outperforms the traditional Gaussian process models. The Supplementary Materials are available online.
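    A minimal sketch of the "dimension expansion" idea, under the assumption that the additive GP acts on projected coordinates z = XW with M > d projections; the projection matrix and lengthscales would in practice be fit by the paper's maximum-likelihood gradient descent.

```python
import numpy as np

def pp_kernel(X1, X2, W, lengthscales):
    """Additive kernel on expanded projections: a sum of 1-D RBF kernels,
    one per column of Z = X @ W, with W of shape (d, M) and M > d."""
    Z1, Z2 = X1 @ W, X2 @ W
    K = np.zeros((X1.shape[0], X2.shape[0]))
    for m, ell in enumerate(lengthscales):
        diff = Z1[:, m:m + 1] - Z2[:, m].reshape(1, -1)
        K += np.exp(-0.5 * (diff / ell) ** 2)
    return K

# GP regression then proceeds as usual:
# mean = pp_kernel(X_test, X_train, W, ls) @ solve(K_nn + s2 * I, y_train)
```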
    Deep-Learning-Based Device Fingerprinting for Increased LoRa-IoT Security: Sensitivity to Network Deployment Changes. (arXiv:2208.14964v1 [cs.LG])
    Deep-learning-based device fingerprinting has recently been recognized as a key enabler for automated network access authentication. Its robustness to impersonation attacks, due to the inherent difficulty of replicating physical features, is what distinguishes it from conventional cryptographic solutions. Although device fingerprinting has shown promising performance, its sensitivity to changes in the network operating environment still poses a major limitation. This paper presents an experimental framework that aims to study and overcome the sensitivity of LoRa-enabled device fingerprinting to such changes. We first begin by describing RF datasets we collected using our LoRa-enabled wireless device testbed. We then propose a new fingerprinting technique that exploits out-of-band distortion information caused by hardware impairments to increase the fingerprinting accuracy. Finally, we experimentally study and analyze the sensitivity of LoRa RF fingerprinting to various network setting changes. Our results show that fingerprinting does relatively well when the learning models are trained and tested under the same settings. However, when trained and tested under different settings, these models exhibit moderate sensitivity to channel condition changes and severe sensitivity to protocol configuration and receiver hardware changes when IQ data is used as input. When FFT data is used as input, they perform poorly under any change.
    Concept Gradient: Concept-based Interpretation Without Linear Assumption. (arXiv:2208.14966v1 [cs.LG])
    Concept-based interpretations of black-box models are often more intuitive for humans to understand. The most widely adopted approach for concept-based interpretation is Concept Activation Vector (CAV). CAV relies on learning a linear relation between some latent representation of a given model and concepts. Linear separability is usually implicitly assumed but does not hold in general. In this work, we return to the original intent of concept-based interpretation and propose Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions. We show that for a general (potentially non-linear) concept, we can mathematically evaluate how a small change in a concept affects the model's prediction, which leads to an extension of gradient-based interpretation to the concept space. We demonstrate empirically that CG outperforms CAV on both toy examples and real-world datasets.
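    The chain rule makes the construction concrete: if the model factors as f(x) = g(c(x)), then grad_x f = J_c(x)^T grad_c g, so a concept-space gradient can be recovered with a pseudo-inverse. The PyTorch sketch below shows this recovery; it is our reading of the mechanism, and the paper's precise definition of CG may differ in normalization or in how the pseudo-inverse is applied.

```python
import torch
from torch.autograd.functional import jacobian

def concept_gradient(f, c, x):
    """f: R^d -> R (scalar model output); c: R^d -> R^k (concept values).
    Returns the least-squares solution g of J_c(x)^T g = grad_x f(x),
    i.e. how the prediction responds to small changes in each concept."""
    grad_f = jacobian(f, x)                    # shape (d,)
    J_c = jacobian(c, x)                       # shape (k, d)
    return torch.linalg.pinv(J_c.T) @ grad_f   # shape (k,)
```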
    Zero-day DDoS Attack Detection. (arXiv:2208.14971v1 [cs.CR])
    The ability to detect zero-day (novel) attacks has become essential in the network security industry. Due to ever-evolving attack signatures, existing network intrusion detection systems often fail to detect these threats. This project aims to detect zero-day DDoS (distributed denial-of-service) attacks by utilizing network traffic captured before it enters a private network. Modern feature extraction techniques are used in conjunction with neural networks to determine whether a network packet is benign or malicious.
    Feature Alignment by Uncertainty and Self-Training for Source-Free Unsupervised Domain Adaptation. (arXiv:2208.14888v1 [cs.CV])
    Most unsupervised domain adaptation (UDA) methods assume that labeled source images are available during model adaptation. However, this assumption is often infeasible owing to confidentiality issues or memory constraints on mobile devices. To address these problems, we propose a simple yet effective source-free UDA method that uses only a pre-trained source model and unlabeled target images. Our method captures the aleatoric uncertainty by incorporating data augmentation and trains the feature generator with two consistency objectives. The feature generator is encouraged to learn consistent visual features away from the decision boundaries of the head classifier. Inspired by self-supervised learning, our method promotes inter-space alignment between the prediction space and the feature space while incorporating intra-space consistency within the feature space to reduce the domain gap between the source and target domains. We also consider epistemic uncertainty to boost the model adaptation performance. Extensive experiments on popular UDA benchmarks demonstrate that the performance of our approach is comparable or even superior to vanilla UDA methods without using source images or network modifications.
    U(1) Symmetry-breaking Observed in Generic CNN Bottleneck Layers. (arXiv:2206.02220v2 [cs.CV] UPDATED)
    We report on a novel model linking deep convolutional neural networks (CNN) to biological vision and fundamental particle physics. Information propagation in a CNN is modeled via an analogy to an optical system, where information is concentrated near a bottleneck where the 2D spatial resolution collapses about a focal point $1\times 1=1$. A 3D space $(x,y,t)$ is defined by $(x,y)$ coordinates in the image plane and CNN layer $t$, where a principal ray $(0,0,t)$ runs in the direction of information propagation through both the optical axis and the image center pixel located at $(x,y)=(0,0)$, about which the sharpest possible spatial focus is limited to a circle of confusion in the image plane. Our novel insight is to model the principal optical ray $(0,0,t)$ as geometrically equivalent to the medial vector in the positive orthant $I(x,y) \in R^{N+}$ of a $N$-channel activation space, e.g. along the greyscale (or luminance) vector $(t,t,t)$ in $RGB$ colour space. Information is thus concentrated into an energy potential $E(x,y,t)=\|I(x,y,t)\|^2$, which, particularly for bottleneck layers $t$ of generic CNNs, is highly concentrated and symmetric about the spatial origin $(0,0,t)$ and exhibits the well-known "Sombrero" potential of the boson particle. This symmetry is broken in classification, where bottleneck layers of generic pre-trained CNN models exhibit a consistent class-specific bias towards an angle $\theta \in U(1)$ defined simultaneously in the image plane and in activation feature space. Initial observations validate our hypothesis from generic pre-trained CNN activation maps and a bare-bones memory-based classification scheme, with no training or tuning. Training from scratch using combined one-hot $+ U(1)$ loss improves classification for all tasks tested including ImageNet.
    Incremental Learning in Diagonal Linear Networks. (arXiv:2208.14673v1 [cs.LG])
    Diagonal linear networks (DLNs) are a toy simplification of artificial neural networks; they consist of a quadratic reparametrization of linear regression inducing a sparse implicit regularization. In this paper, we describe the trajectory of the gradient flow of DLNs in the limit of small initialization. We show that incremental learning is effectively performed in the limit: coordinates are successively activated, while the iterate is the minimizer of the loss constrained to have support on the active coordinates only. This shows that the sparse implicit regularization of DLNs decreases with time. This work is restricted to the underparametrized regime with anti-correlated features for technical reasons.
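    The quadratic reparametrization is simple enough to simulate directly. A minimal sketch, assuming plain gradient descent as a stand-in for the gradient flow: with w = u ⊙ v and a small initialization α, the coordinates of w activate one by one, tracking sparse minimizers of growing support.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, alpha, lr = 50, 10, 1e-4, 1e-2
X = rng.standard_normal((n, d))
w_star = np.zeros(d); w_star[:2] = [3.0, -2.0]   # sparse ground truth
y = X @ w_star

u = np.full(d, alpha); v = np.full(d, alpha)     # DLN: w = u * v
for _ in range(20000):
    g = X.T @ (X @ (u * v) - y) / n              # dL/dw for squared loss
    u, v = u - lr * g * v, v - lr * g * u        # chain rule through w = u*v
print(np.round(u * v, 3))                        # support concentrates on w_star
```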
    Deep Reinforcement Learning for Uplink Multi-Carrier Non-Orthogonal Multiple Access Resource Allocation Using Buffer State Information. (arXiv:2208.14689v1 [cs.NI])
    For orthogonal multiple access (OMA) systems, the number of served user equipments (UEs) is limited to the number of available orthogonal resources. On the other hand, non-orthogonal multiple access (NOMA) schemes allow multiple UEs to use the same orthogonal resource. This extra degree of freedom introduces new challenges for resource allocation. Buffer state information (BSI), like the size and age of packets waiting for transmission, can be used to improve scheduling in OMA systems. In this paper, we investigate the impact of BSI on the performance of a centralized scheduler in an uplink multi-carrier NOMA scenario with UEs having various data rate and latency requirements. To handle the large combinatorial space of allocating UEs to the resources, we propose a novel scheduler based on actor-critic reinforcement learning incorporating BSI. Training and evaluation are carried out using Nokia's "wireless suite". We propose various novel techniques to both stabilize and speed up training. The proposed scheduler outperforms benchmark schedulers.
    A further exploration of deep Multi-Agent Reinforcement Learning with Hybrid Action Space. (arXiv:2208.14447v1 [cs.LG])
    Research on extending deep reinforcement learning (DRL) to the multi-agent setting has solved many complicated problems and made great achievements. However, almost all of these studies focus only on discrete or continuous action spaces, and few works have applied multi-agent deep reinforcement learning to real-world problems, which mostly have a hybrid action space. Therefore, in this paper, we propose two algorithms, deep multi-agent hybrid soft actor-critic (MAHSAC) and multi-agent hybrid deep deterministic policy gradients (MAHDDPG), to fill this gap. These two algorithms follow the centralized training and decentralized execution (CTDE) paradigm and can handle hybrid action space problems. Our experiments run on the multi-agent particle environment, a simple multi-agent world with basic simulated physics. The experimental results show that these algorithms perform well.  ( 2 min )
    You Only Search Once: On Lightweight Differentiable Architecture Search for Resource-Constrained Embedded Platforms. (arXiv:2208.14446v1 [cs.LG])
    Benefiting from the search efficiency, differentiable neural architecture search (NAS) has evolved as the most dominant alternative to automatically design competitive deep neural networks (DNNs). We note that DNNs must be executed under strictly hard performance constraints in real-world scenarios, for example, the runtime latency on autonomous vehicles. However, to obtain the architecture that meets the given performance constraint, previous hardware-aware differentiable NAS methods have to repeat a plethora of search runs to manually tune the hyper-parameters by trial and error, and thus the total design cost increases proportionally. To resolve this, we introduce a lightweight hardware-aware differentiable NAS framework dubbed LightNAS, striving to find the required architecture that satisfies various performance constraints through a one-time search (i.e., \underline{\textit{you only search once}}). Extensive experiments are conducted to show the superiority of LightNAS over previous state-of-the-art methods.  ( 2 min )
  • Open

    Light curve completion and forecasting using fast and scalable Gaussian processes (MuyGPs). (arXiv:2208.14592v1 [astro-ph.IM])
    Temporal variations of apparent magnitude, called light curves, are observational statistics of interest captured by telescopes over long periods of time. Light curves afford the exploration of Space Domain Awareness (SDA) objectives such as object identification or pose estimation as latent variable inference problems. Ground-based observations from commercial off-the-shelf (COTS) cameras remain inexpensive compared to higher precision instruments; however, limited sensor availability combined with noisier observations can produce gappy time-series data that can be difficult to model. These external factors confound the automated exploitation of light curves, which makes light curve prediction and extrapolation a crucial problem for applications. Traditionally, image or time-series completion problems have been approached with diffusion-based or exemplar-based methods. More recently, Deep Neural Networks (DNNs) have become the tool of choice due to their empirical success at learning complex nonlinear embeddings. However, DNNs often require large training data that are not necessarily available when looking at unique features of a light curve of a single satellite. In this paper, we present a novel approach to predicting missing and future data points of light curves using Gaussian Processes (GPs). GPs are non-linear probabilistic models that infer posterior distributions over functions and naturally quantify uncertainty. However, the cubic scaling of GP inference and training is a major barrier to their adoption in applications. In particular, a single light curve can feature hundreds of thousands of observations, which is well beyond the practical realization limits of a conventional GP on a single machine. Consequently, we employ MuyGPs, a scalable framework for hyperparameter estimation of GP models that uses nearest neighbors sparsification and local cross-validation. MuyGPs...
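    The core computational move, conditioning each prediction only on its nearest neighbours, can be sketched in a few lines; this is a generic nearest-neighbour GP in the spirit of MuyGPs, not the library itself, and the kernel, neighbour count, and hyperparameters are placeholder assumptions (MuyGPs would estimate them by local cross-validation).

```python
import numpy as np

def nn_gp_predict(t_train, y_train, t_test, k=50, ell=5.0, noise=1e-2):
    """Predict each test time from its k nearest training times only,
    replacing one O(n^3) solve with many cheap O(k^3) solves."""
    rbf = lambda a, b: np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)
    preds = []
    for t in t_test:
        idx = np.argsort(np.abs(t_train - t))[:k]   # nearest neighbours
        tn, yn = t_train[idx], y_train[idx]
        K = rbf(tn, tn) + noise * np.eye(k)
        preds.append((rbf(np.array([t]), tn) @ np.linalg.solve(K, yn))[0])
    return np.array(preds)
```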
    Foundation Posteriors for Approximate Probabilistic Inference. (arXiv:2205.09735v2 [cs.LG] UPDATED)
    Probabilistic programs provide an expressive representation language for generative models. Given a probabilistic program, we are interested in the task of posterior inference: estimating a latent variable given a set of observed variables. Existing techniques for inference in probabilistic programs often require choosing many hyper-parameters, are computationally expensive, and/or only work for restricted classes of programs. Here we formulate inference as masked language modeling: given a program, we generate a supervised dataset of variables and assignments, and randomly mask a subset of the assignments. We then train a neural network to unmask the random values, defining an approximate posterior distribution. By optimizing a single neural network across a range of programs we amortize the cost of training, yielding a "foundation" posterior able to do zero-shot inference for new programs. The foundation posterior can also be fine-tuned for a particular program and dataset by optimizing a variational inference objective. We show the efficacy of the approach, zero-shot and fine-tuned, on a benchmark of STAN programs.
    Orthonormal Expansions for Translation-Invariant Kernels. (arXiv:2206.08648v2 [math.CA] UPDATED)
    We present a general Fourier analytic technique for constructing orthonormal basis expansions of translation-invariant kernels from orthonormal bases of $\mathscr{L}_2(\mathbb{R})$. This allows us to derive explicit expansions on the real line for (i) Mat\'ern kernels of all half-integer orders in terms of associated Laguerre functions, (ii) the Cauchy kernel in terms of rational functions, and (iii) the Gaussian kernel in terms of Hermite functions.
    Dynamic Global Sensitivity for Differentially Private Contextual Bandits. (arXiv:2208.14555v1 [cs.LG])
    Bandit algorithms have become a reference solution for interactive recommendation. However, as such algorithms directly interact with users for improved recommendations, serious privacy concerns have been raised regarding their practical use. In this work, we propose a differentially private linear contextual bandit algorithm, via a tree-based mechanism to add Laplace or Gaussian noise to model parameters. Our key insight is that as the model converges during online update, the global sensitivity of its parameters shrinks over time (hence the name dynamic global sensitivity). Compared with existing solutions, our dynamic global sensitivity analysis allows us to inject less noise to obtain $(\epsilon, \delta)$-differential privacy, with the added regret caused by noise injection bounded by $\tilde O(\log{T}\sqrt{T}/\epsilon)$. We provide a rigorous theoretical analysis of the amount of noise added via dynamic global sensitivity and the corresponding upper regret bound of our proposed algorithm. Experimental results on both synthetic and real-world datasets confirm the algorithm's advantage over existing solutions.
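    The tree-based mechanism the abstract refers to is, in its classical form, the binary-tree mechanism for releasing private prefix sums: each value is covered by O(log T) dyadic nodes, each noised once, so any prefix sum aggregates O(log T) noise terms instead of T. The sketch below shows that classical mechanism with Laplace noise; how the paper scales the noise by the shrinking dynamic global sensitivity is not reproduced here.

```python
import numpy as np

def tree_private_prefix_sums(xs, eps):
    """Binary-tree (Laplace) mechanism for prefix sums of a stream xs."""
    xs = np.asarray(xs, dtype=float)
    T = len(xs)
    levels = int(np.ceil(np.log2(max(T, 2)))) + 1
    scale = levels / eps            # each value touches `levels` noisy nodes
    noisy, out = {}, []
    for t in range(1, T + 1):
        total, start = 0.0, 0
        for lvl in reversed(range(levels)):   # dyadic decomposition of [0, t)
            size = 1 << lvl
            if start + size <= t:
                key = (lvl, start // size)
                if key not in noisy:          # noise each node only once
                    noisy[key] = xs[start:start + size].sum() \
                                 + np.random.laplace(scale=scale)
                total += noisy[key]
                start += size
        out.append(total)
    return np.array(out)
```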
    Associative Learning for Network Embedding. (arXiv:2208.14376v1 [cs.LG] CROSS LISTED)
    The network embedding task is to represent nodes in a network as low-dimensional vectors while incorporating topological and structural information. Most existing approaches solve this problem by factorizing a proximity matrix, either directly or implicitly. In this work, we introduce a network embedding method from a new perspective, which leverages Modern Hopfield Networks (MHN) for associative learning. Our network learns associations between the content of each node and that node's neighbors. These associations serve as memories in the MHN. The recurrent dynamics of the network make it possible to recover the masked node, given that node's neighbors. Our proposed method is evaluated on different downstream tasks such as node classification and linkage prediction. The results show competitive performance compared to common matrix factorization techniques and deep learning based methods.
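    The retrieval dynamics of a modern Hopfield network are a single softmax update. The sketch below shows the standard update rule from the MHN literature; treating the stored patterns as neighbourhood representations and the query as a masked node is our schematic reading of the method.

```python
import numpy as np

def mhn_retrieve(memories, query, beta=4.0, n_steps=3):
    """Modern Hopfield retrieval: xi <- M softmax(beta M^T xi).
    memories: (d, N) stored patterns; query: (d,) partial/masked pattern."""
    xi = query.copy()
    for _ in range(n_steps):
        a = beta * memories.T @ xi
        p = np.exp(a - a.max()); p /= p.sum()   # numerically stable softmax
        xi = memories @ p                       # converge toward a stored pattern
    return xi
```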
    Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms. (arXiv:2208.14837v1 [cs.LG])
    In this paper, we study the combinatorial semi-bandits (CMAB) and focus on reducing the dependency on the batch size $K$ in the regret bound, where $K$ is the total number of arms that can be pulled or triggered in each round. First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we discover a novel (directional) triggering probability and variance modulated (TPVM) condition that can replace the previously-used smoothness condition for various applications, such as cascading bandits, online network exploration and online influence maximization. Under this new condition, we propose a BCUCB-T algorithm with variance-aware confidence intervals and conduct regret analysis which reduces the $O(K)$ factor to $O(\log K)$ or $O(\log^2 K)$ in the regret bound, significantly improving the regret bounds for the above applications. Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages the non-triggering version of the TPVM condition and completely removes the dependency on $K$ in the leading regret. As a valuable by-product, the regret analysis used in this paper can improve several existing results by a factor of $O(\log K)$. Finally, experimental evaluations show our superior performance compared with benchmark algorithms in different applications.
    Exploration in Linear Bandits with Rich Action Sets and its Implications for Inference. (arXiv:2207.11597v2 [cs.LG] UPDATED)
    We present a non-asymptotic lower bound on the eigenspectrum of the design matrix generated by any linear bandit algorithm with sub-linear regret when the action set has well-behaved curvature. Specifically, we show that the minimum eigenvalue of the expected design matrix grows as $\Omega(\sqrt{n})$ whenever the expected cumulative regret of the algorithm is $O(\sqrt{n})$, where $n$ is the learning horizon, and the action-space has a constant Hessian around the optimal arm. This shows that such action-spaces force a polynomial lower bound rather than a logarithmic lower bound, as shown by \cite{lattimore2017end}, in discrete (i.e., well-separated) action spaces. Furthermore, while the previous result is shown to hold only in the asymptotic regime (as $n \to \infty$), our result for these "locally rich" action spaces is any-time. Additionally, under a mild technical assumption, we obtain a similar lower bound on the minimum eigenvalue holding with high probability. We apply our result to two practical scenarios -- \emph{model selection} and \emph{clustering} in linear bandits. For model selection, we show that an epoch-based linear bandit algorithm adapts to the true model complexity at a rate exponential in the number of epochs, by virtue of our novel spectral bound. For clustering, we consider a multi-agent framework where we show, by leveraging the spectral result, that no forced exploration is necessary -- the agents can run a linear bandit algorithm and estimate their underlying parameters at once, and hence incur a low regret.
    Sparsification of the regularized magnetic Laplacian with multi-type spanning forests. (arXiv:2208.14797v1 [cs.SI])
    In this paper, we consider a ${\rm U}(1)$-connection graph, that is, a graph where each oriented edge is endowed with a unit modulus complex number which is simply conjugated under orientation flip. A natural replacement for the combinatorial Laplacian is then the so-called magnetic Laplacian, a Hermitian matrix that includes information about the graph's connection. Connection graphs and magnetic Laplacians appear, e.g., in the problem of angular synchronization. In the context of large and dense graphs, we study here sparsifiers of the magnetic Laplacian, i.e., spectral approximations based on subgraphs with few edges. Our approach relies on sampling multi-type spanning forests (MTSFs) using a custom determinantal point process, a distribution over edges that favours diversity. In a word, an MTSF is a spanning subgraph whose connected components are either trees or cycle-rooted trees. The latter partially capture the angular inconsistencies of the connection graph, and thus provide a way to compress information contained in the connection. Interestingly, when this connection graph has weakly inconsistent cycles, samples of this distribution can be obtained by using a random walk with cycle popping. We provide statistical guarantees for a choice of natural estimators of the connection Laplacian, and investigate the practical use of our sparsifiers in two applications.
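    The magnetic Laplacian itself is easy to write down. A minimal sketch for a ${\rm U}(1)$-connection graph, assuming unit edge weights; the sparsification by multi-type spanning forests is the paper's contribution and is not reproduced here.

```python
import numpy as np

def magnetic_laplacian(edges, thetas, n):
    """L = D - A_theta with A_theta[u, v] = exp(i*theta_uv) and the
    conjugate phase under orientation flip, so L is Hermitian PSD."""
    A = np.zeros((n, n), dtype=complex)
    for (u, v), th in zip(edges, thetas):
        A[u, v] = np.exp(1j * th)
        A[v, u] = np.exp(-1j * th)     # conjugated under orientation flip
    D = np.diag(np.abs(A).sum(axis=1))
    L = D - A
    assert np.allclose(L, L.conj().T)  # Hermitian by construction
    return L
```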
    Data fission: splitting a single data point. (arXiv:2112.11079v5 [stat.ME] UPDATED)
    Suppose we observe a random vector $X$ from some distribution $P$ in a known family with unknown parameters. We ask the following question: when is it possible to split $X$ into two parts $f(X)$ and $g(X)$ such that neither part is sufficient to reconstruct $X$ by itself, but both together can recover $X$ fully, and the joint distribution of $(f(X),g(X))$ is tractable? As one example, if $X=(X_1,\dots,X_n)$ and $P$ is a product distribution, then for any $m<n$, we can split the sample to define $f(X)=(X_1,\dots,X_m)$ and $g(X)=(X_{m+1},\dots,X_n)$. Rasines and Young (2021) offers an alternative route of accomplishing this task through randomization of $X$ with additive Gaussian noise which enables post-selection inference in finite samples for Gaussian distributed data and asymptotically for non-Gaussian additive models. In this paper, we offer a more general methodology for achieving such a split in finite samples by borrowing ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting. We call our method data fission, as an alternative to data splitting, data carving and p-value masking. We exemplify the method on a few prototypical applications, such as post-selection inference for trend filtering and other regression problems.
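    For the Gaussian case the construction can be stated in a few lines. A minimal sketch, assuming $X \sim N(\mu, \sigma^2)$ with $\sigma$ known: with independent $Z \sim N(0, \sigma^2)$, the parts $f(X) = X + \tau Z$ and $g(X) = X - Z/\tau$ are independent Gaussians, and $X = (f(X) + \tau^2 g(X))/(1 + \tau^2)$ recovers the original point.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, tau = 2.0, 1.0, 0.7
X = rng.normal(mu, sigma, size=100_000)
Z = rng.normal(0.0, sigma, size=X.shape)                  # external randomization
fX, gX = X + tau * Z, X - Z / tau                         # the two "fissioned" parts
print(np.corrcoef(fX, gX)[0, 1])                          # ~ 0: independent parts
print(np.allclose(X, (fX + tau**2 * gX) / (1 + tau**2)))  # exact recovery
```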
    ARMA Cell: A Modular and Effective Approach for Neural Autoregressive Modeling. (arXiv:2208.14919v1 [cs.LG])
    The autoregressive moving average (ARMA) model is a classical, and arguably one of the most studied approaches to model time series data. It has compelling theoretical properties and is widely used among practitioners. More recent deep learning approaches popularize recurrent neural networks (RNNs) and, in particular, long short-term memory (LSTM) cells that have become one of the best performing and most common building blocks in neural time series modeling. While advantageous for time series data or sequences with long-term effects, complex RNN cells are not always a must and can sometimes even be inferior to simpler recurrent approaches. In this work, we introduce the ARMA cell, a simpler, modular, and effective approach for time series modeling in neural networks. This cell can be used in any neural network architecture where recurrent structures are present and naturally handles multivariate time series using vector autoregression. We also introduce the ConvARMA cell as a natural successor for spatially-correlated time series. Our experiments show that the proposed methodology is competitive with popular alternatives in terms of performance while being more robust and compelling due to its simplicity.
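    A minimal sketch of the recurrence, assuming a univariate series and untrained weights: the cell produces its output from p lagged inputs and q lagged outputs through an activation, which is the ARMA(p, q) structure the paper generalizes (learnably, and convolutionally in ConvARMA).

```python
import numpy as np

def arma_cell(x_lags, y_lags, W, V, b=0.0, act=np.tanh):
    """Simplified ARMA(p, q) cell: activation of an AR part (lagged
    inputs) plus an MA-style part (lagged cell outputs)."""
    return act(W @ x_lags + V @ y_lags + b)

p = q = 2
rng = np.random.default_rng(0)
W, V = rng.standard_normal(p), rng.standard_normal(q)
x = np.sin(np.linspace(0, 10, 200))
y = np.zeros_like(x)
for t in range(max(p, q), len(x)):                  # unroll over the series
    y[t] = arma_cell(x[t - p:t], y[t - q:t], W, V)
```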
    Infinite-Dimensional Sparse Learning in Linear System Identification. (arXiv:2203.14731v2 [eess.SY] UPDATED)
    Regularized methods have been widely applied to system identification problems without known model structures. This paper proposes an infinite-dimensional sparse learning algorithm based on atomic norm regularization. Atomic norm regularization decomposes the transfer function into first-order atomic models and solves a group lasso problem that selects a sparse set of poles and identifies the corresponding coefficients. The difficulty in solving the problem lies in the fact that there are an infinite number of possible atomic models. This work proposes a greedy algorithm that generates new candidate atomic models maximizing the violation of the optimality condition of the existing problem. This algorithm is able to solve the infinite-dimensional group lasso problem with high precision. The algorithm is further extended to reduce the bias and reject false positives in pole location estimation by iteratively reweighted adaptive group lasso and complementary pairs stability selection, respectively. Numerical results demonstrate that the proposed algorithm performs better than benchmark parameterized and regularized methods in terms of both impulse response fitting and pole location estimation.  ( 2 min )
    Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms. (arXiv:2202.11277v2 [cs.IT] UPDATED)
    High-dimensional models often have a large memory footprint and must be quantized after training before being deployed on resource-constrained edge devices for inference tasks. In this work, we develop an information-theoretic framework for the problem of quantizing a linear regressor learned from training data $(\mathbf{X}, \mathbf{y})$, for some underlying statistical relationship $\mathbf{y} = \mathbf{X}\boldsymbol{\theta} + \mathbf{v}$. The learned model, which is an estimate of the latent parameter $\boldsymbol{\theta} \in \mathbb{R}^d$, is constrained to be representable using only $Bd$ bits, where $B \in (0, \infty)$ is a pre-specified budget and $d$ is the dimension. We derive an information-theoretic lower bound for the minimax risk under this setting and propose a matching upper bound using randomized embedding-based algorithms which is tight up to constant factors. The lower and upper bounds together characterize the minimum threshold bit-budget required to achieve a performance risk comparable to the unquantized setting. We also propose randomized Hadamard embeddings that are computationally efficient and are optimal up to a mild logarithmic factor of the lower bound. Our model quantization strategy can be generalized and we show its efficacy by extending the method and upper-bounds to two-layer ReLU neural networks for non-linear regression. Numerical simulations show the improved performance of our proposed scheme as well as its closeness to the lower bound.  ( 3 min )
    Statistical embedding: Beyond principal components. (arXiv:2106.01858v2 [stat.ML] UPDATED)
    There has been intense recent activity in the embedding of very high dimensional and nonlinear data structures, much of it in the data science and machine learning literature. We survey this activity in four parts. In the first part we cover nonlinear methods such as principal curves, multidimensional scaling, local linear methods, ISOMAP, graph-based methods and diffusion mapping, kernel based methods and random projections. The second part is concerned with topological embedding methods, in particular mapping topological properties into persistence diagrams and the Mapper algorithm. Another type of data set experiencing tremendous growth is very high-dimensional network data. The task considered in part three is how to embed such data in a vector space of moderate dimension so as to make the data amenable to traditional techniques such as cluster and classification techniques. Arguably this is the part where the contrast between algorithmic machine learning methods and statistical modeling, the so-called stochastic block modeling, is at its greatest. In the paper, we discuss the pros and cons of the two approaches. The final part of the survey deals with embedding in $\mathbb{R}^2$, i.e. visualization. Three methods are presented: $t$-SNE, UMAP and LargeVis, based on methods in parts one, two and three, respectively. The methods are illustrated and compared on two simulated data sets; one consisting of a triplet of noisy Ranunculoid curves, and one consisting of networks of increasing complexity generated with stochastic block models and with two types of nodes.  ( 3 min )
    Multiscale Non-stationary Causal Structure Learning from Time Series Data. (arXiv:2208.14989v1 [cs.LG])
    This paper introduces a new type of causal structure, namely multiscale non-stationary directed acyclic graph (MN-DAG), that generalizes DAGs to the time-frequency domain. Our contribution is twofold. First, by leveraging results from spectral and causality theories, we expose a novel probabilistic generative model, which allows sampling an MN-DAG according to user-specified priors concerning the time-dependence and multiscale properties of the causal graph. Second, we devise a Bayesian method for the estimation of MN-DAGs, by means of stochastic variational inference (SVI), called Multiscale Non-Stationary Causal Structure Learner (MN-CASTLE). In addition to direct observations, MN-CASTLE exploits information from the decomposition of the total power spectrum of time series over different time resolutions. In our experiments, we first use the proposed model to generate synthetic data according to a latent MN-DAG, showing that the generated data reproduce well-known features of time series in different domains. Then we compare our learning method MN-CASTLE against baseline models on synthetic data generated with different multiscale and non-stationary settings, confirming the good performance of MN-CASTLE. Finally, we show some insights derived from the application of MN-CASTLE to study the causal structure of 7 global equity markets during the Covid-19 pandemic.  ( 3 min )
    Few-shot Adaptive Object Detection with Cross-Domain CutMix. (arXiv:2208.14586v1 [cs.CV])
    In object detection, data quantity and collection cost are a trade-off, and collecting a large amount of data in a specific domain is labor intensive. Therefore, existing large-scale datasets are used for pre-training. However, conventional transfer learning and domain adaptation cannot bridge the domain gap when the target domain differs significantly from the source domain. We propose a data synthesis method that can solve this large domain gap problem. In this method, a part of the target image is pasted onto the source image, and the position of the pasted region is aligned by utilizing the information of the object bounding box. In addition, we introduce adversarial learning to discriminate between the original and the pasted regions. The proposed method trains on a large number of source images and a few target domain images. The proposed method achieves higher accuracy than conventional methods in a very different domain setting, where RGB images are the source domain and thermal infrared images are the target domain. Similarly, the proposed method achieves higher accuracy in the case of simulation images to real images.  ( 2 min )
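    The mixing step itself reduces to a masked copy. A minimal sketch, assuming the target image has already been resized to the source resolution and that the box comes from an object annotation; the adversarial discriminator over original versus pasted regions is not shown.

```python
import numpy as np

def cross_domain_paste(src_img, tgt_img, box):
    """Paste the target-domain pixels inside box = (x1, y1, x2, y2)
    onto the source image; labels for the region follow the target."""
    x1, y1, x2, y2 = box
    mixed = src_img.copy()
    mixed[y1:y2, x1:x2] = tgt_img[y1:y2, x1:x2]
    return mixed

# e.g. mixed = cross_domain_paste(rgb_frame, thermal_frame, (40, 60, 120, 160))
```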
    Efficient CDF Approximations for Normalizing Flows. (arXiv:2202.11322v2 [cs.LG] UPDATED)
    Normalizing flows model a complex target distribution in terms of a bijective transform operating on a simple base distribution. As such, they enable tractable computation of a number of important statistical quantities, particularly likelihoods and samples. Despite these appealing properties, the computation of more complex inference tasks, such as the cumulative distribution function (CDF) over a complex region (e.g., a polytope) remains challenging. Traditional CDF approximations using Monte-Carlo techniques are unbiased but have unbounded variance and low sample efficiency. Instead, we build upon the diffeomorphic properties of normalizing flows and leverage the divergence theorem to estimate the CDF over a closed region in target space in terms of the flux across its \emph{boundary}, as induced by the normalizing flow. We describe both deterministic and stochastic instances of this estimator: while the deterministic variant iteratively improves the estimate by strategically subdividing the boundary, the stochastic variant provides unbiased estimates. Our experiments on popular flow architectures and UCI benchmark datasets show a marked improvement in sample efficiency as compared to traditional estimators.  ( 2 min )
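    The identity underlying the estimator is the divergence theorem: if a vector field $F$ is built so that its divergence equals the model density $p$, the probability mass of a region equals the flux of $F$ through the region's boundary. The paper constructs such an $F$ from the flow; the display below states only the generic identity.

\[
  \int_{\Omega} p(x)\,dx \;=\; \int_{\Omega} \nabla\!\cdot F(x)\,dx
  \;=\; \oint_{\partial\Omega} \langle F(x),\, \hat n(x) \rangle\, dS(x).
\]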
    Dual Representation Learning for One-Step Clustering of Multi-View Data. (arXiv:2208.14450v1 [cs.LG])
    Multi-view data are commonly encountered in data mining applications. Effective extraction of information from multi-view data requires specific design of clustering methods to cater for data with multiple views, which is non-trivial and challenging. In this paper, we propose a novel one-step multi-view clustering method by exploiting the dual representation of both the common and specific information of different views. The motivation originates from the rationale that multi-view data contain not only the consistent knowledge between views but also the unique knowledge of each view. Meanwhile, to make the representation learning more specific to the clustering task, a one-step learning framework is proposed to integrate representation learning and clustering partition as a whole. With this framework, representation learning and clustering partition mutually benefit each other, which effectively improves the clustering performance. Results from extensive experiments conducted on benchmark multi-view datasets clearly demonstrate the superiority of the proposed method.  ( 2 min )
    Partial recovery and weak consistency in the non-uniform hypergraph Stochastic Block Model. (arXiv:2112.11671v2 [math.ST] UPDATED)
    We consider the community detection problem in sparse random hypergraphs under the non-uniform hypergraph stochastic block model (HSBM), a general model of random networks with community structure and higher-order interactions. When the random hypergraph has bounded expected degrees, we provide a spectral algorithm that outputs a partition with at least a $\gamma$ fraction of the vertices classified correctly, where $\gamma\in (0.5,1)$ depends on the signal-to-noise ratio (SNR) of the model. When the SNR grows slowly as the number of vertices goes to infinity, our algorithm achieves weak consistency, which improves the previous results in Ghoshdastidar and Dukkipati (2017) for non-uniform HSBMs. Our spectral algorithm consists of three major steps: (1) Hyperedge selection: select hyperedges of certain sizes to provide the maximal signal-to-noise ratio for the induced sub-hypergraph; (2) Spectral partition: construct a regularized adjacency matrix and obtain an approximate partition based on singular vectors; (3) Correction and merging: incorporate the hyperedge information from adjacency tensors to upgrade the error rate guarantee. The theoretical analysis of our algorithm relies on the concentration and regularization of the adjacency matrix for sparse non-uniform random hypergraphs, which can be of independent interest.  ( 2 min )
    A regime switching on Covid19 analysis and prediction in Romania. (arXiv:2007.13494v3 [q-bio.PE] UPDATED)
    In this paper we propose a three-stage analysis of the evolution of Covid19 in Romania. There are two main issues when it comes to pandemic prediction. The first is that the reported numbers of infected and recovered are unreliable; the number of deaths, however, is more accurate. The second is that many factors affected the evolution of the pandemic. The first stage of our analysis is based on the classical SIR model, which we fit using a neural network. This provides a first set of daily parameters. In the second stage we propose a refinement of the SIR model in which we separate the deceased into a distinct category. Using the first estimate and a grid search, we give a daily estimation of the parameters. The third stage is used to define a notion of turning points (local extremes) for the parameters. We call a regime the time between these points. We outline a general way, based on the time-varying parameters of SIRD, to make predictions.  ( 3 min )
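    For reference, a standard SIRD refinement of SIR separates deaths from recoveries; with time-varying rates $\beta$ (infection), $\gamma$ (recovery) and $\mu$ (mortality) it reads as below. This is the generic form; the paper's variant may differ in detail.

\[
\begin{aligned}
  \frac{dS}{dt} &= -\beta \frac{SI}{N}, &
  \frac{dI}{dt} &= \beta \frac{SI}{N} - \gamma I - \mu I, \\
  \frac{dR}{dt} &= \gamma I, &
  \frac{dD}{dt} &= \mu I.
\end{aligned}
\]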
    Truncated Matrix Power Iteration for Differentiable DAG Learning. (arXiv:2208.14571v1 [cs.LG])
    Recovering underlying Directed Acyclic Graph (DAG) structures from observational data is highly challenging due to the combinatorial nature of the DAG-constrained optimization problem. Recently, DAG learning has been cast as a continuous optimization problem by characterizing the DAG constraint as a smooth equality, generally based on polynomials over adjacency matrices. Existing methods place very small coefficients on high-order polynomial terms for stabilization, arguing that large coefficients on the higher-order terms are harmful due to numerical explosion. On the contrary, we discover that large coefficients on higher-order terms are beneficial for DAG learning when the spectral radiuses of the adjacency matrices are small, and that larger coefficients for higher-order terms can approximate the DAG constraints much better than small ones. Based on this, we propose a novel DAG learning method with an efficient truncated matrix power iteration to approximate geometric-series-based DAG constraints. Empirically, our DAG learning method outperforms the previous state of the art in various settings, often by a factor of 3 or more in terms of structural Hamming distance.  ( 2 min )
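    The geometric-series constraint is straightforward to truncate. A minimal sketch, assuming the generic form $h(W) = \sum_k \alpha^k \,\mathrm{tr}(B^k)$ with $B = W \circ W$, which vanishes exactly when the graph is acyclic; the paper's coefficient schedule and efficiency tricks may differ.

```python
import numpy as np

def dag_penalty_truncated(W, alpha=0.9, K=10):
    """Truncated geometric-series acyclicity penalty:
    h(W) = sum_{k=1..K} alpha^k tr(B^k), with B = W * W elementwise.
    tr(B^k) sums weighted length-k cycles, so h(W) = 0 iff acyclic."""
    B = W * W
    h, P = 0.0, np.eye(W.shape[0])
    for _ in range(K):
        P = alpha * P @ B          # P accumulates alpha^k B^k
        h += np.trace(P)
    return h
```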
    Multivariate Analysis for Multiple Network Data via Semi-Symmetric Tensor PCA. (arXiv:2202.04719v2 [stat.ML] UPDATED)
    Network data are commonly collected in a variety of applications, representing either directly measured or statistically inferred connections between features of interest. In an increasing number of domains, these networks are collected over time, such as interactions between users of a social media platform on different days, or across multiple subjects, such as in multi-subject studies of brain connectivity. When analyzing multiple large networks, dimensionality reduction techniques are often used to embed networks in a more tractable low-dimensional space. To this end, we develop a framework for principal components analysis (PCA) on collections of networks via a specialized tensor decomposition we term Semi-Symmetric Tensor PCA or SS-TPCA. We derive computationally efficient algorithms for computing our proposed SS-TPCA decomposition and establish statistical efficiency of our approach under a standard low-rank signal plus noise model. Remarkably, we show that SS-TPCA achieves the same estimation accuracy as classical matrix PCA, with error proportional to the square root of the number of vertices in the network and not the number of edges as might be expected. Our framework inherits many of the strengths of classical PCA and is suitable for a wide range of unsupervised learning tasks, including identifying principal networks, isolating meaningful changepoints or outlying observations, and for characterizing the "variability network" of the most varying edges. Finally, we demonstrate the effectiveness of our proposal on simulated data and on an example from empirical legal studies. The techniques used to establish our main consistency results are surprisingly straightforward and may find use in a variety of other network analysis problems.  ( 3 min )
    Positive-Unlabeled Learning with Uncertainty-aware Pseudo-label Selection. (arXiv:2201.13192v2 [stat.ML] UPDATED)
    Positive-unlabeled (PU) learning aims at learning a binary classifier from only positive and unlabeled training data. Recent approaches addressed this problem via cost-sensitive learning by developing unbiased loss functions, and their performance was later improved by iterative pseudo-labeling solutions. However, such two-step procedures are vulnerable to incorrectly estimated pseudo-labels, as errors are propagated in later iterations when a new model is trained on erroneous predictions. To prevent such confirmation bias, we propose PUUPL, a novel loss-agnostic training procedure for PU learning that incorporates epistemic uncertainty in pseudo-label selection. By using an ensemble of neural networks and assigning pseudo-labels based on low-uncertainty predictions, we show that PUUPL improves the reliability of pseudo-labels, increasing the predictive performance of our method and leading to new state-of-the-art results in self-training for PU learning. With extensive experiments, we show the effectiveness of our method over different datasets, modalities, and learning tasks, as well as improved calibration, robustness to prior misspecification, biased positive data, and imbalanced datasets.  ( 2 min )
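    As a rough illustration of uncertainty-aware pseudo-label selection (a sketch with hypothetical names, assuming an ensemble's positive-class probabilities are already computed; PUUPL's actual selection rule may differ in detail):
    ```python
    import numpy as np

    def select_pseudo_labels(ensemble_probs, keep_fraction=0.1):
        # ensemble_probs: (n_members, n_unlabeled) positive-class probabilities.
        # Epistemic uncertainty is proxied by disagreement (variance) across the
        # ensemble; only the most certain unlabeled points receive pseudo-labels.
        mean_p = ensemble_probs.mean(axis=0)
        epistemic = ensemble_probs.var(axis=0)
        cutoff = np.quantile(epistemic, keep_fraction)
        keep = epistemic <= cutoff
        pseudo = (mean_p >= 0.5).astype(int)
        return np.flatnonzero(keep), pseudo[keep]

    probs = np.random.rand(5, 1000)            # 5 members, 1000 unlabeled points
    idx, labels = select_pseudo_labels(probs)  # indices to add to the train set
    ```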
    Uncertainty Quantification and Experimental Design for Large-Scale Linear Inverse Problems under Gaussian Process Priors. (arXiv:2109.03457v4 [stat.ML] UPDATED)
    We consider the use of Gaussian process (GP) priors for solving inverse problems in a Bayesian framework. As is well known, the computational complexity of GPs scales cubically in the number of datapoints. We here show that in the context of inverse problems involving integral operators, one faces additional difficulties that hinder inversion on large grids. Furthermore, in that context, covariance matrices can become too large to be stored. By leveraging results about sequential disintegrations of Gaussian measures, we are able to introduce an implicit representation of posterior covariance matrices that reduces the memory footprint by only storing low rank intermediate matrices, while allowing individual elements to be accessed on-the-fly without needing to build full posterior covariance matrices. Moreover, it allows for fast sequential inclusion of new observations. These features are crucial when considering sequential experimental design tasks. We demonstrate our approach by computing sequential data collection plans for excursion set recovery for a gravimetric inverse problem, where the goal is to provide fine resolution estimates of high density regions inside the Stromboli volcano, Italy. Sequential data collection plans are computed by extending the weighted integrated variance reduction (wIVR) criterion to inverse problems. Our results show that this criterion is able to significantly reduce the uncertainty on the excursion volume, reaching close to minimal levels of residual uncertainty. Overall, our techniques allow the advantages of probabilistic models to be brought to bear on large-scale inverse problems arising in the natural sciences.  ( 3 min )
    Data-Driven Chance Constrained AC-OPF using Hybrid Sparse Gaussian Processes. (arXiv:2208.14814v1 [eess.SY])
    The alternating current (AC) chance-constrained optimal power flow (CC-OPF) problem addresses the economic efficiency of electricity generation and delivery under generation uncertainty. The latter is intrinsic to modern power grids because of the high share of renewables. Despite its academic success, the AC CC-OPF problem is highly nonlinear and computationally demanding, which limits its practical impact. To improve the complexity/accuracy trade-off of the AC-OPF problem, the paper proposes a fast data-driven setup that uses the sparse and hybrid Gaussian processes (GP) framework to model the power flow equations with input uncertainty. We demonstrate the efficiency of the proposed approach in a numerical study over multiple IEEE test cases, showing solutions up to two times faster and more accurate than state-of-the-art methods.  ( 2 min )
    Lazy Online Gradient Descent is Universal on Polytopes. (arXiv:2004.01739v2 [cs.LG] UPDATED)
    We prove the familiar Lazy Online Gradient Descent algorithm is universal on polytope domains. That means it gets $O(1)$ pseudo-regret against i.i.d. opponents, while simultaneously achieving the well-known $O(\sqrt N)$ worst-case regret bound. For comparison, the bulk of the literature focuses on variants of the Hedge (exponential weights) algorithm on the simplex. These can in principle be lifted to general polytopes; however, the process is computationally infeasible for many important classes where the number of vertices grows quickly with the dimension. The lifting procedure also ignores any Euclidean bounds on the cost vectors, and can create extra factors of dimension in the pseudo-regret bound. Gradient Descent is simpler than the handful of purpose-built algorithms for polytopes in the literature, and works in a broader setting. In particular, existing algorithms assume the optimiser is unique, while our bound allows for several optimal vertices.  ( 2 min )
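    For intuition, here is a minimal sketch of the lazy variant on one concrete polytope, the probability simplex (our illustration; the paper's analysis covers general polytopes):
    ```python
    import numpy as np

    def project_simplex(v):
        # Euclidean projection onto the probability simplex, a simple polytope.
        u = np.sort(v)[::-1]
        css = np.cumsum(u) - 1.0
        rho = np.flatnonzero(u * np.arange(1, len(v) + 1) > css)[-1]
        return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

    def lazy_ogd(cost_vectors, eta=0.1):
        # "Lazy" OGD / dual averaging: play the projection of the scaled negative
        # cumulative gradient, instead of stepping from the previous iterate.
        grad_sum = np.zeros(cost_vectors.shape[1])
        total_cost = 0.0
        for g in cost_vectors:
            x = project_simplex(-eta * grad_sum)
            total_cost += x @ g
            grad_sum += g          # only the running gradient sum is maintained
        return total_cost

    rng = np.random.default_rng(0)
    print(lazy_ogd(rng.normal(size=(1000, 5))))
    ```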

  • Open

    multioutput quantile regression with boosted trees [D] [R]
    Hi guys, I'm trying to use XGBoost for quantile regression, and I've seen and tried a few scripts that use modified objective functions. They are, however, built and tested for single-output predictions. As a result, something breaks (I think because of the two-dimensional prediction/test data). So far my modifications haven't been fruitful. I'm wondering whether anyone has tried multi-output quantile regression, and if so, what was your setup like? Thank you for your attention. submitted by /u/ncuxomun [link] [comments]  ( 88 min )
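    For reference, a hedged sketch of the usual workaround: a pinball-loss custom objective (with a constant surrogate Hessian, since the true Hessian is zero almost everywhere) trained as one booster per output column, which sidesteps the two-dimensional label issue entirely. Function and parameter names here are illustrative:
    ```python
    import numpy as np
    import xgboost as xgb

    def pinball_objective(q):
        # Quantile (pinball) loss for quantile level q in (0, 1).
        def objective(preds, dtrain):
            y = dtrain.get_label()
            err = y - preds
            grad = np.where(err > 0, -q, 1.0 - q)   # d loss / d preds
            hess = np.full_like(grad, 1.0)          # surrogate curvature
            return grad, hess
        return objective

    # Multi-output workaround: one booster per target column (and per quantile).
    X = np.random.rand(500, 10)
    Y = np.random.rand(500, 3)                      # 3 output dimensions
    boosters = []
    for j in range(Y.shape[1]):
        dtrain = xgb.DMatrix(X, label=Y[:, j])
        boosters.append(xgb.train({"max_depth": 3}, dtrain,
                                  num_boost_round=50,
                                  obj=pinball_objective(q=0.9)))
    ```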
    [D] The Toxic Culture of Rejection in Computer Science
    I have read an interesting article: https://sigbed.org/2022/08/22/the-toxic-culture-of-rejection-in-computer-science/ . It argues that the high rejection rate does not help the community. Rejecting some papers to guarantee quality is reasonable. However, the current rejection rate is so high that many papers are rejected for trivial reasons (e.g., lack of novelty). I have personally received many rejections with no substantive reason (venues just want to control the ratio to look highly selective). I think that, with so many submissions, review committees currently cannot pick the really "good" papers within a very limited acceptance ratio. submitted by /u/singularpanda [link] [comments]  ( 109 min )
    US Gov imposes export requirements on NVIDIA A100s and future H100s to China and Russia
    According to this SEC filing, the US government has instituted a new license requirement for exports to China or Russia of any NVIDIA GPUs that are as good or better than the A100. The motivation is supposedly to prevent possible military uses. Seems the collateral damage could be a blow to Chinese ML research moving forward, considering the massive reliance on NVIDIA GPUs currently: The Company’s outlook for its third fiscal quarter provided on August 24, 2022 included approximately $400 million in potential sales to China which may be subject to the new license requirement if customers do not want to purchase the Company’s alternative product offerings or if the USG does not grant licenses in a timely manner or denies licenses to significant customers. submitted by /u/dojoteef [link] [comments]  ( 91 min )
    [D] Best Resources for Training Data?
    I'm experimenting with some image and video ML models and I'm not sure where to find good labelled or annotated data. Please share any resources you know of. Willing to pay, but would prefer something inexpensive or free if it's available! submitted by /u/memorycardfull [link] [comments]  ( 88 min )
    Discriminator model that can determine if two objects are the same or different? [D]
    Some cases for "the same or different" (because every object is technically different from another in some way, so specific cases must be made) could be car make and models, people, satellite images of the same location etc. Generally whats a good place to look for an answer to such a problem if there is no specific model out there? submitted by /u/UncleSammmm [link] [comments]  ( 88 min )
    [R] Faithful Reasoning Using Large Language Models - DeepMind 2022 - Avoids making up facts or recalling hidden context from weights, and can recognize when its logic isn't sound or it doesn't know the answer! + Many SOTAs regarding reasoning depth!
    Paper: https://arxiv.org/abs/2208.14271 Abstract: Although contemporary large language models (LMs) demonstrate impressive question-answering capabilities, their answers are typically the product of a single call to the model. This entails an unwelcome degree of opacity and compromises performance, especially on problems that are inherently multi-step. To address these limitations, we show how LMs can be made to perform faithful multi-step reasoning via a process whose causal structure mirrors the underlying logical structure of the problem. Our approach works by chaining together reasoning steps, where each step results from calls to two fine-tuned LMs, one for selection and one for inference, to produce a valid reasoning trace. Our method carries out a beam search through the space of reasoning traces to improve reasoning quality. We demonstrate the effectiveness of our model on multi-step logical deduction and scientific question-answering, showing that it outperforms baselines on final answer accuracy, and generates humanly interpretable reasoning traces whose validity can be checked by the user. submitted by /u/Singularian2501 [link] [comments]  ( 89 min )
    [P] Little Learning Machines, A game that uses REAL Machine Learning
    Hello r/MachineLearning !! We have rebranded!! Introducing… drumroll… 🤖 LITTLE LEARNING MACHINES 🤖 (Formerly: Animo Island) The reinforcement learning game where you train cute AI robots with love and fear to do all sorts of actions! With unlimited possibilities, you can train your little machine to do nearly anything. In our cozy artificial life simulator YOU experiment with the power of A.I. and reinforcement learning; your little robots are actually learning from their wins and their mistakes, making no two machines alike. We are excited to be back on Reddit and can't wait to see all of the cool projects you are working on! And what do you all think of our new logo? PS. Currently working on getting our Steam page up and running, so keep your eyes peeled this week! 👀 submitted by /u/LoveFearLearn [link] [comments]  ( 89 min )
    [D] How to get the best cost and latency on ML workloads from a mix of GPUs and CPUs.
    Here's a link to the post discussing this topic - https://exafunction.com/blog/are-gpus-worth-it At Exafunction, we've noticed a lot of companies using CPUs for machine learning inference workloads. We wrote this post to add some color on this and explain why, if managed properly, GPUs are generally the right hardware for these workloads. submitted by /u/varunkmohan [link] [comments]  ( 88 min )
    [Project] I wanna keylog myself... for science!
    Hello all, I'm working on a machine learning/data science project. I thought to myself, why should Meta and Google hoard and sell all my data when I get nothing out of it? So I figured I'd use my machine learning skills and take matters into my own hands. I'd like to log every piece of data that goes into my PC and phone: web browser history, keystrokes, chats, content of web pages, screenshots, mouse movements. I'm very interested in what I could learn from this. And I'd have a detailed diary of my life, so that in twenty years I can go back and know exactly what I did on a given day. What do you guys think is worth logging, and what could I do with this data? submitted by /u/PMMEYOURSMIL3 [link] [comments]  ( 91 min )
    [D] Best NLP to Identify Word Association/Relationship
    Hi there, Happy wonderful Wednesday! I had a quick question I was hoping the community could help with, via some personal stories or suggestions. I'm trying to figure out what's considered the best NLP technique or algorithm when you want to find all the words associated with a term across posts. For example, say we have the sentence: "the treatment has been rather difficult because I've experienced some horrible side effects such as headaches". Now we want to figure out what someone thinks about the treatment; the answer should be "difficult". I've had a lot of great results using dependency parsing but wanted to know if there are any other really great techniques that I haven't tried, maybe BERT, Word2Vec, etc. Thank you very much and all the best, N submitted by /u/Gyllenspetz [link] [comments]  ( 108 min )
    [Discussion] On the question of giving a catchy title to a ML research paper
    Hey guys, I'm currently in the process of writing an ML paper, and I'm hesitating between a factual, neutral title and a catchier one. One of my options is the title "One graph to solve them all", a direct reference to The Lord of the Rings. The fact is that this title fits the subject of the paper perfectly. However, I don't want to follow the line of random papers that have riffed on the names of well-known papers (e.g., "-My niche concept that nobody has ever heard of- is all you need"...). What do you think of these kinds of titles? Do you consider that research writing should remain as neutral as possible? submitted by /u/Zealousideal-Ice9957 [link] [comments]  ( 90 min )
    [P] Machine Learning approaches in Fraud Detection
    Different methods can be used for this complex task. Most of them use machine learning and deep neural networks. Usually, one big problem is working with imbalanced datasets. Algorithms such as SMOTE and ADASYN can address the data skewness. One very important aspect of fraud detection is model explainability and rule extraction from the trained model. Here I show how to apply these concepts to real data. https://medium.com/data-reply-it-datatech/imbalanced-classification-in-fraud-detection-8f63474ff8c7 For you, which is the best approach? How important is a model like that? Let's discuss it. submitted by /u/AlexRava7 [link] [comments]  ( 88 min )
    [D] Will continuous model development inevitably lead to data leakage and overfitting?
    In a naive ML project one might split the dataset into train - test to train a model and get a sense of its performance, either during or after training. The problem arises when you want to improve the model continuously. The test set will no longer give an unbiased measure of model performance, since you have tuned your model to achieve as good a performance as possible on the test set. So we introduce the validation set (dataset -> train, validation, test). Now we can continuously improve our model, evaluate performance on the validation set as much as we want, and once we feel happy we evaluate on the test set. Except we realize we actually wanted to improve the model further a few more times. To confirm that the new model is better we also run inference on the test set. After all, we would not want to put our new model in production if it performs worse on the test set than the old model, would we? So the test set once again influences our model selection/development and will therefore be biased. We can introduce a fourth partition, or a fifth, to evaluate less and less often on increasingly unbiased data after iterative development on the previous partitions. One solution could be to perform k-fold or nested cross-validation, but as you continuously improve your models after having used the test data, the test samples will once again have been used to select the final model. We want unbiased results when evaluating and comparing models, but comparing models multiple times to select the best one makes our data biased. Is this truly a dichotomy? submitted by /u/zimonitrome [link] [comments]  ( 92 min )
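    For what it's worth, the nested cross-validation idea mentioned in the post looks roughly like this in scikit-learn (a generic sketch, not tied to any particular model):
    ```python
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, cross_val_score

    # Inner loop selects hyper-parameters; outer loop estimates performance on
    # folds that never influenced any selection decision.
    X, y = make_classification(n_samples=600, random_state=0)
    inner = GridSearchCV(RandomForestClassifier(random_state=0),
                         param_grid={"max_depth": [3, 5, None]},
                         cv=5)                       # inner loop: model selection
    scores = cross_val_score(inner, X, y, cv=5)      # outer loop: unbiased estimate
    print(scores.mean(), scores.std())
    ```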
    [D] Best tool for scientific research and HCI
    Hi, I recently discovered that stable diffusion img2img can be used for designing interfaces, which could be interesting in human computer interaction. Any other interesting projects that can be used in scientific research and HCI? submitted by /u/Flippynips987 [link] [comments]  ( 88 min )
    [R] Timeout error in Gunicorn (WSGI) in Classification Model
    The BLS/BLA classifier model trains successfully, after which it terminates with a timeout error without displaying results (running with WSGI and Gunicorn). When the model is executed with a single file of inputs in standalone mode without Gunicorn, it displays the desired results. But when the model is executed with Gunicorn, it is abruptly terminated without printing the results. Screenshots of the Postman response and the Gunicorn stderr log were attached to the original post. submitted by /u/ActualRecover1639 [link] [comments]  ( 88 min )
    [R] Layer Recycling and Fine-tuning Efficiency
    Model fine-tuning allows you to improve the quality of pre-trained models with just a fraction of the resources spent on training the original model. We've run several experiments to test how many layers to fine-tune. Here are the results: https://qdrant.tech/articles/embedding-recycler/ There is a trade-off between the number of layers you tune and the precision you get. Using fewer layers allows for faster training with a larger batch size, while more layers increase the model's capacity. Highlights from the experiments: 1. Training only the head of a model (5% of weights) gives a 2x boost on metrics, while full training gives only 3x. 2. Training only a head layer allows using larger models with bigger batch sizes, compensating for the lost precision. 3. If you only have a small dataset, full model tuning will yield a smaller effect. submitted by /u/devzaya [link] [comments]  ( 106 min )
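    The head-only setting from highlight 1 is easy to reproduce; a minimal PyTorch sketch (using a torchvision ResNet as a stand-in for the embedding models in the article):
    ```python
    import torch
    from torchvision.models import resnet18

    # Freeze the pre-trained body and train only a new final layer, which
    # permits larger batch sizes and faster steps than full fine-tuning.
    model = resnet18(weights="DEFAULT")
    for p in model.parameters():
        p.requires_grad = False                           # freeze the body
    model.fc = torch.nn.Linear(model.fc.in_features, 10)  # new head: 10 classes
    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=1e-3)
    # ...then run the usual training loop, updating only the head's weights.
    ```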
    [R] Best website to make figures for a machine learning paper
    Hello, I'm currently writing a paper and I'm looking for a website that can help me make machine learning figures. Any suggestions? submitted by /u/Meddhouib10 [link] [comments]  ( 90 min )
    [P] Evaluate 65,000 models across 10,000 datasets on the Hugging Face Hub!
    Hi there, it's Lewis from the open-source team at Hugging Face 👋 I'm heaps stoked to share a project we've been working on to enable large-scale evaluation of over 65k models on the Hugging Face Hub 🤯! Evaluation on the Hub provides a tool to evaluate models directly from the Hub, as well as leaderboards to quickly find the best performing models: Model evaluator: https://huggingface.co/spaces/autoevaluate/model-evaluator?dataset=glue Leaderboards: https://huggingface.co/spaces/autoevaluate/leaderboards?dataset=glue You can also reach these tools directly from buttons provided on every dataset page on the Hub. We've just added support for all GLUE tasks like MNLI and MRPC, and we also support computer vision tasks like image classification. You can find more details in the blog post from the launch: https://huggingface.co/blog/eval-on-the-hub We'd love to hear what the community thinks about this feature (or what an evaluation wish list would look like) - feel free to drop a comment below! submitted by /u/lewtun [link] [comments]  ( 90 min )
    [D] Causal Machine Learning: A Survey and Open Problems
    (https://arxiv.org/abs/2206.15475) Causal Machine Learning (CausalML) is an umbrella term for machine learning methods that formalize the data-generation process as a structural causal model (SCM). This perspective enables us to reason about the effects of changes to this process (interventions) and what would have happened in hindsight (counterfactuals). We categorize work in CausalML into five groups according to the problems they address: (1) causal supervised learning, (2) causal generative modeling, (3) causal explanations, (4) causal fairness, and (5) causal reinforcement learning. We systematically compare the methods in each category and point out open problems. Further, we review data-modality-specific applications in computer vision, natural language processing, and graph representation learning. Finally, we provide an overview of causal benchmarks and a critical discussion of the state of this nascent field, including recommendations for future work. ​ Here's a link to our website, and tweet https://www.causal-machine-learning.com/ https://twitter.com/aengus_lynch1/status/1558128460102602757 submitted by /u/its_the_goose [link] [comments]  ( 89 min )
    [P] Object Detection for Security Camera Images
    Hello everyone, I am currently looking at creating false-alert filtering and intelligent detection software for security cameras -- object detection software that can look at pictures in near real time and filter them (keeping only humans and vehicles). Basic flow: security cameras generate 30-second intrusion detection, line crossing, and loitering alerts; these alerts are saved in different folders based on source location; sparse images are taken out of each alert and the video is discarded; the object detection and classification tool runs, removes all false alerts (leaving only humans and vehicles), and saves one image per alert while all others are deleted; based on the ruleset for that particular source location/folder, further detection is done (no PPE, restricted area, wearing shorts, generator off, water present, light flashing, etc.); results are stored in a database and displayed in a GUI showing all incoming alerts categorized by source location, alert type, and object detected. Alert videos are generated by outdoor security cameras installed at construction sites, malls, warehouses, parking lots, etc. (about 3,000 cameras and growing). We have over a million actual and relevant images saved that can be used to train the model. I have figured out everything else and have been doing some research on OpenCV, YOLOv7, Detectron2, etc., and would love some guidance from experienced members on selecting the best framework and tools for the object detection and categorization. Any and all advice is appreciated. Thanks a bunch in advance. submitted by /u/DesignerWatercress99 [link] [comments]  ( 93 min )
    [D] Can GPT3's few-shot learning replace supervised training? experiment in biomedical domain
    Thinking about GPT-3 In-Context Learning for Biomedical IE? Think Again TL;DR: GPT-3 few-shot (100 examples) performed worse than supervised "small" transformers (BERT, RoBERTa) on named-entity recognition and relation extraction in the biomedical domain. A few additional issues I can see with using LLMs for classification: Temperature (and other hyperparameters) need to be tuned. This requires an additional (albeit small) validation set, so it is no longer actual zero-shot classification. The obscurity of the association between prompt design and output makes it a trial-and-error ("black art") process, again requiring additional training data. Essentially we replace data labeling with prompt designing. Efficiency: often classification is used for "needle in a haystack" goals (to avoid the cost of human effort). LLMs are expensive and can easily reach $50-100K for running the classifier on a production population. If many prompts are needed (as in the article), this multiplies the token-based pricing of the model. Any real-world experience/input on using GPT-3 et al. for classification? submitted by /u/CacheMeUp [link] [comments]  ( 89 min )
    [N] Google Colab Pro is switching to a “compute credits” model.
    submitted by /u/hardmaru [link] [comments]  ( 95 min )
  • Open

    Nashwat Kawnia نشوة كونية: AI Manifest | 4K UHD | 60 FPS
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 87 min )
    In Latest Machine Learning Research, A Group at CMU Releases a Simple and Efficient Implementation of Recurrent Model-Free Reinforcement Learning (RL) for Future Work to Use as a Baseline for POMDP Algorithms
    submitted by /u/ai-lover [link] [comments]  ( 92 min )
    Tech Talk Sept 1 @ 12:30 EDT: Cutting GPU Costs for AI
    GPUs are designed to accelerate machine learning computations while simultaneously reducing latency and costs for training models and running inferences for production ML. While they are optimized to quickly process large workloads, unless they are managed efficiently, they can quickly drive up your consumption costs. This tech talk will explore how you can efficiently use GPU resources for production inferences. We'll walk through some of the common approaches and potential pitfalls with using GPUs, and help you identify the most efficient and cost effective method to meet your team's needs and resources. Thursday Sept 1 at 12:30PM EDT in Discord submitted by /u/modzykirsten [link] [comments]  ( 92 min )
    Space Channel using AI in News Reports
    Check out Space Channel's new AI-enhanced Digital News report from Brynly Mac on the Artemis 1 launch delay. Artemis 1 Launch Scrubbed. Will it launch this Saturday? submitted by /u/Space_Channel_FDIR [link] [comments]  ( 87 min )
    AI for Malnutrition Detection and Diet Recommendation
    I want to create an application which can detect malnutrition in children. Technically, the app can capture and measure the child's height, body volume, and weight ratio by using an infrared sensor that creates a 3D scan of the child. Based on that data, the AI can detect whether someone is malnourished or not. If malnourished, it will recommend a suitable variety of food items that the child can have. I was wondering if this is feasible. What are the initial steps I should take? submitted by /u/eudaimoniclux [link] [comments]  ( 87 min )
    Google colab adding limits
    submitted by /u/Alternative-Ninja-50 [link] [comments]  ( 87 min )
    From building blocks to blockbusters at the World’s Top 50 Innovators 2022
    submitted by /u/chelsea_bear [link] [comments]  ( 86 min )
    How to Perform Neural Architecture Search with no Training
    Hey everyone, today I published my first video, on how you can perform neural architecture search without training. Do check out the video. https://www.youtube.com/watch?v=5Iw9pPdXPzI submitted by /u/nightstorm1990 [link] [comments]  ( 92 min )
    this is almost how people see each other at psy festivals 😂 (AI Video)
    submitted by /u/nalr00n [link] [comments]  ( 87 min )
    What's the most annoying part of any machine learning pipeline?
    Hi everyone, I'm part of the Memri team that's building an open-source machine learning pipeline that makes it easier for you to create data apps you can instantly use. Think building a sentiment analysis app for your own WhatsApp chats, while your privacy is protected and you control the algorithms, so you can fully trust your data isn't being misused. We're trying to learn more about what's most annoying when applying ML, so we can make it easier for everyone who's interested in ML. TLDR: What improvements to any ML pipeline would make your ML life muuuuch easier? PS: If you want to be informed about our alpha and just play around with what we have already, you can sign up for updates here or have a look at our Discord (https://discord.com/invite/kRTcDtGUmr) or website if you want to learn more. submitted by /u/BLECQ1 [link] [comments]  ( 87 min )
    DOLL-Y mini
    submitted by /u/iTieRoomsTogether [link] [comments]  ( 86 min )
    America's got Talent Deepfake Simon Terry Crews sings
    submitted by /u/Hades_adhbik [link] [comments]  ( 93 min )
    Hecate in the forest with gold accessories and dark dress, made with Midjourney
    submitted by /u/Knadin [link] [comments]  ( 90 min )
    Wombo Dream's idea of human hands and feet
    I am fascinated by how an AI art generator like this can't get fingers and toes right. Does anyone have an explanation for this? submitted by /u/HelpfulAmoeba [link] [comments]  ( 87 min )
    Typed some dirty words in and ended up with this... I'm horrified.
    submitted by /u/halfawatermelon69 [link] [comments]  ( 90 min )
  • Open

    What does the ~ symbol stand for in the action-value function?
    submitted by /u/Playful_Worldliness2 [link] [comments]  ( 87 min )
    In Latest Machine Learning Research, A Group at CMU Releases a Simple and Efficient Implementation of Recurrent Model-Free Reinforcement Learning (RL) for Future Work to Use as a Baseline for POMDP Algorithms
    submitted by /u/ai-lover [link] [comments]  ( 87 min )
    Little Learning Machines - The Reinforcement Learning Game
    Hello r/reinforcementlearning !! We have rebranded!! Introducing… drumroll… 🤖 LITTLE LEARNING MACHINES 🤖 (Formerly: Animo Island) The reinforcement learning game where you train cute AI robots with love and fear to do all sorts of actions! With unlimited possibilities, you can train your little machine to do nearly anything. In our cozy artificial life simulator YOU experiment with the power of A.I. and reinforcement learning; your little robots are actually learning from their wins and their mistakes, making no two machines alike. We are excited to be back on Reddit and can't wait to see all of the cool projects you are working on! And what do you all think of our new logo? PS. Currently working on getting our Steam page up and running, so keep your eyes peeled this week! 👀 submitted by /u/LoveFearLearn [link] [comments]  ( 87 min )
    Has anyone tried TorchScript vs JAX?
    More and more people seem to be using or trying out JAX. For example, CleanRL now has some implementations in JAX. I have also tried coding PPO, DQN, etc. in JAX. The speed-up is really nice, but I'm wondering if anyone has tried doing the same (jitting the learning procedure) with TorchScript. If so, what were your results? submitted by /u/NiconiusX [link] [comments]  ( 87 min )
    Are there any advantages to on-policy learning in deep RL?
    The main argument for off-policy algorithms is usually sample efficiency, and perhaps a robustness argument about agents being able to escape local minima by learning from a wider range of experience. The consensus I get is, there’s no reason why you would want to use on-policy learning if you can find a way to stabilize off-policy learning. Are there any pros to on-policy learning? If we could find a way to make on-policy learning more sample efficient, is there a reason why we’d want to use it over off-policy methods? submitted by /u/vandelay_inds [link] [comments]  ( 88 min )
    RL newspaper?
    I was wondering if there are any RL-focused newsletters that summarise recent research and developments in the field? If not, how many of you would be interested in following one? submitted by /u/nacho_rz [link] [comments]  ( 95 min )
  • Open

    Announcing the Patent Phrase Similarity Dataset
    Posted by Grigor Aslanyan, Software Engineer, Google. Patent documents typically use legal and highly technical language, with context-dependent terms that may have meanings quite different from colloquial usage and even between different documents. The process of using traditional patent search methods (e.g., keyword searching) to search through the corpus of over one hundred million patent documents can be tedious and result in many missed results due to the broad and non-standard language used. For example, a "soccer ball" may be described as a "spherical recreation device", "inflatable sportsball" or "ball for ball game". Additionally, the language used in some patent documents may obfuscate terms to their advantage, so more powerful natural language processing (NLP) and semantic similari…  ( 26 min )
  • Open

    DALL·E: Introducing Outpainting
    Extend creativity and tell a bigger story with DALL-E images of any size Original outpainting by Emma Catnip Today we’re introducing Outpainting, a new feature which helps users extend their creativity by continuing an image beyond its original borders — adding visual elements in the same style, or  ( 4 min )
  • Open

    Fraunhofer Research Leads Way Into Future of Robotics
    Joseph Fraunhofer was a 19th-century pioneer in optics who brought together scientific research with industrial applications. Fast forward to today and Germany’s Fraunhofer Society — Europe’s largest R&D organization — is setting its sights on the applied research of key technologies, from AI to cybersecurity to medicine. Its Fraunhofer IML unit is aiming to push…  ( 5 min )
    UN Economic Commission for Africa Engages NVIDIA to Boost Data Science in 10 Nations
    NVIDIA is collaborating with the United Nations Economic Commission for Africa (UNECA) to equip governments and developer communities in 10 nations with data science training and technology to support more informed policymaking and accelerate how resources are allocated. The initiative will empower the countries’ national statistical offices — agencies that handle population census data, economic…  ( 6 min )
    OBS Studio to Release Software Update 28.0 With NVIDIA Broadcast Features ‘In the NVIDIA Studio’
    In the NVIDIA Studio celebrates the Open Broadcaster Software (OBS) Studio’s 10th anniversary and its 28.0 software release. Plus, popular streamer WATCHHOLLIE shares how she uses OBS and a GeForce RTX 3080 GPU in a single-PC setup to elevate her livestreams.  ( 7 min )
    Rendered.ai Founder and CEO Nathan Kundtz on Using AI to Build Better AI
    Data is the fuel that makes artificial intelligence run. Training machine learning and AI systems requires data. And the quality of datasets has a big impact on the systems’ results. But compiling quality real-world data for AI and ML can be difficult and expensive. That’s where synthetic data comes in. The guest for this week’s…  ( 5 min )
  • Open

    Beyond Greedy Search: Tracking by Multi-Agent Reinforcement Learning-based Beam Search. (arXiv:2205.09676v3 [cs.CV] UPDATED)
    To track the target in a video, current visual trackers usually adopt greedy search for target object localization in each frame; that is, the candidate region with the maximum response score is selected as the tracking result of each frame. However, we found that this may not be an optimal choice, especially when encountering challenging tracking scenarios such as heavy occlusion and fast motion. To address this issue, we propose to maintain multiple tracking trajectories and apply a beam search strategy for visual tracking, so that the trajectory with fewer accumulated errors can be identified. Accordingly, this paper introduces a novel multi-agent reinforcement learning based beam search tracking strategy, termed BeamTracking. It is mainly inspired by the image captioning task, which takes an image as input and generates diverse descriptions using a beam search algorithm. Specifically, we formulate tracking as a sample selection problem fulfilled by multiple parallel decision-making processes, each of which aims at picking out one sample as its tracking result in each frame. Each maintained trajectory is associated with an agent that performs the decision-making and determines what actions should be taken to update related information. When all the frames are processed, we select the trajectory with the maximum accumulated score as the tracking result. Extensive experiments on seven popular tracking benchmark datasets validate the effectiveness of the proposed algorithm.
    On the Trade-Off between Actionable Explanations and the Right to be Forgotten. (arXiv:2208.14137v1 [cs.LG])
    As machine learning (ML) models are increasingly being deployed in high-stakes applications, policymakers have suggested tighter data protection regulations (e.g., GDPR, CCPA). One key principle is the ``right to be forgotten'', which gives users the right to have their data deleted. Another key principle is the right to an actionable explanation, also known as algorithmic recourse, allowing users to reverse unfavorable decisions. To date, it is unknown whether these two principles can be operationalized simultaneously. Therefore, we introduce and study the problem of recourse invalidation in the context of data deletion requests. More specifically, we theoretically and empirically analyze the behavior of popular state-of-the-art algorithms and demonstrate that the recourses generated by these algorithms are likely to be invalidated if a small number of data deletion requests (e.g., 1 or 2) warrant updates of the predictive model. For the setting of linear models and overparameterized neural networks -- studied through the lens of neural tangent kernels (NTKs) -- we suggest a framework to identify a minimal subset of critical training points which, when removed, would maximize the fraction of invalidated recourses. Using our framework, we empirically establish that the removal of as little as 2 data instances from the training set can invalidate up to 95 percent of all recourses output by popular state-of-the-art algorithms. Thus, our work raises fundamental questions about the compatibility of ``the right to an actionable explanation'' in the context of the ``right to be forgotten''.
    Solving parametric partial differential equations with deep rectified quadratic unit neural networks. (arXiv:2203.06973v2 [math.NA] UPDATED)
    Implementing deep neural networks for learning the solution maps of parametric partial differential equations (PDEs) turns out to be more efficient than using many conventional numerical methods. However, limited theoretical analyses have been conducted on this approach. In this study, we investigate the expressive power of deep rectified quadratic unit (ReQU) neural networks for approximating the solution maps of parametric PDEs. The proposed approach is motivated by the recent important work of G. Kutyniok, P. Petersen, M. Raslan and R. Schneider (Gitta Kutyniok, Philipp Petersen, Mones Raslan, and Reinhold Schneider. A theoretical analysis of deep neural networks and parametric pdes. Constructive Approximation, pages 1-53, 2021), which uses deep rectified linear unit (ReLU) neural networks for solving parametric PDEs. In contrast to the previously established complexity-bound $\mathcal{O}\left(d^3\log_{2}^{q}(1/ \epsilon) \right)$ for ReLU neural networks, we derive an upper bound $\mathcal{O}\left(d^3\log_{2}^{q}\log_{2}(1/ \epsilon) \right)$ on the size of the deep ReQU neural network required to achieve accuracy $\epsilon>0$, where $d$ is the dimension of reduced basis representing the solutions. Our method takes full advantage of the inherent low-dimensionality of the solution manifolds and better approximation performance of deep ReQU neural networks. Numerical experiments are performed to verify our theoretical result.
    FDB: Fraud Dataset Benchmark. (arXiv:2208.14417v1 [cs.LG])
    Standardized datasets and benchmarks have spurred innovations in computer vision, natural language processing, multi-modal and tabular settings. We note that, compared to other well-researched fields, fraud detection has numerous differences: a high class imbalance, diverse feature types, frequently changing fraud patterns, and the adversarial nature of the problem. Due to these differences, modeling approaches designed for other classification tasks may not work well for fraud detection. We introduce the Fraud Dataset Benchmark (FDB), a compilation of publicly available datasets catered to fraud detection. FDB comprises a variety of fraud-related tasks, ranging from identifying fraudulent card-not-present transactions, detecting bot attacks, classifying malicious URLs, and predicting loan risk to content moderation. The Python-based library from FDB provides a consistent API for data loading with standardized training and testing splits. For reference, we also provide baseline evaluations of different modeling approaches on FDB. Considering the increasing popularity of Automated Machine Learning (AutoML) for various research and business problems, we used AutoML frameworks for our baseline evaluations. For fraud prevention, organizations that operate with limited resources and lack ML expertise often hire a team of investigators or use blocklists and manual rules, all of which are inefficient and do not scale well. Such organizations can benefit from AutoML solutions that are easy to deploy in production and pass the bar of fraud prevention requirements. We hope that FDB helps in the development of customized fraud detection techniques catered to different fraud modus operandi (MOs), as well as in the improvement of AutoML systems that can work well for all datasets in the benchmark.
    Ranking Neural Checkpoints. (arXiv:2011.11200v4 [cs.LG] UPDATED)
    This paper is concerned with ranking many pre-trained deep neural networks (DNNs), called checkpoints, for transfer learning to a downstream task. Thanks to the broad use of DNNs, we may easily collect hundreds of checkpoints from various sources. Which of them transfers the best to our downstream task of interest? Striving to answer this question thoroughly, we establish a neural checkpoint ranking benchmark (NeuCRaB) and study some intuitive ranking measures. These measures are generic, applying to checkpoints of different output types without knowing how or on which dataset the checkpoints were pre-trained. They also incur low computation cost, making them practically meaningful. Our results suggest that the linear separability of the features extracted by the checkpoints is a strong indicator of transferability. We also arrive at a new ranking measure, NLEEP, which gives rise to the best performance in the experiments.
    SpeqNets: Sparsity-aware Permutation-equivariant Graph Networks. (arXiv:2203.13913v3 [cs.LG] UPDATED)
    While (message-passing) graph neural networks have clear limitations in approximating permutation-equivariant functions over graphs or general relational data, more expressive, higher-order graph neural networks do not scale to large graphs. They either operate on $k$-order tensors or consider all $k$-node subgraphs, implying an exponential dependence on $k$ in memory requirements, and do not adapt to the sparsity of the graph. By introducing new heuristics for the graph isomorphism problem, we devise a class of universal, permutation-equivariant graph networks, which, unlike previous architectures, offer a fine-grained control between expressivity and scalability and adapt to the sparsity of the graph. These architectures lead to vastly reduced computation times compared to standard higher-order graph networks in the supervised node- and graph-level classification and regression regime while significantly improving over standard graph neural network and graph kernel architectures in terms of predictive performance.
    Anomaly Detection using Contrastive Normalizing Flows. (arXiv:2208.14024v1 [cs.LG])
    Detecting test data deviating from training data is a central problem for safe and robust machine learning. Likelihoods learned by a generative model, e.g., a normalizing flow via standard log-likelihood training, perform poorly as an anomaly score. We propose to use an unlabelled auxiliary dataset and a probabilistic outlier score for anomaly detection. We use a self-supervised feature extractor trained on the auxiliary dataset and train a normalizing flow on the extracted features by maximizing the likelihood on in-distribution data and minimizing the likelihood on the auxiliary dataset. We show that this is equivalent to learning the normalized positive difference between the in-distribution and the auxiliary feature density. We conduct experiments on benchmark datasets and show a robust improvement compared to likelihood, likelihood ratio methods and state-of-the-art anomaly detection methods.
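    A toy caricature of the training signal described above (our sketch: the paper trains a real normalizing flow on self-supervised features, which a one-layer affine flow on raw vectors stands in for here):
    ```python
    import math
    import torch

    class AffineFlow(torch.nn.Module):
        # Toy stand-in for a normalizing flow: z = (x - b) * exp(-s), with an
        # exact log-density under a standard normal base distribution.
        def __init__(self, dim):
            super().__init__()
            self.dim = dim
            self.s = torch.nn.Parameter(torch.zeros(dim))
            self.b = torch.nn.Parameter(torch.zeros(dim))

        def log_prob(self, x):
            z = (x - self.b) * torch.exp(-self.s)
            log_base = -0.5 * (z ** 2).sum(-1) - 0.5 * self.dim * math.log(2 * math.pi)
            return log_base - self.s.sum()      # plus log|det dz/dx|

    flow = AffineFlow(dim=8)
    opt = torch.optim.Adam(flow.parameters(), lr=1e-2)
    for _ in range(200):
        x_in = 0.5 * torch.randn(64, 8)         # stand-in in-distribution features
        x_aux = torch.randn(64, 8) + 2.0        # stand-in auxiliary outliers
        # Maximize likelihood in-distribution, minimize it on the auxiliary set.
        loss = -flow.log_prob(x_in).mean() + flow.log_prob(x_aux).mean()
        opt.zero_grad(); loss.backward(); opt.step()

    score = -flow.log_prob(torch.randn(1, 8) + 2.0)  # high score => likely anomaly
    ```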
    Denoising Architecture for Unsupervised Anomaly Detection in Time-Series. (arXiv:2208.14337v1 [cs.LG])
    Anomalies in time-series provide insights into critical scenarios across a range of industries, from banking and aerospace to information technology, security, and medicine. However, identifying anomalies in time-series data is particularly challenging due to the imprecise definition of anomalies, the frequent absence of labels, and the enormously complex temporal correlations present in such data. The LSTM Autoencoder is an Encoder-Decoder scheme for Anomaly Detection based on Long Short Term Memory Networks that learns to reconstruct time-series behavior and then uses the reconstruction error to identify abnormalities. We introduce the Denoising Architecture as a complement to this LSTM Encoder-Decoder model and investigate its effect on real-world as well as artificially generated datasets. We demonstrate that the proposed architecture increases both the accuracy and the training speed, thereby making the LSTM Autoencoder more efficient for unsupervised anomaly detection tasks.
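    A minimal sketch of the underlying LSTM Autoencoder scoring loop (our illustration; the paper's denoising architecture adds components on top of this baseline):
    ```python
    import torch

    class LSTMAutoencoder(torch.nn.Module):
        # Encoder compresses a window into its final hidden state; the decoder
        # unrolls that code over time and reconstructs the input features.
        def __init__(self, n_features, hidden=32):
            super().__init__()
            self.encoder = torch.nn.LSTM(n_features, hidden, batch_first=True)
            self.decoder = torch.nn.LSTM(hidden, hidden, batch_first=True)
            self.out = torch.nn.Linear(hidden, n_features)

        def forward(self, x):                    # x: (batch, time, features)
            _, (h, _) = self.encoder(x)
            rep = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)
            dec, _ = self.decoder(rep)
            return self.out(dec)

    model = LSTMAutoencoder(n_features=4)
    x = torch.randn(16, 50, 4)                   # 16 windows of 50 steps
    recon = model(x)
    score = ((x - recon) ** 2).mean(dim=(1, 2))  # per-window anomaly score
    ```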
    Principal Component Analysis based frameworks for efficient missing data imputation algorithms. (arXiv:2205.15150v2 [cs.LG] UPDATED)
    Missing data is a commonly occurring problem in practice. Many imputation methods have been developed to fill in the missing entries. However, not all of them can scale to high-dimensional data, especially the multiple imputation techniques. Meanwhile, data nowadays tend to be high-dimensional. Therefore, in this work, we propose Principal Component Analysis Imputation (PCAI), a simple but versatile framework based on Principal Component Analysis (PCA) to speed up the imputation process and alleviate the memory issues of many available imputation techniques, without sacrificing imputation quality in terms of MSE. In addition, the framework can be used even when some or all of the missing features are categorical, or when the number of missing features is large. Next, we introduce PCA Imputation - Classification (PIC), an application of PCAI to classification problems with some adjustments. We validate our approach with experiments on various scenarios, which show that PCAI and PIC can work with various imputation algorithms, including state-of-the-art ones, and improve the imputation speed significantly while achieving competitive mean squared error/classification accuracy compared to direct imputation (i.e., imputing directly on the missing data).
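    Our reading of the PCAI recipe, as a hedged scikit-learn sketch (column choices and dimensions are illustrative): compress the fully observed block with PCA, impute on the much smaller stacked matrix, and write the imputed values back:
    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 200))
    miss_cols = np.arange(10)                        # columns containing NaNs
    mask = rng.random((1000, 10)) < 0.2
    X[:, miss_cols] = np.where(mask, np.nan, X[:, miss_cols])

    obs = np.delete(X, miss_cols, axis=1)            # fully observed features
    reduced = PCA(n_components=15).fit_transform(obs)
    stacked = np.hstack([reduced, X[:, miss_cols]])  # 25 columns instead of 200
    imputed = IterativeImputer().fit_transform(stacked)
    X[:, miss_cols] = imputed[:, reduced.shape[1]:]  # write imputed values back
    ```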
    A Black-Box Attack on Optical Character Recognition Systems. (arXiv:2208.14302v1 [cs.CV])
    Adversarial machine learning is an emerging area exposing the vulnerability of deep learning models. Exploring attack methods to challenge state-of-the-art artificial intelligence (A.I.) models is an area of critical concern. The reliability and robustness of such A.I. models is a major concern given the increasing number of effective adversarial attack methods. Classification tasks are a major vulnerable area for adversarial attacks. The majority of attack strategies are developed for colored or gray-scaled images. Consequently, adversarial attacks on binary image recognition systems have not been sufficiently studied. Binary images are simple single-channel signals with only two possible pixel values. The simplicity of binary images offers a significant advantage compared to colored and gray-scaled images, namely computational efficiency. Moreover, most optical character recognition systems (O.C.R.s), such as handwritten character recognition, plate number identification, and bank check recognition systems, use binary images or binarization in their processing steps. In this paper, we propose a simple yet efficient attack method, the Efficient Combinatorial Black-box Adversarial Attack, on binary image classifiers. We validate the efficiency of the attack technique on two different data sets and three classification networks, demonstrating its performance. Furthermore, we compare our proposed method with state-of-the-art methods regarding advantages and disadvantages as well as applicability.
    A Non-Classical Parameterization for Density Estimation Using Sample Moments. (arXiv:2201.04786v3 [stat.ML] UPDATED)
    Moment methods are an important means of density estimation, but they generally depend strongly on the choice of feasible functions, which severely affects performance. We propose a non-classical parameterization for density estimation using sample moments, which does not require the choice of such functions. The parameterization is induced by the Kullback-Leibler distance, and its solution, which is proved to exist and be unique subject to a simple prior that does not depend on the data, can be obtained by convex optimization. Simulation results show the performance of the proposed estimator in estimating multi-modal densities that are mixtures of different types of functions.
    HiGNN: Hierarchical Informative Graph Neural Networks for Molecular Property Prediction Equipped with Feature-Wise Attention. (arXiv:2208.13994v1 [cs.LG])
    Elucidating and accurately predicting the druggability and bioactivities of molecules plays a pivotal role in drug design and discovery and remains an open challenge. Recently, graph neural networks (GNN) have made remarkable advancements in graph-based molecular property prediction. However, current graph-based deep learning methods neglect the hierarchical information of molecules and the relationships between feature channels. In this study, we propose a well-designed hierarchical informative graph neural network framework (termed HiGNN) for predicting molecular properties by utilizing co-representation learning of molecular graphs and chemically synthesizable BRICS fragments. Furthermore, a plug-and-play feature-wise attention block is designed in the HiGNN architecture to adaptively recalibrate atomic features after the message passing phase. Extensive experiments demonstrate that HiGNN achieves state-of-the-art predictive performance on many challenging drug discovery-associated benchmark datasets. In addition, we devise a molecule-fragment similarity mechanism to comprehensively investigate the interpretability of the HiGNN model at the subgraph level, indicating that HiGNN, as a powerful deep learning tool, can help chemists and pharmacists identify the key components of molecules for designing better molecules with desired properties or functions. The source code is publicly available at https://github.com/idruglab/hignn.
    An efficient and flexible inference system for serving heterogeneous ensembles of deep neural networks. (arXiv:2208.14049v1 [cs.DC])
    Ensembles of Deep Neural Networks (DNNs) achieve high-quality predictions, but they are compute- and memory-intensive. The demand is therefore growing to make them answer a heavy workload of requests with the available computational resources. Unlike recent initiatives on inference servers and inference frameworks, which focus on predictions from single DNNs, we propose a new software layer to serve ensembles of DNNs with flexibility and efficiency. Our inference system is designed with several technical innovations. First, we propose a novel procedure to find a good allocation matrix between devices (CPUs or GPUs) and DNN instances. It successively runs a worst-fit heuristic to allocate DNNs into the devices' memory and a greedy algorithm to optimize allocation settings and speed up the ensemble. Second, we design the inference system around multiple processes running asynchronously -- batching, prediction, and the combination rule -- with an efficient internal communication scheme to avoid overhead. Experiments show flexibility and efficiency under extreme scenarios: the system successfully serves an ensemble of 12 heavy DNNs on 4 GPUs and, at the other extreme, a single DNN multi-threaded across 16 GPUs. It also outperforms the simple baseline of optimizing the batch size of DNNs, with a speedup of up to 2.7X on the image classification task.
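    The worst-fit step can be pictured with a few lines of Python (a generic sketch of the heuristic; the paper pairs it with a greedy refinement pass not shown here):
    ```python
    def worst_fit_allocate(model_sizes, device_capacities):
        # Each DNN goes to the device with the most remaining memory, which
        # spreads the load before any refinement pass tunes the placement.
        free = list(device_capacities)           # remaining memory per device
        placement = []
        for size in model_sizes:
            dev = max(range(len(free)), key=lambda i: free[i])
            if free[dev] < size:
                raise MemoryError(f"no device can hold a model of size {size}")
            free[dev] -= size
            placement.append(dev)
        return placement

    # 12 heavy DNNs onto 4 GPUs with 16 GB each (sizes in GB, illustrative)
    print(worst_fit_allocate([5, 4, 4, 3, 3, 3, 2, 2, 2, 2, 1, 1], [16] * 4))
    ```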
    FUSION: Fully Unsupervised Test-Time Stain Adaptation via Fused Normalization Statistics. (arXiv:2208.14206v1 [eess.IV])
    Staining reveals the microstructure of the aspirate while creating histopathology slides. Stain variation, defined as a chromatic difference between the source and the target, is caused by varying characteristics during staining, resulting in a distribution shift and poor performance on the target. The goal of stain normalization is to match the target's chromatic distribution to that of the source. However, stain normalization can distort the underlying morphology, resulting in an incorrect diagnosis. We propose FUSION, a new method for promoting stain adaptation by adjusting the model to the target in an unsupervised test-time scenario, eliminating the need for significant labelling at the target end. FUSION works by altering the target's batch normalization statistics and fusing them with the source statistics using a weighting factor. The algorithm reduces to one of two extremes based on the weighting factor. Despite the lack of training or supervision, FUSION surpasses existing equivalent algorithms for classification and dense prediction (segmentation), as demonstrated by comprehensive experiments on two public datasets.
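    The fusion rule itself is a one-liner per layer; a hedged PyTorch sketch (random tensors stand in for the target statistics, which would really be estimated from unlabeled target batches):
    ```python
    import torch
    import torchvision

    model = torchvision.models.resnet18(weights="DEFAULT")
    w = 0.5                                      # the weighting factor:
                                                 # w=0 keeps source stats, w=1 fully adapts
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, torch.nn.BatchNorm2d):
                mu_t = torch.randn_like(m.running_mean)      # placeholder target mean
                var_t = torch.rand_like(m.running_var) + 0.5 # placeholder target variance
                m.running_mean.mul_(1 - w).add_(w * mu_t)
                m.running_var.mul_(1 - w).add_(w * var_t)
    ```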
    Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization. (arXiv:2204.05466v2 [math.OC] UPDATED)
    A major challenge in multi-agent systems is that the system complexity grows dramatically with the number of agents as well as the size of their action spaces, which is typical in real-world scenarios such as autonomous vehicles, robotic teams, network routing, etc. There is hence an imminent need to design decentralized or independent algorithms, where the update of each agent is only based on its local observations without the need for complex communication/coordination mechanisms. In this work, we study the finite-time convergence of independent entropy-regularized natural policy gradient (NPG) methods for potential games, where the difference in an agent's utility function due to unilateral deviation matches exactly that of a common potential function. The proposed entropy-regularized NPG method enables each agent to deploy symmetric, decentralized, and multiplicative updates according to its own payoff. We show that the proposed method converges to the quantal response equilibrium (QRE) -- the equilibrium of the entropy-regularized game -- at a sublinear rate, which is independent of the size of the action space and grows at most sublinearly with the number of agents. Appealingly, the convergence rate further becomes independent of the number of agents for the important special case of identical-interest games, leading to the first method that converges at a dimension-free rate. Our approach can be used as a smoothing technique to find an approximate Nash equilibrium (NE) of the unregularized problem without assuming that stationary policies are isolated.
    SFP: State-free Priors for Exploration in Off-Policy Reinforcement Learning. (arXiv:2205.13528v2 [cs.LG] UPDATED)
    Efficient exploration is a crucial challenge in deep reinforcement learning. Several methods, such as behavioral priors, are able to leverage offline data to efficiently accelerate reinforcement learning on complex tasks. However, if the task at hand deviates excessively from the demonstrated task, the effectiveness of such methods is limited. In our work, we propose to learn features from offline data that are shared by a more diverse range of tasks, such as correlations between actions and directedness. We therefore introduce state-free priors, which directly model temporal consistency in demonstrated trajectories and are capable of driving exploration in complex tasks even when trained on data collected from simpler tasks. Furthermore, we introduce a novel integration scheme for action priors in off-policy reinforcement learning that dynamically samples actions from a probabilistic mixture of the policy and the action prior. We compare our approach against strong baselines and provide empirical evidence that it can accelerate reinforcement learning in long-horizon continuous-control tasks under sparse reward settings.
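    The integration scheme amounts to sampling from a two-component mixture at action-selection time. A deliberately minimal sketch (the paper adapts the mixture weight dynamically, which is omitted here, and the two distributions below are stand-ins for learned models):

        import numpy as np

        rng = np.random.default_rng(0)

        def sample_action(policy_sample, prior_sample, w):
            """Draw an action from a probabilistic mixture: with probability
            w use the learned policy, otherwise the state-free prior."""
            return policy_sample() if rng.random() < w else prior_sample()

        # Illustrative 1-D continuous action.
        action = sample_action(lambda: rng.normal(0.0, 0.1),   # policy action
                               lambda: rng.normal(0.5, 0.3),   # prior action
                               w=0.7)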
    Online Active Regression. (arXiv:2207.05945v2 [cs.LG] UPDATED)
    Active regression considers a linear regression problem where the learner receives a large number of data points but can only observe a small number of labels. Since online algorithms can deal with incremental training data and have low computational cost, we consider an online extension of the active regression problem: the learner receives data points one by one and immediately decides whether to collect the corresponding labels. The goal is to efficiently maintain the regression of the received data points with a small budget of label queries. We propose novel algorithms for this problem under the $\ell_p$ loss where $p\in[1,2]$. To achieve a $(1+\epsilon)$-approximate solution, our proposed algorithms only require $\tilde{\mathcal{O}}(\epsilon^{-1} d \log(n\kappa))$ label queries, where $n$ is the number of data points and $\kappa$ is the condition number of the data points. Numerical results verify our theoretical results and show that our methods perform comparably to offline active regression algorithms.
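    One classical way to realize such online query decisions, in the spirit of leverage-score sampling, is to query each arriving point with probability proportional to its leverage against the data seen so far, maintaining the inverse Gram matrix with a Sherman-Morrison update. This is a sketch for the $\ell_2$ case, not the paper's general $\ell_p$ algorithm; the constants are illustrative:

        import numpy as np

        def online_leverage_queries(X, c=5.0, reg=1e-6):
            """Query each arriving point's label with probability
            min(1, c * leverage), where the leverage is taken against the
            (regularized) Gram matrix of points seen so far."""
            d = X.shape[1]
            gram_inv = np.eye(d) / reg             # inverse of reg * I
            queried = []
            for i, x in enumerate(X):
                lev = float(x @ gram_inv @ x)
                if np.random.rand() < min(1.0, c * lev):
                    queried.append(i)
                gx = gram_inv @ x                  # Sherman-Morrison rank-one update
                gram_inv -= np.outer(gx, gx) / (1.0 + x @ gx)
            return queried

        X = np.random.randn(1000, 5)
        print(len(online_leverage_queries(X)))     # only a small label budget used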
    Reducing Certified Regression to Certified Classification. (arXiv:2208.13904v1 [cs.LG])
    Adversarial training instances can severely distort a model's behavior. This work investigates certified regression defenses, which provide guaranteed limits on how much a regressor's prediction may change under a training-set attack. Our key insight is that certified regression reduces to certified classification when using the median as the model's primary decision function. Coupling our reduction with existing certified classifiers, we propose six new provably-robust regressors. To the best of our knowledge, this is the first work to certify the robustness of individual regression predictions without any assumptions about the data distribution and model architecture. We also show that existing state-of-the-art certified classifiers often make overly-pessimistic assumptions that can degrade their provable guarantees. We introduce a tighter analysis of model robustness, which in many cases results in significantly improved certified guarantees. Lastly, we empirically demonstrate our approaches' effectiveness on both regression and classification data, where the accuracy of up to 50% of test predictions can be guaranteed under 1% training-set corruption and up to 30% of predictions under 4% corruption. Our source code is available at https://github.com/ZaydH/certified-regression.
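    The reduction can be pictured with a tiny sketch: train regressors on disjoint training partitions and output the median prediction, so that a bounded number of corrupted partitions can only shift the output between nearby order statistics. Here `models` is any list of fitted regressors with a scikit-learn-style predict method, and the certified range shown assumes t corrupted partitions; this is an illustration of the idea, not the paper's exact construction:

        import numpy as np

        def certified_median_predict(models, x, t=1):
            """Median over regressors trained on disjoint partitions.
            Corrupting t partitions moves at most t predictions, so the
            output stays within t order statistics of the clean median."""
            preds = np.sort([float(m.predict(x.reshape(1, -1))[0]) for m in models])
            k = len(preds)
            median = preds[k // 2]
            lo = preds[max(0, k // 2 - t)]
            hi = preds[min(k - 1, k // 2 + t)]
            return median, (lo, hi)                # prediction and certified range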
    Transfer learning to decode brain states reflecting the relationship between cognitive tasks. (arXiv:2206.03950v3 [q-bio.NC] UPDATED)
    Transfer learning improves the performance of a target task by leveraging data from a specific source task: the closer the relationship between the source and target tasks, the greater the performance improvement from transfer learning. In neuroscience, the relationship between cognitive tasks is usually represented by the similarity of activated brain regions or neural representations. However, no study has linked transfer learning and neuroscience to reveal the relationships between cognitive tasks. In this study, we propose a transfer learning framework to reflect these relationships and compare the task relations reflected by transfer learning with those derived from the overlap of brain regions (e.g., from Neurosynth). Our transfer learning results produce a cognitive taskonomy that reflects the relationships between cognitive tasks and aligns well with the task relations derived from Neurosynth. Transfer learning performs better at task decoding with fMRI data when the source and target cognitive tasks activate similar brain regions. Our study uncovers the relationships among multiple cognitive tasks and provides guidance for source-task selection in transfer learning for neural decoding based on small-sample data.
    Finding neural signatures for obesity using source-localized EEG features. (arXiv:2208.14007v1 [cs.LG])
    Obesity is a serious issue in modern society since it is associated with a significantly reduced quality of life. Current research exploring obesity-related neurological evidence using electroencephalography (EEG) data is limited to traditional approaches. In this study, we develop a novel machine learning model to identify the brain networks of obese females using alpha-band functional connectivity features derived from EEG data. An overall classification accuracy of 90% is achieved. Our findings suggest that the obese brain is characterized by a dysfunctional network in which the areas responsible for processing self-referential information, such as energy requirements, are impaired.
    Beyond Supervised Continual Learning: a Review. (arXiv:2208.14307v1 [cs.LG])
    Continual Learning (CL, sometimes also termed incremental learning) is a flavor of machine learning in which the usual assumption of a stationary data distribution is relaxed or omitted. When DNNs are naively applied to CL problems, changes in the data distribution can cause the so-called catastrophic forgetting (CF) effect: an abrupt loss of previous knowledge. Although many significant contributions to enabling CL have been made in recent years, most works address supervised (classification) problems. This article reviews literature that studies CL in other settings, such as learning with reduced supervision, fully unsupervised learning, and reinforcement learning. Besides proposing a simple schema for classifying CL approaches with respect to their level of autonomy and supervision, we discuss the specific challenges associated with each setting and the potential contributions to the field of CL in general.
    Community recovery in non-binary and temporal stochastic block models. (arXiv:2008.04790v5 [math.ST] UPDATED)
    This article studies the estimation of latent community memberships from pairwise interactions in a network of $N$ nodes, where the observed interactions can be of arbitrary type, including binary, categorical, and vector-valued, and not excluding even more general objects such as time series or spatial point patterns. As a generative model for such data, we introduce a stochastic block model with a general measurable interaction space $\mathcal S$, for which we derive information-theoretic bounds for the minimum achievable error rate. These bounds yield sharp criteria for the existence of consistent and strongly consistent estimators in terms of data sparsity, statistical similarity between intra- and inter-block interaction distributions, and the shape and size of the interaction space. The general framework makes it possible to study temporal and multiplex networks with $\mathcal S = \{0,1\}^T$, in settings where both $N \to \infty$ and $T \to \infty$, and the temporal interaction patterns are correlated over time. For temporal Markov interactions, we derive sharp consistency thresholds. We also present fast online estimation algorithms which fully utilise the non-binary nature of the observed data. Numerical experiments on synthetic and real data show that these algorithms rapidly produce accurate estimates even for very sparse data arrays.
    AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels. (arXiv:2208.14362v1 [cs.LG])
    Weak supervision (WS) is a powerful method to build labeled datasets for training supervised models in the face of little-to-no labeled data. It replaces hand-labeling data with aggregating multiple noisy-but-cheap label estimates expressed by labeling functions (LFs). While it has been used successfully in many domains, weak supervision's application scope is limited by the difficulty of constructing labeling functions for domains with complex or high-dimensional features. To address this, a handful of methods have proposed automating the LF design process using a small set of ground truth labels. In this work, we introduce AutoWS-Bench-101: a framework for evaluating automated WS (AutoWS) techniques in challenging WS settings -- a set of diverse application domains on which it has been previously difficult or impossible to apply traditional WS techniques. While AutoWS is a promising direction toward expanding the application-scope of WS, the emergence of powerful methods such as zero-shot foundation models reveals the need to understand how AutoWS techniques compare or cooperate with modern zero-shot or few-shot learners. This informs the central question of AutoWS-Bench-101: given an initial set of 100 labels for each task, we ask whether a practitioner should use an AutoWS method to generate additional labels or use some simpler baseline, such as zero-shot predictions from a foundation model or supervised learning. We observe that in many settings, it is necessary for AutoWS methods to incorporate signal from foundation models if they are to outperform simple few-shot baselines, and AutoWS-Bench-101 promotes future research in this direction. We conclude with a thorough ablation study of AutoWS methods.
    Application of Convolutional Neural Networks with Quasi-Reversibility Method Results for Option Forecasting. (arXiv:2208.14385v1 [q-fin.ST])
    This paper presents a novel way to apply mathematical finance and machine learning (ML) to forecast stock option prices. Following results from the paper "Quasi-Reversibility Method and Neural Network Machine Learning to Solution of Black-Scholes Equations" (which appeared in the AMS Contemporary Mathematics journal), we create and evaluate new empirical mathematical models for the Black-Scholes equation to analyze data for 92,846 companies. We solve the Black-Scholes (BS) equation forwards in time as an ill-posed inverse problem, using the Quasi-Reversibility Method (QRM), to predict option prices one day ahead. For each company, we have 13 elements, including daily stock and option prices, volatility, minimizer, etc. Because the market is so complicated that no perfect model exists, we apply ML to train algorithms to make the best prediction. The current stage of the research combines QRM with Convolutional Neural Networks (CNNs), which learn information across a large number of data points simultaneously. We implement CNNs to generate new results by validating and testing on sample market data. We test different ways of applying CNNs and compare our CNN models with previous models to see whether a higher profit rate is achievable.
    Leap-frog neural network for learning the symplectic evolution from partitioned data. (arXiv:2208.14148v1 [astro-ph.EP])
    For a Hamiltonian system, this work considers learning and predicting the position (q) and momentum (p) variables generated by a symplectic evolution map. Similar to Chen & Tao (2021), the symplectic map is represented by a generating function. In addition, we develop a new learning scheme that splits the time series (q_i, p_i) into several partitions and then trains a leap-frog neural network (LFNN) to approximate the generating function between the first partition (i.e., the initial condition) and each of the remaining partitions. For predicting the system evolution on a short timescale, the LFNN can effectively avoid the issue of cumulative error. We then apply the LFNN to learn the behavior of 2:3 resonant Kuiper belt objects over a much longer time period, obtaining two significant improvements over the neural network constructed in our previous work (Li et al. 2022): (1) conservation of the Jacobi integral; (2) highly accurate prediction of the orbital evolution. We propose that the LFNN may be useful for predicting the long-time evolution of Hamiltonian systems.
    Competition, Alignment, and Equilibria in Digital Marketplaces. (arXiv:2208.14423v1 [cs.GT])
    Competition between traditional platforms is known to improve user utility by aligning the platform's actions with user preferences. But to what extent is alignment exhibited in data-driven marketplaces? To study this question from a theoretical perspective, we introduce a duopoly market where platform actions are bandit algorithms and the two platforms compete for user participation. A salient feature of this market is that the quality of recommendations depends on both the bandit algorithm and the amount of data provided by interactions from users. This interdependency between the algorithm performance and the actions of users complicates the structure of market equilibria and their quality in terms of user utility. Our main finding is that competition in this market does not perfectly align market outcomes with user utility. Interestingly, market outcomes exhibit misalignment not only when the platforms have separate data repositories, but also when the platforms have a shared data repository. Nonetheless, the data sharing assumptions impact what mechanism drives misalignment and also affect the specific form of misalignment (e.g. the quality of the best-case and worst-case market outcomes). More broadly, our work illustrates that competition in digital marketplaces has subtle consequences for user utility that merit further investigation.
    "Prompt-Gamma Neutron Activation Analysis (PGNAA)" Metal Spectral Classification using Deep Learning Method. (arXiv:2208.13909v1 [cs.LG])
    There is a pressing market demand to minimize the test time of Prompt Gamma Neutron Activation Analysis (PGNAA) spectral measurement machines, so that they can function as instant material analyzers, e.g., to classify waste samples instantaneously and determine the best recycling method based on the detected composition of the test sample. This article introduces a new deep learning classification development that aims to reduce the test time for PGNAA machines. We propose both Random Sampling Methods and Class Activation Maps (CAM) to generate "downsized" samples and continuously train the CNN model. Random Sampling Methods (RSM) aim to reduce the measuring time within a sample, while Class Activation Maps (CAM) filter out the less important energy ranges of the downsized samples. We shorten the overall PGNAA measuring time down to 2.5 seconds while keeping the accuracy around 96.88% on our dataset of 12 different species of substances. Compared with classifying different species of materials, substances composed of the same elements require more test time (sample count rate) to achieve good accuracy. For example, classifying copper alloys requires nearly 24 seconds of test time to reach 98% accuracy.
    FuncFooler: A Practical Black-box Attack Against Learning-based Binary Code Similarity Detection Methods. (arXiv:2208.14191v1 [cs.CR])
    The binary code similarity detection (BCSD) method measures the similarity of two binary executable codes. Recently, learning-based BCSD methods have achieved great success, outperforming traditional BCSD in detection accuracy and efficiency. However, existing studies on the adversarial vulnerability of learning-based BCSD methods are rather sparse, which poses hazards in security-related applications. To evaluate adversarial robustness, this paper designs an efficient and black-box adversarial code generation algorithm, namely FuncFooler. FuncFooler constrains the adversarial codes 1) to keep the program's control flow graph (CFG) unchanged, and 2) to preserve the same semantic meaning. Specifically, FuncFooler consecutively 1) determines vulnerable candidates in the malicious code, 2) chooses and inserts adversarial instructions from benign code, and 3) corrects the semantic side effects of the adversarial code to meet the constraints. Empirically, FuncFooler can successfully attack three learning-based BCSD models, including SAFE, Asm2Vec, and jTrans, which calls into question whether learning-based BCSD is desirable.
    FAST-AID Brain: Fast and Accurate Segmentation Tool using Artificial Intelligence Developed for Brain. (arXiv:2208.14360v1 [eess.IV])
    Medical images used in clinical practice are heterogeneous and not of the same quality as scans studied in academic research. Preprocessing breaks down in extreme cases when anatomy, artifacts, or imaging parameters are unusual or protocols differ. Methods robust to these variations are most needed. A novel deep learning method is proposed for the fast and accurate segmentation of the human brain into 132 regions. The proposed model uses an efficient U-Net-like network and benefits from the intersection points of different views and hierarchical relations for the fusion of orthogonal 2D planes and brain labels during end-to-end training. Weakly supervised learning is deployed to take advantage of partially labeled data for whole-brain segmentation and estimation of intracranial volume (ICV). Moreover, data augmentation is used to expand the magnetic resonance imaging (MRI) data by generating realistic brain scans with high variability for robust training of the model while preserving data privacy. The proposed method can be applied to brain MRI data that include the skull or other artifacts without preprocessing the images and without a drop in performance. Several experiments using different atlases are conducted to evaluate the segmentation performance of the trained model against the state of the art, and the results show higher segmentation accuracy and robustness of the proposed model compared to existing methods across different intra- and inter-domain datasets.
    Learning Spatiotemporal Chaos Using Next-Generation Reservoir Computing. (arXiv:2203.13294v3 [cs.LG] UPDATED)
    Forecasting the behavior of high-dimensional dynamical systems using machine learning requires efficient methods to learn the underlying physical model. We demonstrate spatiotemporal chaos prediction using a machine learning architecture that, when combined with a next-generation reservoir computer, displays state-of-the-art performance, with a training time $10^3$-$10^4$ times faster and a training data set $\sim 10^2$ times smaller than those of other machine learning algorithms. We also take advantage of the translational symmetry of the model to further reduce the computational cost and the training data, each by a factor of $\sim$10.
    AtteSTNet -- An attention and subword tokenization based approach for code-switched text hate speech detection. (arXiv:2112.11479v2 [cs.CL] UPDATED)
    Recent advancements in technology have led to a boost in social media usage, which has ultimately produced large amounts of user-generated data, including hateful and offensive speech. The language used on social media is often a combination of English and the native language of the region. In India, Hindi is used predominantly and is often code-switched with English, giving rise to the Hinglish (Hindi+English) language. Various approaches have been made in the past to classify code-mixed Hinglish hate speech using different machine learning and deep learning-based techniques. However, these techniques make use of recurrence or convolution mechanisms, which are computationally expensive and have high memory requirements. Past techniques also rely on complex data processing, making the existing approaches very complex and hard to adapt to changes in the data. We propose a much simpler approach which is not only on par with these complex networks but exceeds their performance, using subword tokenization algorithms such as BPE and Unigram together with a multi-head attention-based technique, achieving an accuracy of 87.41% and an F1 score of 0.851 on standard datasets. Efficient use of the BPE and Unigram algorithms helps handle the non-conventional Hinglish vocabulary, making our technique simple, efficient, and sustainable for real-world use.
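    The core of such a model is small: subword token embeddings followed by multi-head self-attention and pooling. A minimal PyTorch sketch with illustrative dimensions (the subword tokenizer, e.g. BPE or Unigram, is assumed to run beforehand and is not the paper's exact configuration):

        import torch
        import torch.nn as nn

        class SubwordAttentionClassifier(nn.Module):
            def __init__(self, vocab_size=16000, dim=128, heads=4, classes=2):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, dim)
                self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
                self.classify = nn.Linear(dim, classes)

            def forward(self, token_ids):
                x = self.embed(token_ids)             # (batch, seq, dim)
                x, _ = self.attn(x, x, x)             # multi-head self-attention
                return self.classify(x.mean(dim=1))   # mean-pool then classify

        logits = SubwordAttentionClassifier()(torch.randint(0, 16000, (8, 32)))
        print(logits.shape)                            # torch.Size([8, 2])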
    A Data-Centric Optimization Framework for Machine Learning. (arXiv:2110.10802v3 [cs.LG] UPDATED)
    Rapid progress in deep learning is leading to a diverse set of quickly changing models, with a dramatically growing demand for compute. However, as frameworks specialize performance optimization to patterns in popular networks, they implicitly constrain novel and diverse models that drive progress in research. We empower deep learning researchers by defining a flexible and user-customizable pipeline for optimizing training of arbitrary deep neural networks, based on data movement minimization. The pipeline begins with standard networks in PyTorch or ONNX and transforms computation through progressive lowering. We define four levels of general-purpose transformations, from local intra-operator optimizations to global data movement reduction. These operate on a data-centric graph intermediate representation that expresses computation and data movement at all levels of abstraction, including expanding basic operators such as convolutions to their underlying computations. Central to the design is the interactive and introspectable nature of the pipeline. Every part is extensible through a Python API, and can be tuned interactively using a GUI. We demonstrate competitive performance or speedups on ten different networks, with interactive optimizations discovering new opportunities in EfficientNet.
    Associative Learning for Network Embedding. (arXiv:2208.14376v1 [cs.LG])
    The network embedding task is to represent a node in a network as a low-dimensional vector while incorporating topological and structural information. Most existing approaches solve this problem by factorizing a proximity matrix, either directly or implicitly. In this work, we introduce a network embedding method from a new perspective, which leverages Modern Hopfield Networks (MHN) for associative learning. Our network learns associations between the content of each node and that node's neighbors; these associations serve as memories in the MHN. The recurrent dynamics of the network make it possible to recover a masked node given that node's neighbors. Our proposed method is evaluated on different downstream tasks such as node classification and link prediction. The results show competitive performance compared to common matrix factorization techniques and deep-learning-based methods.
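    The retrieval step of a Modern Hopfield Network is an attention-like fixed-point iteration. A numpy sketch of recovering a stored association from a corrupted query, with illustrative shapes (the paper's memories are learned node-neighborhood associations rather than random vectors):

        import numpy as np

        def hopfield_retrieve(memories, query, beta=4.0, steps=3):
            """Iterate xi <- M.T @ softmax(beta * M @ xi): the state moves
            toward the stored row most similar to the query."""
            xi = query.copy()
            for _ in range(steps):
                scores = beta * memories @ xi
                p = np.exp(scores - scores.max())
                p /= p.sum()                       # softmax attention weights
                xi = memories.T @ p                # weighted recall
            return xi

        M = np.random.randn(100, 32)               # 100 stored associations
        noisy = M[7] + 0.3 * np.random.randn(32)   # corrupted query
        print(np.argmax(M @ hopfield_retrieve(M, noisy)))   # 7: pattern recovered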
    Inference and Optimization for Engineering and Physical Systems. (arXiv:2208.13880v1 [cs.LG])
    The central object of this PhD thesis is known under different names in the fields of computer science and statistical mechanics. In computer science it is called the Maximum Cut problem, one of Karp's famous twenty-one original NP-hard problems, while the same object in physics is called the Ising spin glass model. This richly structured model often appears as a reduction or reformulation of real-world problems from computer science, physics, and engineering. However, solving this model exactly (finding the maximal cut or the ground state) is likely to remain intractable (unless $\textit{P} = \textit{NP}$) and requires the development of ad-hoc heuristics for every particular family of instances. One of the bright and beautiful connections between discrete and continuous optimization is the Semidefinite Programming-based rounding scheme for Maximum Cut. This procedure allows us to find a provably near-optimal solution; moreover, this method is conjectured to be the best possible in polynomial time. In the first two chapters of this thesis, we investigate local non-convex heuristics intended to improve the rounding scheme. In the last chapter, we go one step further and aim to control the solution of the problem studied in the previous chapters. We formulate a bi-level optimization problem over the Ising model in which we tweak the interactions as little as possible so that the ground state of the resulting Ising model satisfies a desired criterion. This kind of problem arises in pandemic modeling. We show that when the interactions are non-negative, our bi-level optimization is solvable in polynomial time using convex programming.
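    The rounding scheme mentioned above is easy to state in code: given the unit vectors returned by the SDP relaxation, cut the graph with a random hyperplane and keep the best of a few draws. The SDP solve itself is omitted from this sketch, and the random vectors below merely stand in for its output:

        import numpy as np

        def gw_round(V, W, trials=100):
            """Random-hyperplane rounding: sign(V @ r) splits the nodes;
            keep the cut of largest weight across several draws."""
            best_val, best_cut = -np.inf, None
            for _ in range(trials):
                r = np.random.randn(V.shape[1])    # random hyperplane normal
                signs = np.sign(V @ r)
                val = 0.25 * np.sum(W * (1 - np.outer(signs, signs)))
                if val > best_val:
                    best_val, best_cut = val, signs
            return best_cut, best_val

        n = 10
        V = np.random.randn(n, n)                  # stand-in for SDP vectors
        V /= np.linalg.norm(V, axis=1, keepdims=True)
        W = np.triu(np.random.rand(n, n) > 0.5, 1).astype(float)
        W += W.T                                   # random symmetric graph
        print(gw_round(V, W)[1])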
    DASH: Distributed Adaptive Sequencing Heuristic for Submodular Maximization. (arXiv:2206.09563v2 [cs.DS] UPDATED)
    MapReduce (MR) model algorithms for maximizing monotone, submodular functions subject to a cardinality constraint (SMCC) are currently restricted to the use of the linear-adaptive (non-parallelizable) algorithm GREEDY. Low-adaptive algorithms do not satisfy the requirements of these distributed MR frameworks, thereby limiting their performance. We study the SMCC problem in a distributed setting and propose the first MR algorithms with sublinear adaptive complexity. Our algorithms, R-DASH, T-DASH, and G-DASH, provide ($0.316-\varepsilon$), ($0.375-\varepsilon$), and ($0.632-\varepsilon$) approximation ratios, respectively, with near-optimal adaptive complexity. Additionally, we provide a memory-efficient framework, MED, that eliminates the memory limitations of all MR model algorithms resulting from large distributed setups or considerable cardinality constraints. Finally, we provide empirical evidence that our sublinear-adaptive distributed algorithms run orders of magnitude faster than current state-of-the-art distributed algorithms while producing nearly identical results.
    RAGUEL: Recourse-Aware Group Unfairness Elimination. (arXiv:2208.14175v1 [cs.LG])
    While machine learning and ranking-based systems are in widespread use for sensitive decision-making processes (e.g., determining job candidates, assigning credit scores), they are rife with concerns over unintended biases in their outcomes, which makes algorithmic fairness (e.g., demographic parity, equal opportunity) an objective of interest. 'Algorithmic recourse' offers feasible recovery actions to change unwanted outcomes through the modification of attributes. We introduce the notion of ranked group-level recourse fairness, and develop a 'recourse-aware ranking' solution that satisfies ranked recourse fairness constraints while minimizing the cost of suggested modifications. Our solution suggests interventions that can reorder the ranked list of database records and mitigate group-level unfairness; specifically, disproportionate representation of sub-groups and recourse cost imbalance. This re-ranking identifies the minimum modifications to data points, with these attribute modifications weighted according to their ease of recourse. We then present an efficient block-based extension that enables re-ranking at any granularity (e.g., multiple brackets of bank loan interest rates, multiple pages of search engine results). Evaluation on real datasets shows that, while existing methods may even exacerbate recourse unfairness, our solution -- RAGUEL -- significantly improves recourse-aware fairness. RAGUEL outperforms alternatives at improving recourse fairness, through a combined process of counterfactual generation and re-ranking, whilst remaining efficient for large-scale datasets.
    Analysis of Distributed Deep Learning in the Cloud. (arXiv:2208.14344v1 [cs.LG])
    We introduce a comprehensive distributed deep learning (DDL) profiler that can determine the various execution "stalls" DDL suffers from while running on a public cloud. We implement the profiler by extending prior work to additionally estimate two types of communication stalls: interconnect and network stalls. We train popular DNN models using the profiler to characterize various AWS GPU instances and list their advantages and shortcomings so that users can make an informed decision. We observe that the more expensive GPU instances may not be the most performant for all DNN models and that AWS may sub-optimally allocate hardware interconnect resources. Specifically, the intra-machine interconnect can introduce communication overheads of up to 90% of DNN training time, and network-connected instances can suffer up to a 5x slowdown compared to training on a single instance. Further, we model the impact of macroscopic DNN features, such as the number of layers and the number of gradients, on communication stalls. Finally, we propose a measurement-based recommendation model to help users lower their public cloud monetary costs for DDL, given a time budget.
    Weakly Supervised Faster-RCNN+FPN to classify animals in camera trap images. (arXiv:2208.14060v1 [cs.CV])
    Camera traps have revolutionized animal research for many species that were previously nearly impossible to observe due to their habitat or behavior. They are cameras, generally fixed to a tree, that take a short sequence of images when triggered. Deep learning has the potential to reduce this workload by automatically classifying images according to taxon or as empty. However, a standard deep neural network classifier fails because animals often occupy only a small portion of the high-definition images. That is why we propose a workflow named Weakly Object Detection Faster-RCNN+FPN suited to this challenge. The model is weakly supervised because it requires only the animal taxon label per image and no manual bounding-box annotations. First, it automatically performs weakly supervised bounding-box annotation using the motion across multiple frames. Then, it trains a Faster-RCNN+FPN model using this weak supervision. Experimental results have been obtained with two datasets from Papua New Guinea and Missouri biodiversity monitoring campaigns, and then on an easily reproducible testbed.
    Implicit Regularization and Convergence for Weight Normalization. (arXiv:1911.07956v5 [cs.LG] UPDATED)
    Normalization methods such as batch [Ioffe and Szegedy, 2015], weight [Salimans and Kingma, 2016], instance [Ulyanov et al., 2016], and layer normalization [Ba et al., 2016] have been widely used in modern machine learning. Here, we study the weight normalization (WN) method [Salimans and Kingma, 2016] and a variant called reparametrized projected gradient descent (rPGD) for overparametrized least-squares regression. WN and rPGD reparametrize the weights with a scale g and a unit vector w, so the objective function becomes non-convex. We show that this non-convex formulation has beneficial regularization effects compared to gradient descent on the original objective. These methods adaptively regularize the weights and, for certain step sizes of g and w, converge close to the minimum l2-norm solution even for initializations far from zero. This is different from the behavior of gradient descent, which converges to the minimum-norm solution only when started at a point in the range space of the feature matrix, and is thus more sensitive to initialization.
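    A numpy sketch of the reparametrization being analyzed: gradient descent on g and v for the least-squares objective with w = g * v/||v||, started far from zero, compared afterwards against the minimum l2-norm solution. Step sizes, iteration count, and initialization are illustrative, not the regimes analyzed in the paper:

        import numpy as np

        def wn_least_squares(X, y, steps=5000, lr=1e-3):
            """Gradient descent on 0.5*||X(g * v/||v||) - y||^2 in the
            (g, v) parametrization, with v initialized far from zero."""
            d = X.shape[1]
            v = 10.0 * np.random.randn(d)          # far-from-zero start
            g = 1.0
            for _ in range(steps):
                norm_v = np.linalg.norm(v)
                u = v / norm_v
                grad_w = X.T @ (X @ (g * u) - y)   # gradient w.r.t. w = g*u
                g -= lr * (u @ grad_w)             # scale update
                v -= lr * (g / norm_v) * (grad_w - (u @ grad_w) * u)  # direction update
            return g * v / np.linalg.norm(v)

        X, y = np.random.randn(20, 50), np.random.randn(20)   # d > n
        w_min = X.T @ np.linalg.solve(X @ X.T, y)  # minimum l2-norm solution
        print(np.linalg.norm(wn_least_squares(X, y) - w_min))  # distance to w_min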
    Reinforcement Learning for Hardware Security: Opportunities, Developments, and Challenges. (arXiv:2208.13885v1 [cs.CR])
    Reinforcement learning (RL) is a machine learning paradigm where an autonomous agent learns to make an optimal sequence of decisions by interacting with the underlying environment. The promise demonstrated by RL-guided workflows in unraveling electronic design automation problems has encouraged hardware security researchers to utilize autonomous RL agents in solving domain-specific problems. From the perspective of hardware security, such autonomous agents are appealing as they can generate optimal actions in an unknown adversarial environment. On the other hand, the continued globalization of the integrated circuit supply chain has forced chip fabrication to off-shore, untrustworthy entities, leading to increased concerns about the security of the hardware. Furthermore, the unknown adversarial environment and increasing design complexity make it challenging for defenders to detect subtle modifications made by attackers (a.k.a. hardware Trojans). In this brief, we outline the development of RL agents in detecting hardware Trojans, one of the most challenging hardware security problems. Additionally, we outline potential opportunities and enlist the challenges of applying RL to solve hardware security problems.
    MeloForm: Generating Melody with Musical Form based on Expert Systems and Neural Networks. (arXiv:2208.14345v1 [cs.SD])
    Humans usually compose music by organizing elements according to a musical form to express musical ideas. However, for neural network-based music generation, this is difficult to do due to the lack of labelled data on musical form. In this paper, we develop MeloForm, a system that generates melody with musical form using expert systems and neural networks. Specifically, 1) we design an expert system to generate a melody by developing musical elements from motifs to phrases and then to sections, with repetitions and variations according to a pre-given musical form; 2) since the generated melody lacks musical richness, we design a Transformer-based refinement model to improve the melody without changing its musical form. MeloForm enjoys the advantages of precise musical form control via expert systems and musical richness learning via neural models. Both subjective and objective experimental evaluations demonstrate that MeloForm generates melodies with precise musical form control with 97.79% accuracy, and outperforms baseline systems in subjective evaluation scores by 0.75, 0.50, 0.86, and 0.89 in structure, theme, richness, and overall quality, respectively, without any labelled musical form data. Besides, MeloForm can support various kinds of forms, such as verse-and-chorus form, rondo form, variational form, sonata form, etc.
    Adversarial Examples for Good: Adversarial Examples Guided Imbalanced Learning. (arXiv:2201.12356v2 [cs.LG] UPDATED)
    Adversarial examples are inputs for machine learning models that have been designed by attackers to cause the model to make mistakes. In this paper, we demonstrate that adversarial examples can also be utilized for good, to improve the performance of imbalanced learning. We provide a new perspective on how to deal with imbalanced data: adjust the biased decision boundary by training with Guiding Adversarial Examples (GAEs). Our method can effectively increase the accuracy of minority classes while sacrificing little accuracy on majority classes. We empirically show, on several benchmark datasets, that our proposed method is comparable to state-of-the-art methods. To the best of our knowledge, we are the first to address imbalanced learning using adversarial examples.
    Tackling Multimodal Device Distributions in Inverse Photonic Design using Invertible Neural Networks. (arXiv:2208.14212v1 [cs.LG])
    Inverse design, the process of matching device or process parameters to exhibit a desired performance, is applied in many disciplines ranging from material design to chemical processes and engineering. Machine learning has emerged as a promising approach to overcome current limitations imposed by the dimensionality of the parameter space and multimodal parameter distributions. Most traditional optimization routines assume an invertible one-to-one mapping between the design parameters and the target performance. However, comparable or even identical performance may be realized by different designs, yielding a multimodal distribution of possible solutions to the inverse design problem, which confuses the optimization algorithm. Here, we show how a generative modeling approach based on invertible neural networks can provide the full distribution of possible solutions to the inverse design problem and resolve the ambiguity of nanodevice inverse design problems featuring multimodal distributions. We implement a Conditional Invertible Neural Network (cINN) and apply it to a proof-of-principle nanophotonic problem, which consists of tailoring the transmission spectrum of a metallic film milled by subwavelength indentations. We compare our approach with the commonly used conditional Variational Autoencoder (cVAE) framework and show the superior flexibility and accuracy of the proposed cINNs when dealing with multimodal device distributions. Our work shows that invertible neural networks provide a valuable and versatile toolkit for advancing inverse design in nanoscience and nanotechnology.
    k-MS: A novel clustering algorithm based on morphological reconstruction. (arXiv:2208.14390v1 [cs.LG])
    This work proposes a clustering algorithm called k-Morphological Sets (k-MS), based on morphological reconstruction and heuristics. k-MS is faster than the CPU-parallel k-Means in worst-case scenarios and produces enhanced visualizations of the dataset as well as very distinct clusterings. It is also faster than similar clustering methods that are sensitive to density and shape, such as Mitosis and TRICLUST. In addition, k-MS is deterministic and has an intrinsic sense of the maximal number of clusters that can be created for a given input sample and input parameters, differing from k-Means and other clustering algorithms. In other words, given a constant k, a structuring element, and a dataset, k-MS produces k or fewer clusters without using random/pseudo-random functions. Finally, the proposed algorithm also provides a straightforward means of removing noise from images or datasets in general.
    TAG: Task-based Accumulated Gradients for Lifelong learning. (arXiv:2105.05155v3 [cs.LG] UPDATED)
    When an agent encounters a continual stream of new tasks in the lifelong learning setting, it leverages the knowledge it gained from the earlier tasks to help learn the new tasks better. In such a scenario, identifying an efficient knowledge representation becomes a challenging problem. Most research works propose to either store a subset of examples from the past tasks in a replay buffer, dedicate a separate set of parameters to each task or penalize excessive updates over parameters by introducing a regularization term. While existing methods employ the general task-agnostic stochastic gradient descent update rule, we propose a task-aware optimizer that adapts the learning rate based on the relatedness among tasks. We utilize the directions taken by the parameters during the updates by accumulating the gradients specific to each task. These task-based accumulated gradients act as a knowledge base that is maintained and updated throughout the stream. We empirically show that our proposed adaptive learning rate not only accounts for catastrophic forgetting but also allows positive backward transfer. We also show that our method performs better than several state-of-the-art methods in lifelong learning on complex datasets with a large number of tasks.
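    One way to picture the idea: keep an accumulated gradient per past task and shrink the step size when the current task's gradient disagrees with them. This is a simplified sketch of the mechanism, not the paper's exact optimizer; the mapping from similarity to multiplier is an illustrative choice:

        import numpy as np

        def tag_learning_rate(task_grads, current_grad, base_lr=0.1):
            """Shrink the step when the current gradient disagrees with the
            accumulated gradients of earlier tasks (cosine relatedness)."""
            if not task_grads:
                return base_lr
            sims = [g @ current_grad /
                    (np.linalg.norm(g) * np.linalg.norm(current_grad) + 1e-12)
                    for g in task_grads]
            # Map mean similarity in [-1, 1] to a multiplier in [0, 1].
            return base_lr * (1.0 + np.mean(sims)) / 2.0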
    A Comprehensive Review of Digital Twin -- Part 1: Modeling and Twinning Enabling Technologies. (arXiv:2208.14197v1 [cs.CE])
    As an emerging technology in the era of Industry 4.0, digital twin is gaining unprecedented attention because of its promise to further optimize process design, quality control, health monitoring, decision and policy making, and more, by comprehensively modeling the physical world as a group of interconnected digital models. In a two-part series of papers, we examine the fundamental role of different modeling techniques, twinning enabling technologies, and uncertainty quantification and optimization methods commonly used in digital twins. This first paper presents a thorough literature review of digital twin trends across many disciplines currently pursuing this area of research. Then, digital twin modeling and twinning enabling technologies are further analyzed by classifying them into two main categories: physical-to-virtual, and virtual-to-physical, based on the direction in which data flows. Finally, this paper provides perspectives on the trajectory of digital twin technology over the next decade, and introduces a few emerging areas of research which will likely be of great use in future digital twin research. In part two of this review, the role of uncertainty quantification and optimization are discussed, a battery digital twin is demonstrated, and more perspectives on the future of digital twin are shared.
    Modelling variability in vibration-based PBSHM via a generalised population form. (arXiv:2203.07115v2 [cs.LG] UPDATED)
    Structural health monitoring (SHM) has been an active research area for the last three decades and has accumulated a number of critical advances over that period, as can be seen in the literature. However, SHM still faces challenges because of the paucity of damage-state data, operational and environmental fluctuations, repeatability issues, and changes in boundary conditions. These issues present as inconsistencies in the captured features and can have a huge impact on the practical implementation, and more critically, on the generalisation of the technology. Population-based SHM has been designed to address some of these concerns by modelling and transferring missing information using data collected from groups of similar structures. In this work, vibration data were collected from four healthy, nominally-identical, full-scale composite helicopter blades. Manufacturing differences (e.g., slight differences in geometry and/or material properties) among the blades presented as variability in their structural dynamics, which can be very problematic for SHM based on machine learning from vibration data. This work aims to address this variability by defining a general model for the frequency response functions of the blades, called a form, using mixtures of Gaussian processes.
    Prediction-based One-shot Dynamic Parking Pricing. (arXiv:2208.14231v1 [cs.LG])
    Many U.S. metropolitan cities are notorious for their severe shortage of parking spots. To this end, we present a proactive prediction-driven optimization framework to dynamically adjust parking prices. We use state-of-the-art deep learning technologies, such as neural ordinary differential equations (NODEs), to design a model that predicts future parking occupancy rates given historical occupancy rates and price information. In addition, owing to the continuous and bijective characteristics of NODEs, we design a one-shot price optimization method for a pre-trained prediction model, which requires only one iteration to find the optimal solution. In other words, we optimize the price input to the pre-trained prediction model to achieve targeted occupancy rates in the parking blocks. We conduct experiments with multiple years of data collected in San Francisco and Seattle. Our prediction model shows the best accuracy in comparison with various temporal and spatio-temporal forecasting models. Our one-shot optimization method greatly outperforms other black-box and white-box search methods in terms of search time and always returns the optimal price solution.
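    To make the optimization step concrete, here is a PyTorch sketch that adjusts the price input of a frozen occupancy predictor by gradient descent; `model(history, price)` is an assumed signature, and the paper's one-shot method exploits the NODE's bijectivity to avoid iterating at all:

        import torch

        def optimize_price(model, history, target_occupancy, steps=100, lr=0.05):
            """Gradient-descend on the price input of a frozen occupancy
            predictor until the prediction matches the target rate."""
            for p in model.parameters():
                p.requires_grad_(False)            # the predictor stays fixed
            price = torch.zeros(1, requires_grad=True)
            opt = torch.optim.Adam([price], lr=lr)
            for _ in range(steps):
                opt.zero_grad()
                loss = (model(history, price) - target_occupancy).pow(2).mean()
                loss.backward()                    # gradient flows to the price only
                opt.step()
            return price.detach()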
    Neural Architecture Search for Improving Latency-Accuracy Trade-off in Split Computing. (arXiv:2208.13968v1 [cs.LG])
    This paper proposes a neural architecture search (NAS) method for split computing. Split computing is an emerging machine-learning inference technique that addresses the privacy and latency challenges of deploying deep learning in IoT systems. In split computing, neural network models are separated and cooperatively processed using edge servers and IoT devices via networks. Thus, the architecture of the neural network model significantly impacts the communication payload size, model accuracy, and computational load. In this paper, we address the challenge of optimizing neural network architecture for split computing. To this end, we propose NASC, which jointly explores the optimal model architecture and a split point to achieve higher accuracy while meeting latency requirements (i.e., a total latency of computation and communication smaller than a certain threshold). NASC employs a one-shot NAS that does not require repeated model training, enabling a computationally efficient architecture search. Our performance evaluation using the HW-NAS-Bench benchmark demonstrates that the proposed NASC can improve the communication-latency/model-accuracy trade-off, i.e., reduce the latency by approximately 40-60% from the baseline with slight accuracy degradation.
    CUAHN-VIO: Content-and-Uncertainty-Aware Homography Network for Visual-Inertial Odometry. (arXiv:2208.13935v1 [cs.RO])
    Learning-based visual ego-motion estimation is promising yet not ready for navigating agile mobile robots in the real world. In this article, we propose CUAHN-VIO, a robust and efficient monocular visual-inertial odometry (VIO) designed for micro aerial vehicles (MAVs) equipped with a downward-facing camera. The vision frontend is a content-and-uncertainty-aware homography network (CUAHN) that is robust to non-homography image content and failure cases of network prediction. It not only predicts the homography transformation but also estimates its uncertainty. The training is self-supervised, so that it does not require ground truth that is often difficult to obtain. The network has good generalization that enables "plug-and-play" deployment in new environments without fine-tuning. A lightweight extended Kalman filter (EKF) serves as the VIO backend and utilizes the mean prediction and variance estimation from the network for visual measurement updates. CUAHN-VIO is evaluated on a high-speed public dataset and shows rivaling accuracy to state-of-the-art (SOTA) VIO approaches. Thanks to the robustness to motion blur, low network inference time (~23ms), and stable processing latency (~26ms), CUAHN-VIO successfully runs onboard an Nvidia Jetson TX2 embedded processor to navigate a fast autonomous MAV.
    StableFace: Analyzing and Improving Motion Stability for Talking Face Generation. (arXiv:2208.13717v1 [cs.CV] CROSS LISTED)
    While previous speech-driven talking face generation methods have made significant progress in improving the visual and lip-sync quality of the synthesized videos, they pay less attention to lip motion jitters, which greatly undermine the realism of talking face videos. What causes motion jitters, and how can the problem be mitigated? In this paper, we conduct systematic analyses of the motion jittering problem based on a state-of-the-art pipeline that uses 3D face representations to bridge the input audio and output video, and we improve the motion stability with a series of effective designs. We find that several issues can lead to jitters in synthesized talking face video: 1) jitters from the input 3D face representations; 2) training-inference mismatch; 3) a lack of dependency modeling among video frames. Accordingly, we propose three effective solutions: 1) a Gaussian-based adaptive smoothing module to smooth the 3D face representations and eliminate jitters in the input; 2) augmented erosions added to the input data of the neural renderer during training to simulate the distortion seen at inference and reduce the mismatch; 3) an audio-fused transformer generator to model dependency among video frames. Besides, considering there is no off-the-shelf metric for measuring motion jitters in talking face video, we devise an objective metric (Motion Stability Index, MSI) that quantifies motion jitters as the reciprocal of the variance of acceleration. Extensive experimental results show the superiority of our method for motion-stable face video generation, with better quality than previous systems.
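    Reading the MSI definition literally, a sketch of the metric as the reciprocal of the variance of acceleration over landmark trajectories; the paper's exact normalization and choice of tracked quantities may differ:

        import numpy as np

        def motion_stability_index(landmarks):
            """MSI = 1 / variance of acceleration; higher means steadier
            motion. `landmarks` has shape (frames, points, 2)."""
            velocity = np.diff(landmarks, axis=0)  # frame-to-frame motion
            accel = np.diff(velocity, axis=0)      # change of motion
            return 1.0 / (np.var(accel) + 1e-12)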
    Effective Multi-User Delay-Constrained Scheduling with Deep Recurrent Reinforcement Learning. (arXiv:2208.14074v1 [cs.LG])
    Multi-user delay constrained scheduling is important in many real-world applications including wireless communication, live streaming, and cloud computing. Yet, it poses a critical challenge since the scheduler needs to make real-time decisions to guarantee the delay and resource constraints simultaneously without prior information of system dynamics, which can be time-varying and hard to estimate. Moreover, many practical scenarios suffer from partial observability issues, e.g., due to sensing noise or hidden correlation. To tackle these challenges, we propose a deep reinforcement learning (DRL) algorithm, named Recurrent Softmax Delayed Deep Double Deterministic Policy Gradient ($\mathtt{RSD4}$), which is a data-driven method based on a Partially Observed Markov Decision Process (POMDP) formulation. $\mathtt{RSD4}$ guarantees resource and delay constraints by Lagrangian dual and delay-sensitive queues, respectively. It also efficiently tackles partial observability with a memory mechanism enabled by the recurrent neural network (RNN) and introduces user-level decomposition and node-level merging to ensure scalability. Extensive experiments on simulated/real-world datasets demonstrate that $\mathtt{RSD4}$ is robust to system dynamics and partially observable environments, and achieves superior performances over existing DRL and non-DRL-based methods.
    Learning Representations for Hyper-Relational Knowledge Graphs. (arXiv:2208.14322v1 [cs.LG])
    Knowledge graphs (KGs) have gained prominence for their ability to learn representations for uni-relational facts. Recently, research has focused on modeling hyper-relational facts, which move beyond the restriction of uni-relational facts and allow us to represent more complex and real-world information. However, existing approaches for learning representations on hyper-relational KGs mainly focus on enhancing the communication from qualifiers to base triples while overlooking the flow of information from the base triple to its qualifiers. This can lead to suboptimal qualifier representations, especially when a large number of qualifiers are present. This motivates us to design a framework that utilizes multiple aggregators to learn representations for hyper-relational facts: one from the perspective of the base triple and the other from the perspective of the qualifiers. Experiments demonstrate the effectiveness of our framework for hyper-relational knowledge graph completion across multiple datasets. Furthermore, we conduct an ablation study that validates the importance of the various components of our framework. The code to reproduce our results can be found at \url{https://github.com/HarryShomer/QUAD}.
    Convergence Rates of Training Deep Neural Networks via Alternating Minimization Methods. (arXiv:2208.14318v1 [cs.LG])
    Training deep neural networks (DNNs) is an important and challenging optimization problem in machine learning due to its non-convexity and non-separable structure. Alternating minimization (AM) approaches split the composite structure of DNNs and have drawn great interest in the deep learning and optimization communities. In this paper, we propose a unified framework for analyzing the convergence rate of AM-type network training methods. Our analysis is based on $j$-step sufficient decrease conditions and the Kurdyka-Lojasiewicz (KL) property, which relaxes the requirement of designing descent algorithms. We derive the detailed local convergence rate when the KL exponent $\theta$ varies in $[0,1)$. Moreover, local R-linear convergence is discussed under a stronger $j$-step sufficient decrease condition.
    A deep learning-based remaining useful life prediction approach for bearings. (arXiv:1812.03315v2 [cs.LG] UPDATED)
    In industrial applications, nearly half of motor failures are caused by the degradation of rolling element bearings (REBs). Therefore, accurately estimating the remaining useful life (RUL) of REBs is of crucial importance for ensuring the reliability and safety of mechanical systems. Model-based approaches to this challenge are often limited by the complexity of mathematical modeling, while conventional data-driven approaches require massive effort to extract degradation features and construct a health index. In this paper, a novel online data-driven framework is proposed that exploits deep convolutional neural networks (CNNs) for predicting the RUL of bearings. More concretely, the raw vibrations of training bearings are first processed using the Hilbert-Huang transform (HHT), and a novel nonlinear degradation indicator is constructed as the label for learning. The CNN is then employed to identify the hidden pattern between the extracted degradation indicator and the vibrations of the training bearings, making it possible to estimate the degradation of test bearings automatically. Finally, the RULs of the test bearings are predicted using an $\epsilon$-support vector regression model. The superior performance of the proposed RUL estimation framework compared with state-of-the-art approaches is demonstrated through experimental results. The generality of the proposed CNN model is also validated by transferring it to bearings operating under different conditions.
    Fault Detection for Non-Condensing Boilers using Simulated Building Automation System Sensor Data. (arXiv:2205.08418v2 [eess.SP] UPDATED)
    Building performance has been shown to degrade significantly after commissioning, resulting in increased energy consumption and associated greenhouse gas emissions. Continuous Commissioning using existing sensor networks and IoT devices has the potential to minimize this waste by continually identifying system degradation and re-tuning control strategies to adapt to real building performance. Due to its significant contribution to greenhouse gas emissions, the performance of gas boiler systems for building heating is critical. A review of boiler performance studies has been used to develop a set of common faults and degraded performance conditions, which have been integrated into a MATLAB/Simulink emulator. This resulted in a labeled dataset with approximately 10,000 simulations of steady-state performance for each of 14 non-condensing boilers. The collected data are used for training and testing fault classification with k-nearest neighbours, decision trees, random forests, and support vector machines. The results show that the support vector machine method gave the best prediction accuracy, consistently exceeding 90%, but that generalization across multiple boilers is not possible due to low classification accuracy.
    Distributed Ensembles of Reinforcement Learning Agents for Electricity Control. (arXiv:2208.14338v1 [cs.LG])
    Deep Reinforcement Learning (or just "RL") is gaining popularity in industrial and research applications. However, it still suffers from some key limitations that slow down its widespread adoption: its performance is sensitive to initial conditions and to non-determinism. To address these challenges, we propose a procedure for building ensembles of RL agents that efficiently turn better local decisions into long-term cumulative rewards. For the first time, hundreds of experiments have been run to compare different ensemble construction procedures in two electricity control environments. We find that an ensemble of 4 agents improves accumulated rewards by 46%, improves reproducibility by a factor of 3.6, and can naturally and efficiently train and predict in parallel on GPUs and CPUs.
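    One of the simplest ensemble constructions compared in such studies is averaging value estimates and acting greedily on the mean; a sketch under the assumption that each agent exposes a function from a state to per-action values (the paper compares several construction procedures, of which this is only one):

        import numpy as np

        def ensemble_action(q_functions, state):
            """Average the agents' per-action value estimates and act
            greedily on the mean."""
            q = np.mean([qf(state) for qf in q_functions], axis=0)
            return int(np.argmax(q))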
    Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation. (arXiv:2111.10919v2 [cs.LG] UPDATED)
    We consider the offline reinforcement learning problem, where the aim is to learn a decision making policy from logged data. Offline RL -- particularly when coupled with (value) function approximation to allow for generalization in large or continuous state spaces -- is becoming increasingly relevant in practice, because it avoids costly and time-consuming online data collection and is well suited to safety-critical domains. Existing sample complexity guarantees for offline value function approximation methods typically require both (1) distributional assumptions (i.e., good coverage) and (2) representational assumptions (i.e., ability to represent some or all $Q$-value functions) stronger than what is required for supervised learning. However, the necessity of these conditions and the fundamental limits of offline RL are not well understood in spite of decades of research. This led Chen and Jiang (2019) to conjecture that concentrability (the most standard notion of coverage) and realizability (the weakest representation condition) alone are not sufficient for sample-efficient offline RL. We resolve this conjecture in the positive by proving that in general, even if both concentrability and realizability are satisfied, any algorithm requires sample complexity polynomial in the size of the state space to learn a non-trivial policy. Our results show that sample-efficient offline reinforcement learning requires either restrictive coverage conditions or representation conditions that go beyond supervised learning, and highlight a phenomenon called over-coverage which serves as a fundamental barrier for offline value function approximation methods. A consequence of our results for reinforcement learning with linear function approximation is that the separation between online and offline RL can be arbitrarily large, even in constant dimension.
    Fair Ranking as Fair Division: Impact-Based Individual Fairness in Ranking. (arXiv:2206.07247v2 [cs.IR] UPDATED)
    Rankings have become the primary interface in two-sided online markets. Many have noted that the rankings not only affect the satisfaction of the users (e.g., customers, listeners, employers, travelers), but that the position in the ranking allocates exposure -- and thus economic opportunity -- to the ranked items (e.g., articles, products, songs, job seekers, restaurants, hotels). This has raised questions of fairness to the items, and most existing works have addressed fairness by explicitly linking item exposure to item relevance. However, we argue that any particular choice of such a link function may be difficult to defend, and we show that the resulting rankings can still be unfair. To avoid these shortcomings, we develop a new axiomatic approach that is rooted in principles of fair division. This not only avoids the need to choose a link function, but also more meaningfully quantifies the impact on the items beyond exposure. Our axioms of envy-freeness and dominance over uniform ranking postulate that for a fair ranking policy every item should prefer their own rank allocation over that of any other item, and that no item should be actively disadvantaged by the rankings. To compute ranking policies that are fair according to these axioms, we propose a new ranking objective related to the Nash Social Welfare. We show that the solution has guarantees regarding its envy-freeness, its dominance over uniform rankings for every item, and its Pareto optimality. In contrast, we show that conventional exposure-based fairness can produce large amounts of envy and have a highly disparate impact on the items. Beyond these theoretical results, we illustrate empirically how our framework controls the trade-off between impact-based individual item fairness and user utility.
    Solving a directed percolation inverse problem. (arXiv:2201.12222v4 [cond-mat.dis-nn] UPDATED)
    We present a directed percolation inverse problem for diode networks: Given information about which pairs of nodes allow current to percolate from one to the other, can one find a configuration of diodes consistent with the observed currents? We implement a divide-and-concur iterative projection method for solving the problem and demonstrate the supremacy of our method over an exhaustive approach for nontrivial instances of the problem. We find that the problem is most difficult when some but not all of the percolation data are hidden, and that the most difficult networks to reconstruct generally are those for which the currents are most sensitive to the addition or removal of a single diode.
    Towards making the most of NLP-based device mapping optimization for OpenCL kernels. (arXiv:2208.14124v1 [cs.LG])
    Nowadays, we are living in an era of extreme device heterogeneity. Beyond the high variety of conventional CPU architectures, accelerator devices such as GPUs and FPGAs also appear in the foreground, exploding the pool of available solutions to execute applications. However, choosing the appropriate device for an application's needs is an extremely challenging task due to the abstract relationship between hardware and software. Accurate automatic optimization algorithms are required to cope with the complexity and variety of current hardware and software, as optimal execution has traditionally relied on time-consuming trial-and-error approaches. Machine learning (ML) and Natural Language Processing (NLP) have flourished over the last decade, with research focusing on deep architectures. In this context, applying natural language processing techniques to source code in order to conduct autotuning tasks is an emerging field of study. In this paper, we extend the work of Cummins et al., namely Deeptune, which tackles the problem of optimal device selection (CPU or GPU) for accelerated OpenCL kernels. We identify three major limitations of Deeptune and, based on these, we propose four different DNN models that provide enhanced contextual information of source codes. Experimental results show that our proposed methodology surpasses that of Cummins et al., providing up to 4\% improvement in prediction accuracy.
    Block Mean Approximation for Efficient Second Order Optimization. (arXiv:1804.05484v4 [cs.LG] UPDATED)
    Advanced optimization algorithms such as Newton's method and AdaGrad benefit from second-order derivatives or second-order statistics to achieve better descent directions and faster convergence rates. At their heart, such algorithms need to compute the inverse or inverse square root of a matrix whose size is quadratic in the dimensionality of the search space. For high-dimensional search spaces, the matrix inversion or inversion of the square root becomes prohibitively expensive, which in turn demands approximate methods. In this work, we propose a new matrix approximation method which divides a matrix into blocks and represents each block by one or two numbers. The method allows efficient computation of the matrix inverse and inverse square root. We apply our method to AdaGrad in training deep neural networks. Experiments show encouraging results compared to the diagonal approximation.
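    A minimal numpy sketch of the block-mean step (only the approximation itself; the paper's one-or-two-number block variants and the fast inverse and inverse-square-root routines are not reproduced here):
        # Partition a matrix into b x b blocks and replace each block by its mean,
        # so the whole matrix is summarized by one number per block.
        import numpy as np

        def block_mean_approx(A, b):
            """Approximate A (n x n, n divisible by b) by per-block means."""
            n = A.shape[0]
            k = n // b
            # Average within each b x b block, then broadcast back to full size.
            means = A.reshape(k, b, k, b).mean(axis=(1, 3))
            return np.kron(means, np.ones((b, b)))

        A = np.random.default_rng(0).normal(size=(8, 8))
        A_hat = block_mean_approx(A, 2)
        print("approximation error:", np.linalg.norm(A - A_hat))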
    Optimal Rates for Distributed Learning with Random Features. (arXiv:1906.03155v4 [cs.LG] UPDATED)
    In recent studies, the generalization properties for distributed learning and random features assumed the existence of the target concept over the hypothesis space. However, this strict condition is not applicable to the more common non-attainable case. In this paper, using refined proof techniques, we first extend the optimal rates for distributed learning with random features to the non-attainable case. Then, we reduce the number of required random features via a data-dependent generating strategy, and improve the allowed number of partitions with additional unlabeled data. Theoretical analysis shows these techniques remarkably reduce computational cost while preserving the optimal generalization accuracy under standard assumptions. Finally, we conduct several experiments on both simulated and real-world datasets, and the empirical results validate our theoretical findings.
    ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization. (arXiv:2208.14286v1 [cs.LG])
    Quantization is a technique to reduce the computation and memory cost of DNN models, which are getting increasingly large. Existing quantization solutions use fixed-point integer or floating-point types, which have limited benefits, as both require more bits to maintain the accuracy of the original models. On the other hand, variable-length quantization uses low-bit quantization for normal values and high precision for a fraction of outlier values. Even though this line of work brings algorithmic benefits, it also introduces significant hardware overheads due to variable-length encoding and decoding. In this work, we propose a fixed-length adaptive numerical data type called ANT to achieve low-bit quantization with tiny hardware overheads. Our data type ANT leverages two key innovations to exploit the intra-tensor and inter-tensor adaptive opportunities in DNN models. First, we propose a particular data type, flint, that combines the advantages of float and int for adapting to the importance of different values within a tensor. Second, we propose an adaptive framework that selects the best type for each tensor according to its distribution characteristics. We design a unified processing element architecture for ANT and show its ease of integration with existing DNN accelerators. Our design results in 2.8$\times$ speedup and 2.5$\times$ energy efficiency improvement over the state-of-the-art quantization accelerators.
    A Discontinuity Capturing Shallow Neural Network for Elliptic Interface Problems. (arXiv:2106.05587v3 [math.NA] UPDATED)
    In this paper, a new Discontinuity Capturing Shallow Neural Network (DCSNN) for approximating $d$-dimensional piecewise continuous functions and for solving elliptic interface problems is developed. There are three novel features in the present network; namely, (i) jump discontinuities are accurately captured, (ii) it is completely shallow, comprising only one hidden layer, (iii) it is completely mesh-free for solving partial differential equations. The crucial idea here is that a $d$-dimensional piecewise continuous function can be extended to a continuous function defined in $(d+1)$-dimensional space, where the augmented coordinate variable labels the pieces of each sub-domain. We then construct a shallow neural network to express this new function. Since only one hidden layer is employed, the number of training parameters (weights and biases) scales linearly with the dimension and the neurons used in the hidden layer. For solving elliptic interface problems, the network is trained by minimizing the mean square error loss that consists of the residual of the governing equation, boundary condition, and the interface jump conditions. We perform a series of numerical tests to demonstrate the accuracy of the present network. Our DCSNN model is efficient due to only a moderate number of parameters needed to be trained (a few hundred parameters used throughout all numerical examples), and the results indicate good accuracy. Compared with the results obtained by the traditional grid-based immersed interface method (IIM), which is designed particularly for elliptic interface problems, our network model shows a better accuracy than IIM. We conclude by solving a six-dimensional problem to demonstrate the capability of the present network for high-dimensional applications.
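    The augmented-coordinate idea is easy to demonstrate in one dimension. In this hedged sketch, a function with a jump at x = 0 is lifted to a continuous function of (x, z), where z labels the sub-domain, and a single-hidden-layer network (scikit-learn's MLPRegressor standing in for the paper's shallow network) fits it:
        # Lift a discontinuous 1D function to a continuous function of (x, z).
        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        x = rng.uniform(-1, 1, size=2000)
        y = np.where(x < 0, np.sin(np.pi * x), np.cos(np.pi * x) + 2.0)  # jump at 0
        z = np.where(x < 0, -1.0, 1.0)            # augmented coordinate labels pieces

        net = MLPRegressor(hidden_layer_sizes=(50,), activation="tanh",
                           max_iter=5000, random_state=0)
        net.fit(np.column_stack([x, z]), y)
        print("train R^2:", net.score(np.column_stack([x, z]), y))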
    Conjugate Natural Selection. (arXiv:2208.13898v1 [cs.LG])
    We prove that natural gradient descent, with respect to the parameters of a machine learning policy, admits a conjugate dynamical description consistent with evolution by natural selection. We characterize these conjugate dynamics as a locally optimal fit to the continuous-time replicator dynamics, and show that the Price equation applies to equivalence classes of functions belonging to a Hilbert space generated by the policy's architecture and parameters. We posit that "conjugate natural selection" intuitively explains the empirical effectiveness of natural gradient descent, while developing a useful analytic approach to the dynamics of machine learning.
    Texture Extraction Methods Based Ensembling Framework for Improved Classification. (arXiv:2206.04158v2 [cs.CV] UPDATED)
    Texture-based classification solutions have proven their significance in many domains, from industrial inspections to health-related applications. New methods have been developed based on texture feature learning and CNN-based architectures to address computer vision use cases for images with rich texture-based features. In recent years, architectures solving texture-based classification problems and demonstrating state-of-the-art results have emerged. Yet, one limitation of these approaches is that they cannot claim to be suitable for all types of image texture patterns: each technique has an advantage for a specific texture type only. To address this shortcoming, we propose a framework that uniquely combines multiple texture-based techniques with a CNN backbone to extract the most relevant texture features. This enables the model to be trained in a self-selective manner and produce improved results over current published benchmarks -- with almost the same number of model parameters. Our proposed framework works well on most texture types simultaneously and allows flexibility for additional texture-based methods to be accommodated to achieve better results than existing architectures. In this work, firstly, we present an analysis of the relative importance of existing texture extraction (TE) techniques when used alone and in combination with other TE methods on benchmark datasets. Secondly, we show that Global Average Pooling -- which represents the spatial information -- is of less significance in comparison to the TE method(s) applied in the network while training for texture-based classification tasks. Finally, we present state-of-the-art results for several texture-based benchmark datasets by combining three existing texture-based techniques using our proposed framework.
    Achieving Fairness with a Simple Ridge Penalty. (arXiv:2105.13817v3 [cs.LG] UPDATED)
    In this paper we present a general framework for estimating regression models subject to a user-defined level of fairness. We enforce fairness as a model selection step in which we choose the value of a ridge penalty to control the effect of sensitive attributes. We then estimate the parameters of the model conditional on the chosen penalty value. Our proposal is mathematically simple, with a solution that is partly in closed form, and produces estimates of the regression coefficients that are intuitive to interpret as a function of the level of fairness. Furthermore, it is easily extended to generalised linear models, kernelised regression models and other penalties; and it can accommodate multiple definitions of fairness. We compare our approach with the regression model from Komiyama et al. (2018), which implements a provably-optimal linear regression model; and with the fair models from Zafar et al. (2019). We evaluate these approaches empirically on six different data sets, and we find that our proposal provides better goodness of fit and better predictive accuracy for the same level of fairness. In addition, we highlight a source of bias in the original experimental evaluation in Komiyama et al. (2018).
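    A minimal numpy sketch of the core idea, assuming a plain linear model and a user-chosen penalty value lam (the paper's actual procedure selects the penalty to achieve a target fairness level and extends to GLMs and kernels):
        # Ridge penalty applied only to the coefficients of sensitive columns.
        import numpy as np

        rng = np.random.default_rng(0)
        n, p = 500, 6
        X = rng.normal(size=(n, p))
        beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0, -1.0])
        y = X @ beta_true + rng.normal(scale=0.1, size=n)

        sensitive = [4, 5]                  # indices of the sensitive attributes
        S = np.zeros((p, p))
        S[sensitive, sensitive] = 1.0       # selection matrix for the penalty

        lam = 10.0                          # larger lam -> smaller sensitive effects
        beta_hat = np.linalg.solve(X.T @ X + lam * S, X.T @ y)
        print("sensitive coefficients:", beta_hat[sensitive])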
    Attention-based Interpretable Regression of Gene Expression in Histology. (arXiv:2208.13776v1 [q-bio.QM])
    Interpretability of deep learning is widely used to evaluate the reliability of medical imaging models and reduce the risks of inaccurate patient recommendations. For models exceeding human performance, e.g. predicting RNA structure from microscopy images, interpretable modelling can be further used to uncover highly non-trivial patterns which are otherwise imperceptible to the human eye. We show that interpretability can reveal connections between the microscopic appearance of cancer tissue and its gene expression profiling. While exhaustive profiling of all genes from the histology images is still challenging, we estimate the expression values of a well-known subset of genes that is indicative of cancer molecular subtype, survival, and treatment response in colorectal cancer. Our approach successfully identifies meaningful information from the image slides, highlighting hotspots of high gene expression. Our method can help characterise how gene expression shapes tissue morphology and this may be beneficial for patient stratification in the pathology unit. The code is available on GitHub.
    Machine learning in the prediction of cardiac epicardial and mediastinal fat volumes. (arXiv:2208.14374v1 [cs.CV])
    We propose a methodology to predict the cardiac epicardial and mediastinal fat volumes in computed tomography images using regression algorithms. The obtained results indicate that it is feasible to predict these fats with a high degree of correlation, thus alleviating the requirement for manual or automatic segmentation of both fat volumes. Instead, segmenting just one of them suffices, while the volume of the other may be predicted fairly precisely. The correlation coefficient obtained by the Rotation Forest algorithm using MLP Regressor for predicting the mediastinal fat based on the epicardial fat was 0.9876, with a relative absolute error of 14.4% and a root relative squared error of 15.7%. The best correlation coefficient obtained in the prediction of the epicardial fat based on the mediastinal was 0.9683 with a relative absolute error of 19.6% and a relative squared error of 24.9%. Moreover, we analysed the feasibility of using linear regressors, which provide an intuitive interpretation of the underlying approximations. In this case, the obtained correlation coefficient was 0.9534 for predicting the mediastinal fat based on the epicardial, with a relative absolute error of 31.6% and a root relative squared error of 30.1%. On the prediction of the epicardial fat based on the mediastinal fat, the correlation coefficient was 0.8531, with a relative absolute error of 50.43% and a root relative squared error of 52.06%. In summary, it is possible to speed up general medical analyses and some segmentation and quantification methods that are currently employed in the state-of-the-art by using this prediction approach, which consequently reduces costs and therefore enables preventive treatments that may lead to a reduction of health problems.
    Modeling Volatility and Dependence of European Carbon and Energy Prices. (arXiv:2208.14311v1 [q-fin.ST])
    We study the prices of European Emission Allowances (EUA), whereby we analyze their uncertainty and dependencies on related energy markets. We propose a probabilistic multivariate conditional time series model that exploits key characteristics of the data. The forecasting performance of the proposed model and various competing models is evaluated in an extensive rolling-window forecasting study covering almost two years out-of-sample, with forecasts made 30 steps ahead. The accuracy of the multivariate probabilistic forecasts is assessed by the energy score. We discuss our findings focusing on volatility spillovers and time-varying correlations, also in view of the Russian invasion of Ukraine.
    MCTensor: A High-Precision Deep Learning Library with Multi-Component Floating-Point. (arXiv:2207.08867v2 [cs.LG] UPDATED)
    In this paper, we introduce MCTensor, a library based on PyTorch for providing general-purpose and high-precision arithmetic for DL training. MCTensor is used in the same way as a PyTorch Tensor: we implement multiple basic, matrix-level computation operators and NN modules for MCTensor with an identical PyTorch interface. Our algorithms achieve high-precision computation and also benefit from heavily optimized PyTorch floating-point arithmetic. We evaluate MCTensor arithmetic against PyTorch native arithmetic for a series of tasks, where models using MCTensor in float16 match or outperform the PyTorch model with float32 or float64 precision.
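    The basic building block of multi-component arithmetic is the error-free transformation of a sum; the sketch below shows Knuth's classic TwoSum, which libraries in this family compose into multi-component numbers (this is an illustration of the general technique, not MCTensor's code):
        # TwoSum: split a + b exactly into a rounded sum and a residual term.
        def two_sum(a, b):
            s = a + b
            bb = s - a
            err = (a - (s - bb)) + (b - bb)
            return s, err             # s + err == a + b exactly

        hi, lo = two_sum(1.0, 1e-17)  # 1e-17 is lost in plain float addition
        print(hi, lo)                 # 1.0 1e-17 -- the residual is preserved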
    A Dataset and Benchmark for Automatically Answering and Generating Machine Learning Final Exams. (arXiv:2206.05442v2 [cs.LG] UPDATED)
    Can a machine learn machine learning? We propose to answer this question using the same criteria we use to answer a similar question: can a human learn machine learning? We automatically answer MIT final exams in Introduction to Machine Learning at a human level. The course is a large undergraduate class with around five hundred students each semester. Recently, program synthesis and few-shot learning solved university-level problem set questions in mathematics and STEM courses at a human level. In this work, we solve questions from final exams that differ from problem sets in several ways: the questions are longer, have multiple parts, are more complicated, and span a broader set of topics. We provide a new dataset and benchmark of questions from eight MIT Introduction to Machine Learning final exams between Fall 2017 and Spring 2022 and provide code for automatically answering these questions and generating new questions. We perform ablation studies comparing zero-shot learning with few-shot learning, chain-of-thought prompting, GPT-3 pre-trained on text and Codex fine-tuned on code on a range of machine learning topics and find that few-shot learning methods perform best. We make our data and code publicly available for the machine learning community.
    The case for fully Bayesian optimisation in small-sample trials. (arXiv:2208.13960v1 [cs.LG])
    While sample efficiency is the main motive for use of Bayesian optimisation when black-box functions are expensive to evaluate, the standard approach based on type II maximum likelihood (ML-II) may fail and result in disappointing performance in small-sample trials. The paper provides three compelling reasons to adopt fully Bayesian optimisation (FBO) as an alternative. First, failures of ML-II are more commonplace than implied by the existing studies using the contrived settings. Second, FBO is more robust than ML-II, and the price of robustness is almost trivial. Third, FBO has become simple to implement and fast enough to be practical. The paper supports the argument using relevant experiments, which reflect the current practice regarding models, algorithms, and software platforms. Since the benefits seem to outweigh the costs, researchers should consider adopting FBO for their applications so that they can guard against potential failures that end up wasting precious research resources.
    Super-model ecosystem: A domain-adaptation perspective. (arXiv:2208.14092v1 [cs.LG])
    This paper attempts to establish the theoretical foundation for the emerging super-model paradigm via domain adaptation, where one first trains a very large-scale model, {\it i.e.}, a super model (or foundation model in some other papers), on a large amount of data and then adapts it to various specific domains. Super-model paradigms help reduce computational and data costs and carbon emissions, which is critical to the AI industry, especially to the enormous number of small and medium-sized enterprises. We model the super-model paradigm as a two-stage diffusion process: (1) in the pre-training stage, the model parameter diffuses from random initials and converges to a steady distribution; and (2) in the fine-tuning stage, the model parameter is transported to another steady distribution. Both training stages can be mathematically modeled by the Ornstein-Uhlenbeck process, which converges to two Maxwell-Boltzmann distributions, respectively, each of which characterizes the corresponding convergent model. An $\mathcal O(1/\sqrt{N})$ generalization bound is then established via the PAC-Bayesian framework. The theory finds that the generalization error of the fine-tuning stage is dominant in domain adaptation. In addition, our theory suggests that the generalization is determined by a new measure that characterizes the domain discrepancy between the source domain and target domain, based on the covariance matrices and the shift of the converged local minimum.
    Unsupervised Representation Learning in Deep Reinforcement Learning: A Review. (arXiv:2208.14226v1 [cs.LG])
    This review addresses the problem of learning abstract representations of the measurement data in the context of Deep Reinforcement Learning (DRL). While the data are often ambiguous, high-dimensional, and complex to interpret, many dynamical systems can be effectively described by a low-dimensional set of state variables. Discovering these state variables from the data is a crucial aspect for improving the data efficiency, robustness and generalization of DRL methods, tackling the curse of dimensionality, and bringing interpretability and insights into black-box DRL. This review provides a comprehensive and complete overview of unsupervised representation learning in DRL by describing the main Deep Learning tools used for learning representations of the world, providing a systematic view of the method and principles, summarizing applications, benchmarks and evaluation strategies, and discussing open challenges and future directions.
    On the Automated Segmentation of Epicardial and Mediastinal Cardiac Adipose Tissues Using Classification Algorithms. (arXiv:2208.14352v1 [eess.IV])
    The quantification of fat depots on the surroundings of the heart is an accurate procedure for evaluating health risk factors correlated with several diseases. However, this type of evaluation is not widely employed in clinical practice due to the required human workload. This work proposes a novel technique for the automatic segmentation of cardiac fat pads. The technique is based on applying classification algorithms to the segmentation of cardiac CT images. Furthermore, we extensively evaluate the performance of several algorithms on this task and discuss which provided better predictive models. Experimental results show that the mean accuracy for the classification of epicardial and mediastinal fats was 98.4%, with a mean true positive rate of 96.2%. On average, the Dice similarity index, regarding the segmented patients and the ground truth, was equal to 96.8%. Therefore, our technique has achieved the most accurate results for the automatic segmentation of cardiac fats to date.
    Automatic Feasibility Study via Data Quality Analysis for ML: A Case-Study on Label Noise. (arXiv:2010.08410v4 [cs.LG] UPDATED)
    In our experience of working with domain experts who are using today's AutoML systems, a common problem we encountered is what we call "unrealistic expectations" -- when users are facing a very challenging task with a noisy data acquisition process, while being expected to achieve startlingly high accuracy with machine learning (ML). Many of these are predestined to fail from the beginning. In traditional software engineering, this problem is addressed via a feasibility study, an indispensable step before developing any software system. In this paper, we present Snoopy, with the goal of supporting data scientists and machine learning engineers performing a systematic and theoretically founded feasibility study before building ML applications. We approach this problem by estimating the irreducible error of the underlying task, also known as the Bayes error rate (BER), which stems from data quality issues in datasets used to train or evaluate ML model artifacts. We design a practical Bayes error estimator that is compared against baseline feasibility study candidates on 6 datasets (with additional real and synthetic noise of different levels) in computer vision and natural language processing. Furthermore, by including our systematic feasibility study with additional signals into the iterative label cleaning process, we demonstrate in end-to-end experiments how users are able to save substantial labeling time and monetary efforts.
    Inferring subhalo effective density slopes from strong lensing observations with neural likelihood-ratio estimation. (arXiv:2208.13796v1 [astro-ph.CO])
    Strong gravitational lensing has emerged as a promising approach for probing dark matter models on sub-galactic scales. Recent work has proposed the subhalo effective density slope as a more reliable observable than the commonly used subhalo mass function. The subhalo effective density slope is a measurement independent of assumptions about the underlying density profile and can be inferred for individual subhalos through traditional sampling methods. To go beyond individual subhalo measurements, we leverage recent advances in machine learning and introduce a neural likelihood-ratio estimator to infer an effective density slope for populations of subhalos. We demonstrate that our method is capable of harnessing the statistical power of multiple subhalos (within and across multiple images) to distinguish between characteristics of different subhalo populations. The computational efficiency warranted by the neural likelihood-ratio estimator over traditional sampling enables statistical studies of dark matter perturbers and is particularly useful as we expect an influx of strong lensing systems from upcoming surveys.
    Identifying Latent Causal Content for Multi-Source Domain Adaptation. (arXiv:2208.14161v1 [cs.LG])
    Multi-source domain adaptation (MSDA) learns to predict the labels of target domain data, under the setting where all data from multiple source domains are labelled and the data from the target domain are unlabeled. To handle this problem, most methods focus on learning invariant representations across domains. However, their success severely relies on the assumption that the label distribution remains unchanged across domains. To mitigate this, we propose a new assumption, latent covariate shift, where the marginal distribution of the latent content variable changes across domains, while the conditional distribution of the label given the latent content remains invariant across domains. We introduce a latent style variable to complement the latent content variable, forming a latent causal graph as the data and label generating process. We show that although the latent style variable is unidentifiable due to the transitivity property in the latent space, the latent content variable can be identified up to simple scaling under some mild conditions. This motivates us to propose a novel method for MSDA, which learns the invariant label distribution conditional on the latent content variable, instead of learning invariant representations. Empirical evaluation on simulated and real data demonstrates the effectiveness of the proposed method compared with many state-of-the-art methods based on invariant representations.
    A Diffusion Model Predicts 3D Shapes from 2D Microscopy Images. (arXiv:2208.14125v1 [cs.CV])
    Diffusion models are a class of generative models, showing superior performance as compared to other generative models in creating realistic images when trained on natural image datasets. We introduce DISPR, a diffusion-based model for solving the inverse problem of three-dimensional (3D) cell shape prediction from two-dimensional (2D) single cell microscopy images. Using the 2D microscopy image as a prior, DISPR is conditioned to predict realistic 3D shape reconstructions. To showcase the applicability of DISPR as a data augmentation tool in a feature-based single cell classification task, we extract morphological features from the cells grouped into six highly imbalanced classes. Adding features from predictions of DISPR to the three minority classes improved the macro F1 score from $F1_\text{macro} = 55.2 \pm 4.6\%$ to $F1_\text{macro} = 72.2 \pm 4.9\%$. With our method being the first to employ a diffusion-based model in this context, we demonstrate that diffusion models can be applied to inverse problems in 3D, and that they learn to reconstruct 3D shapes with realistic morphological features from 2D microscopy images.
    Self-Supervised Pyramid Representation Learning for Multi-Label Visual Analysis and Beyond. (arXiv:2208.14439v1 [cs.CV])
    While self-supervised learning has been shown to benefit a number of vision tasks, existing techniques mainly focus on image-level manipulation, which may not generalize well to downstream tasks at patch or pixel levels. Moreover, existing SSL methods might not sufficiently describe and associate the above representations within and across image scales. In this paper, we propose a Self-Supervised Pyramid Representation Learning (SS-PRL) framework. The proposed SS-PRL is designed to derive pyramid representations at patch levels via learning proper prototypes, with additional learners to observe and relate inherent semantic information within an image. In particular, we present a cross-scale patch-level correlation learning in SS-PRL, which allows the model to aggregate and associate information learned across patch scales. We show that, with our proposed SS-PRL for model pre-training, one can easily adapt and fine-tune the models for a variety of applications including multi-label classification, object detection, and instance segmentation.
    Deep Autoencoders for Anomaly Detection in Textured Images using CW-SSIM. (arXiv:2208.14045v1 [cs.CV])
    Detecting anomalous regions in images is a frequently encountered problem in industrial monitoring. A relevant example is the analysis of tissues and other products that in normal conditions conform to a specific texture, while defects introduce changes in the normal pattern. We address the anomaly detection problem by training a deep autoencoder, and we show that adopting a loss function based on Complex Wavelet Structural Similarity (CW-SSIM) yields superior detection performance on this type of images compared to traditional autoencoder loss functions. Our experiments on well-known anomaly detection benchmarks show that a simple model trained with this loss function can achieve comparable or superior performance to state-of-the-art methods leveraging deeper, larger and more computationally demanding neural networks.
    Weight-variant Latent Causal Models. (arXiv:2208.14153v1 [cs.LG])
    Causal representation learning exposes latent high-level causal variables behind low-level observations, which has enormous potential for a set of downstream tasks of interest. Despite this, identifying the true latent causal representation from observed data is a great challenge. In this work we focus on identifying latent causal variables. To this end, we analyze three intrinsic properties of the latent space: transitivity, permutation, and scaling. We show that transitivity severely hinders the identifiability of latent causal variables, while permutation and scaling guide the direction of identifying latent causal variables. To break the transitivity, we assume the underlying latent causal relations to be linear Gaussian models, in which the weights, mean and variance of the Gaussian noise are modulated by an additionally observed variable. Under these assumptions, we theoretically show that the latent causal variables can be identified up to trivial permutation and scaling. Built on this theoretical result, we propose a novel method, termed Structural caUsAl Variational autoEncoder, which directly learns the latent causal variables, together with the mapping from the latent causal variables to the observed ones. Experimental results on synthetic and real data demonstrate the identifiability result and the ability of the proposed method to learn latent causal variables.
    Dimension Independent Data Sets Approximation and Applications to Classification. (arXiv:2208.13781v1 [cs.LG])
    We revisit the classical kernel method of approximation/interpolation theory in a very specific context, motivated by the desire to obtain a robust procedure to approximate discrete data sets by (super)level sets of functions that are merely continuous at the data set arguments but are otherwise smooth. Special functions, called data signals, are defined for any given data set and are used to successfully solve supervised classification problems in a robust way that depends continuously on the data set. The efficacy of the method is illustrated with a series of low-dimensional examples and by its application to the standard benchmark high-dimensional problem of MNIST digit classification.
    DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over Graphs. (arXiv:2208.13810v1 [cs.LG])
    In this paper, we propose to solve a regularized distributionally robust learning problem in the decentralized setting, taking into account the data distribution shift. By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust minimization problem and solved efficiently. Leveraging the newly formulated optimization problem, we propose a robust version of Decentralized Stochastic Gradient Descent (DSGD), coined Distributionally Robust Decentralized Stochastic Gradient Descent (DR-DSGD). Under some mild assumptions, and provided that the regularization parameter is larger than one, we theoretically prove that DR-DSGD achieves a convergence rate of $\mathcal{O}\left(1/\sqrt{KT} + K/T\right)$, where $K$ is the number of devices and $T$ is the number of iterations. Simulation results show that our proposed algorithm can improve the worst distribution test accuracy by up to $10\%$. Moreover, DR-DSGD is more communication-efficient than DSGD since it requires fewer communication rounds (up to $20$ times fewer) to achieve the same worst distribution test accuracy target. Furthermore, the conducted experiments reveal that DR-DSGD results in a fairer performance across devices in terms of test accuracy.
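    For illustration, the standard KL-regularized robust objective takes the smooth log-sum-exp form below, which upweights high-loss samples as the regularization parameter lam shrinks (a generic sketch; the decentralized DR-DSGD updates themselves are not shown):
        # KL-regularized robust loss: lam * log(mean(exp(loss / lam))),
        # a smooth upper proxy interpolating between the mean and the max loss.
        import numpy as np

        def kl_robust_loss(losses, lam):
            m = losses.max()              # log-sum-exp stabilization
            return lam * (np.log(np.mean(np.exp((losses - m) / lam))) + m / lam)

        losses = np.array([0.1, 0.2, 2.0, 0.3])
        for lam in (10.0, 1.0, 0.1):      # small lam -> dominated by the worst loss
            print(lam, kl_robust_loss(losses, lam))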
    Autoinverse: Uncertainty Aware Inversion of Neural Networks. (arXiv:2208.13780v1 [cs.LG])
    Neural networks are powerful surrogates for numerous forward processes. The inversion of such surrogates is extremely valuable in science and engineering. The most important property of a successful neural inverse method is the performance of its solutions when deployed in the real world, i.e., on the native forward process (and not only the learned surrogate). We propose Autoinverse, a highly automated approach for inverting neural network surrogates. Our main insight is to seek inverse solutions in the vicinity of reliable data which have been sampled from the forward process and used for training the surrogate model. Autoinverse finds such solutions by taking into account the predictive uncertainty of the surrogate and minimizing it during the inversion. Apart from high accuracy, Autoinverse enforces the feasibility of solutions, comes with embedded regularization, and is initialization-free. We verify our proposed method by addressing a set of real-world problems in control, fabrication, and design.
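    A hedged PyTorch sketch of uncertainty-aware inversion, using a small untrained deep ensemble as a stand-in surrogate and a hypothetical penalty weight gamma (the full Autoinverse pipeline is richer than this):
        # Optimize an input z so the surrogate ensemble both matches the target
        # output and agrees with itself (low predictive variance).
        import torch

        torch.manual_seed(0)
        ensemble = [torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(),
                                        torch.nn.Linear(16, 1)) for _ in range(5)]
        y_target = torch.tensor([[0.5]])
        gamma = 1.0                                # weight on predictive uncertainty

        z = torch.zeros(1, 2, requires_grad=True)  # candidate inverse solution
        opt = torch.optim.Adam([z], lr=0.05)
        for _ in range(200):
            preds = torch.stack([net(z) for net in ensemble])
            fit = (preds.mean(0) - y_target).pow(2).mean()
            uncertainty = preds.var(0).mean()      # high where the surrogate is unreliable
            loss = fit + gamma * uncertainty
            opt.zero_grad()
            loss.backward()
            opt.step()
        print("recovered input:", z.detach(), "loss:", loss.item())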
    On the effectiveness of persistent homology. (arXiv:2206.10551v2 [math.AT] UPDATED)
    Persistent homology (PH) is one of the most popular methods in Topological Data Analysis. Even though PH has been used in many different types of applications, the reasons behind its success remain elusive; in particular, it is not known for which classes of problems it is most effective, or to what extent it can detect geometric or topological features. The goal of this work is to identify some types of problems where PH performs well or even better than other methods in data analysis. We consider three fundamental shape analysis tasks: the detection of the number of holes, curvature and convexity from 2D and 3D point clouds sampled from shapes. Experiments demonstrate that PH is successful in these tasks, outperforming several baselines, including PointNet, an architecture inspired precisely by the properties of point clouds. In addition, we observe that PH remains effective for limited computational resources and limited training data, as well as out-of-distribution test data, including various data transformations and noise. For convexity detection, we provide a theoretical guarantee that PH is effective for this task, and demonstrate the detection of a convexity measure on the FLAVIA dataset of plant leaf images.
    PGNAA Spectral Classification of Metal with Density Estimations. (arXiv:2208.13836v1 [cs.LG])
    For environmental, sustainable economic and political reasons, recycling processes are becoming increasingly important, aiming at a much higher use of secondary raw materials. Currently, for the copper and aluminium industries, no method for the non-destructive online analysis of heterogeneous materials is available. Prompt Gamma Neutron Activation Analysis (PGNAA) has the potential to overcome this challenge. A difficulty when using PGNAA for real-time classification arises from the small amount of noisy data, due to short-term measurements. In this case, classical evaluation methods using detailed peak-by-peak analysis fail. Therefore, we propose to view spectral data as probability distributions. Then, we can classify material using maximum log-likelihood with respect to kernel density estimation and use discrete sampling to optimize hyperparameters. For measurements of pure aluminium alloys, we achieve near-perfect classification in under 0.25 seconds.
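    The stated classification rule is straightforward to sketch: fit one kernel density estimate per material class and label a new spectrum by maximum log-likelihood (synthetic one-dimensional stand-ins replace real PGNAA spectra, and the bandwidth is a hypothetical choice):
        # KDE-per-class classification by maximum log-likelihood.
        import numpy as np
        from sklearn.neighbors import KernelDensity

        rng = np.random.default_rng(0)
        classes = {"alloy_A": rng.normal(0.0, 1.0, size=(500, 1)),
                   "alloy_B": rng.normal(2.5, 0.8, size=(500, 1))}
        kdes = {c: KernelDensity(bandwidth=0.3).fit(x) for c, x in classes.items()}

        query = np.array([[2.2]])
        scores = {c: kde.score(query) for c, kde in kdes.items()}  # log-likelihoods
        print(max(scores, key=scores.get), scores)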
    Rosenblatt's first theorem and frugality of deep learning. (arXiv:2208.13778v1 [cs.LG])
    Rosenblatt's first theorem about the omnipotence of shallow networks states that elementary perceptrons can solve any classification problem if there are no discrepancies in the training set. Minsky and Papert considered elementary perceptrons with restrictions on the neural inputs: a bounded number of connections or a relatively small diameter of the receptive field for each neuron at the hidden layer. They proved that under these constraints, an elementary perceptron cannot solve some problems, such as the connectivity of input images or the parity of pixels in them. In this note, we demonstrate Rosenblatt's first theorem at work, show how an elementary perceptron can solve a version of the travel maze problem, and analyse the complexity of that solution. We also constructed a deep network algorithm for the same problem, which is much more efficient. The shallow network uses an exponentially large number of neurons on the hidden layer (Rosenblatt's $A$-elements), whereas for the deep network second-order polynomial complexity is sufficient. We demonstrate that for the same complex problem the deep network can be much smaller and reveal a heuristic behind this effect.
    Perfusion assessment via local remote photoplethysmography (rPPG). (arXiv:2208.13840v1 [cs.CV])
    This paper presents an approach to assess the perfusion of visible human tissue from RGB video files. We propose metrics derived from remote photoplethysmography (rPPG) signals to detect whether a tissue is adequately supplied with blood. The perfusion analysis is done at three different scales, offering a flexible approach for different applications. We perform a plane-orthogonal-to-skin rPPG independently for locally defined regions of interest on each scale. From the extracted signals, we derive the signal-to-noise ratio, magnitude in the frequency domain, heart rate, and perfusion index, as well as the correlation between specific rPPG signals, in order to locally assess the perfusion of a specific region of human tissue. We show that locally resolved rPPG has a broad range of applications. As exemplary applications, we present results in intraoperative perfusion analysis and visualization during skin and organ transplantation, as well as an application in liveness assessment for the detection of presentation attacks on authentication systems.
    An Analysis of Abstracted Model-Based Reinforcement Learning. (arXiv:2208.14407v1 [cs.LG])
    Many methods for Model-based Reinforcement Learning (MBRL) provide guarantees for both the accuracy of the Markov decision process (MDP) model they can deliver and the learning efficiency. At the same time, state abstraction techniques allow for a reduction of the size of an MDP while maintaining a bounded loss with respect to the original problem. It may come as a surprise, therefore, that no such guarantees are available when combining both techniques, i.e., where MBRL merely observes abstract states. Our theoretical analysis shows that abstraction can introduce a dependence between samples collected online (e.g., in the real world), which means that most results for MBRL cannot be directly extended to this setting. The new results in this work show that concentration inequalities for martingales can be used to overcome this problem and allow for extending the results of algorithms such as R-MAX to the setting with abstraction, thus producing the first performance guarantees for Abstracted RL: model-based reinforcement learning with an abstracted model.
    Modeling Spatial Trajectories using Coarse-Grained Smartphone Logs. (arXiv:2208.13775v1 [cs.LG])
    Current approaches for points-of-interest (POI) recommendation learn the preferences of a user via the standard spatial features such as the POI coordinates, the social network, etc. These models ignore a crucial aspect of spatial mobility -- every user carries their smartphones wherever they go. In addition, with growing privacy concerns, users refrain from sharing their exact geographical coordinates and their social media activity. In this paper, we present REVAMP, a sequential POI recommendation approach that utilizes the user activity on smartphone applications (or apps) to identify their mobility preferences. This work aligns with the recent psychological studies of online urban users, which show that their spatial mobility behavior is largely influenced by the activity of their smartphone apps. In addition, our proposal of coarse-grained smartphone data refers to data logs collected in a privacy-conscious manner, i.e., consisting only of (a) category of the smartphone app and (b) category of check-in location. Thus, REVAMP is not privy to precise geo-coordinates, social networks, or the specific application being accessed. Buoyed by the efficacy of self-attention models, we learn the POI preferences of a user using two forms of positional encodings -- absolute and relative -- with each extracted from the inter-check-in dynamics in the check-in sequence of a user. Extensive experiments across two large-scale datasets from China show the predictive prowess of REVAMP and its ability to predict app- and POI categories.
    Evolutionary Deep Reinforcement Learning for Dynamic Slice Management in O-RAN. (arXiv:2208.14394v1 [eess.SY])
    The next-generation wireless networks are required to satisfy a variety of services and criteria concurrently. To address upcoming strict criteria, a new open radio access network (O-RAN) with distinguishing features such as flexible design, disaggregated virtual and programmable components, and intelligent closed-loop control was developed. O-RAN slicing is being investigated as a critical strategy for ensuring network quality of service (QoS) in the face of changing circumstances. However, distinct network slices must be dynamically controlled to avoid service level agreement (SLA) variation caused by rapid changes in the environment. Therefore, this paper introduces a novel framework able to manage the network slices through provisioned resources intelligently. Due to diverse heterogeneous environments, intelligent machine learning approaches require sufficient exploration to handle the harshest situations in a wireless network and accelerate convergence. To solve this problem, a new solution is proposed based on evolutionary-based deep reinforcement learning (EDRL) to accelerate and optimize the slice management learning process in the radio access network's (RAN) intelligent controller (RIC) modules. To this end, the O-RAN slicing is represented as a Markov decision process (MDP) which is then solved optimally for resource allocation to meet service demand using the EDRL approach. In terms of reaching service demands, simulation results show that the proposed approach outperforms the DRL baseline by 62.2%.
    Deep Generative Modeling on Limited Data with Regularization by Nontransferable Pre-trained Models. (arXiv:2208.14133v1 [cs.LG])
    Deep generative models (DGMs) are data-hungry. Essentially, it is because learning a complex model on limited data suffers from a large variance and easily overfits. Inspired by the \emph{bias-variance dilemma}, we propose \emph{regularized deep generative model} (Reg-DGM), which leverages a nontransferable pre-trained model to reduce the variance of generative modeling with limited data. Formally, Reg-DGM optimizes a weighted sum of a certain divergence between the data distribution and the DGM and the expectation of an energy function defined by the pre-trained model w.r.t. the DGM. Theoretically, we characterize the existence and uniqueness of the global minimum of Reg-DGM in the nonparametric setting and rigorously prove the statistical benefits of Reg-DGM w.r.t. the mean squared error and the expected risk in a simple yet representative Gaussian-fitting example. Empirically, it is quite flexible to specify the DGM and the pre-trained model in Reg-DGM. In particular, with a ResNet-18 classifier pre-trained on ImageNet and a data-dependent energy function, Reg-DGM consistently improves the generation performance of strong DGMs including StyleGAN2 and ADA on several benchmarks with limited data and achieves competitive results to the state-of-the-art methods.
    Finite Sample Identification of Bilinear Dynamical Systems. (arXiv:2208.13915v1 [cs.LG])
    Bilinear dynamical systems are ubiquitous in many different domains and they can also be used to approximate more general control-affine systems. This motivates the problem of learning bilinear systems from a single trajectory of the system's states and inputs. Under a mild marginal mean-square stability assumption, we identify how much data is needed to estimate the unknown bilinear system up to a desired accuracy with high probability. Our sample complexity and statistical error rates are optimal in terms of the trajectory length, the dimensionality of the system and the input size. Our proof technique relies on an application of martingale small-ball condition. This enables us to correctly capture the properties of the problem, specifically our error rates do not deteriorate with increasing instability. Finally, we show that numerical experiments are well-aligned with our theoretical results.
    Memory Classifiers: Two-stage Classification for Robustness in Machine Learning. (arXiv:2206.05323v2 [cs.LG] UPDATED)
    The performance of machine learning models can significantly degrade under distribution shifts of the data. We propose a new method for classification which can improve robustness to distribution shifts, by combining expert knowledge about the ``high-level" structure of the data with standard classifiers. Specifically, we introduce two-stage classifiers called memory classifiers. First, these identify prototypical data points -- memories -- to cluster the training data. This step is based on features designed with expert guidance; for instance, for image data they can be extracted using digital image processing algorithms. Then, within each cluster, we learn local classifiers based on finer discriminating features, via standard models like deep neural networks. We establish generalization bounds for memory classifiers. We illustrate in experiments that they can improve generalization and robustness to distribution shifts on image datasets. We show improvements which push beyond standard data augmentation techniques.
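    An illustrative two-stage sketch of a memory classifier, with the first two columns acting as hypothetical expert-designed features for clustering and a logistic regression as the local classifier per cluster:
        # Stage 1: cluster on coarse "expert" features; stage 2: local classifiers.
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 10))
        y = (X.sum(axis=1) > 0).astype(int)
        expert = X[:, :2]                  # stand-in for expert-guided features

        memories = KMeans(n_clusters=3, n_init=10, random_state=0).fit(expert)
        local = {}
        for c in range(3):
            idx = memories.labels_ == c
            local[c] = LogisticRegression().fit(X[idx], y[idx])

        def predict(x_new):
            c = memories.predict(x_new[:, :2])[0]   # route to the nearest memory
            return local[c].predict(x_new)[0]       # apply its local classifier

        print(predict(rng.normal(size=(1, 10))))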
    Addressing Census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements. (arXiv:2205.06129v2 [stat.ML] UPDATED)
    Prediction of individuals' race and ethnicity plays an important role in studies of racial disparity. Bayesian Improved Surname Geocoding (BISG), which relies on detailed Census information, has emerged as a leading methodology for this prediction task. Unfortunately, BISG suffers from two data problems. First, the Census often contains zero counts for minority groups in the locations where members of those groups reside. Second, many surnames -- especially those of minorities -- are missing from the Census data. We introduce a fully Bayesian Improved Surname Geocoding (fBISG) methodology that accounts for Census measurement error by extending the naive Bayesian inference of the BISG methodology. We also use additional data on last, first, and middle names taken from the voter files of six Southern states where self-reported race is available. Our empirical validation shows that the fBISG methodology and name supplements significantly improve the accuracy of race imputation, especially for racial minorities.
    A General End-to-end Diagnosis Framework for Manufacturing Systems. (arXiv:1901.02057v2 [cs.LG] UPDATED)
    The manufacturing sector is envisioned to be heavily influenced by artificial intelligence-based technologies with the extraordinary increases in computational power and data volumes. A central challenge in the manufacturing sector lies in the requirement of a general framework to ensure satisfactory diagnosis and monitoring performance across different manufacturing applications. Here we propose a general data-driven, end-to-end framework for the monitoring of manufacturing systems. This framework, derived from deep learning techniques, evaluates fused sensory measurements to detect and even predict faults and wearing conditions. This work exploits the predictive power of deep learning to automatically extract hidden degradation features from noisy, time-course data. We have evaluated the proposed framework on ten representative datasets drawn from a wide variety of manufacturing applications. Results reveal that the framework performs well in the examined benchmark applications and can be applied in diverse contexts, indicating its potential use as a critical cornerstone in smart manufacturing.
    Automated recognition of the pericardium contour on processed CT images using genetic algorithms. (arXiv:2208.14375v1 [eess.IV])
    This work proposes the use of Genetic Algorithms (GA) for tracing and recognizing the pericardium contour of the human heart in Computed Tomography (CT) images. We assume that each slice of the pericardium can be modelled by an ellipse, the parameters of which need to be optimally determined. An optimal ellipse would be one that closely follows the pericardium contour and, consequently, appropriately separates the epicardial and mediastinal fats of the human heart. Tracing and automatically identifying the pericardium contour aids in medical diagnosis. Usually, this process is done manually or not done at all due to the effort required. Besides, detecting the pericardium may improve previously proposed automated methodologies that separate the two types of fat associated with the human heart. Quantification of these fats provides important health risk marker information, as they are associated with the development of certain cardiovascular pathologies. Finally, we conclude that GA offers satisfactory solutions in a feasible amount of processing time.
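    A stripped-down evolutionary sketch of the idea (selection plus Gaussian mutation only, with the implicit-ellipse residual as a stand-in fitness rather than a true point-to-contour distance, and synthetic contour points in place of CT slices):
        # Evolve ellipse parameters (cx, cy, a, b, theta) toward sampled contour points.
        import numpy as np

        rng = np.random.default_rng(0)
        t = rng.uniform(0, 2 * np.pi, 200)
        points = np.column_stack([3 * np.cos(t) + 1, 2 * np.sin(t) - 1])  # target contour

        def fitness(p):
            cx, cy, a, b, th = p
            x = (points[:, 0] - cx) * np.cos(th) + (points[:, 1] - cy) * np.sin(th)
            y = -(points[:, 0] - cx) * np.sin(th) + (points[:, 1] - cy) * np.cos(th)
            return np.mean(np.abs((x / a) ** 2 + (y / b) ** 2 - 1.0))

        pop = rng.uniform([-2, -2, 1, 1, 0], [2, 2, 5, 5, np.pi], size=(100, 5))
        for _ in range(200):
            scores = np.array([fitness(p) for p in pop])
            parents = pop[np.argsort(scores)[:20]]                # selection
            children = parents[rng.integers(0, 20, size=(80,))]   # reproduction
            children = children + rng.normal(scale=0.05, size=children.shape)  # mutation
            pop = np.vstack([parents, children])
        print("best ellipse (cx, cy, a, b, theta):", min(pop, key=fitness))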
    SB-SSL: Slice-Based Self-Supervised Transformers for Knee Abnormality Classification from MRI. (arXiv:2208.13923v1 [eess.IV])
    The availability of large-scale data with high-quality ground truth labels is a challenge when developing supervised machine learning solutions for the healthcare domain. Although the amount of digital data in clinical workflows is increasing, most of this data is distributed across clinical sites and protected to ensure patient privacy. Radiological readings and dealing with large-scale clinical data put a significant burden on the available resources, and this is where machine learning and artificial intelligence play a pivotal role. Magnetic Resonance Imaging (MRI) for musculoskeletal (MSK) diagnosis is one example where the scans have a wealth of information but require a significant amount of time for reading and labeling. Self-supervised learning (SSL) can be a solution for handling the lack of availability of ground truth labels, but it generally requires a large amount of training data during the pretraining stage. Herein, we propose a slice-based self-supervised deep learning framework (SB-SSL), a novel slice-based paradigm for classifying abnormality using knee MRI scans. We show that for a limited number of cases (<1000), our proposed framework is capable of identifying anterior cruciate ligament tears with an accuracy of 89.17% and an AUC of 0.954, outperforming the state-of-the-art without usage of external data during pretraining. This demonstrates that our proposed framework is suited for SSL in the limited data regime.
    Learned k-NN Distance Estimation. (arXiv:2208.14210v1 [cs.DB])
    Big data mining is well known to be an important task for data science, because it can provide useful observations and new knowledge hidden in given large datasets. Proximity-based data analysis is particularly utilized in many real-life applications. In such analysis, the distances to the k nearest neighbors are usually employed, thus the main bottleneck is data retrieval. Much effort has been made to improve the efficiency of these analyses. However, they still incur large costs, because they essentially need many data accesses. To avoid this issue, we propose a machine-learning technique that quickly and accurately estimates the k-NN distances (i.e., distances to the k nearest neighbors) of a given query. We train a fully connected neural network model and utilize pivots to achieve accurate estimation. Our model is designed to have useful advantages: it infers the distances to the k-NNs all at once, its inference time is O(1) (no data accesses are incurred), and it retains high accuracy. Our experimental results and case studies on real datasets demonstrate the efficiency and effectiveness of our solution.
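    A minimal sketch of the pivot-based estimation idea: regress k-NN distances from a query's distances to a few pivots, so inference touches only the pivots, while training targets still come from one brute-force k-NN pass (sizes and the network shape are hypothetical):
        # Learn to map pivot distances -> k-NN distances.
        import numpy as np
        from sklearn.neighbors import NearestNeighbors
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        data = rng.normal(size=(5000, 8))
        queries = rng.normal(size=(1000, 8))
        pivots = data[rng.choice(len(data), size=16, replace=False)]

        feat = np.linalg.norm(queries[:, None] - pivots[None], axis=2)  # pivot distances
        k = 10
        target = NearestNeighbors(n_neighbors=k).fit(data).kneighbors(queries)[0]

        model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000,
                             random_state=0).fit(feat[:800], target[:800])
        print("held-out MAE:", np.abs(model.predict(feat[800:]) - target[800:]).mean())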
    Discriminative Learning of Similarity and Group Equivariant Representations. (arXiv:1808.10078v2 [stat.ML] UPDATED)
    One of the most fundamental problems in machine learning is to compare examples: Given a pair of objects we want to return a value which indicates degree of (dis)similarity. Similarity is often task specific, and pre-defined distances can perform poorly, leading to work in metric learning. However, being able to learn a similarity-sensitive distance function also presupposes access to a rich, discriminative representation for the objects at hand. In this dissertation we present contributions towards both ends. In the first part of the thesis, assuming good representations for the data, we present a formulation for metric learning that makes a more direct attempt to optimize for the k-NN accuracy as compared to prior work. We also present extensions of this formulation to metric learning for kNN regression, asymmetric similarity learning and discriminative learning of Hamming distance. In the second part, we consider a situation where we are on a limited computational budget i.e. optimizing over a space of possible metrics would be infeasible, but access to a label aware distance metric is still desirable. We present a simple, and computationally inexpensive approach for estimating a well motivated metric that relies only on gradient estimates, discussing theoretical and experimental results. In the final part, we address representational issues, considering group equivariant convolutional neural networks (GCNNs). Equivariance to symmetry transformations is explicitly encoded in GCNNs; a classical CNN being the simplest example. In particular, we present a SO(3)-equivariant neural network architecture for spherical data, that operates entirely in Fourier space, while also providing a formalism for the design of fully Fourier neural networks that are equivariant to the action of any continuous compact group.
    Airway Tree Modeling Using Dual-channel 3D UNet 3+ with Vesselness Prior. (arXiv:2208.13969v1 [eess.IV])
    Lung airway tree modeling is essential for the diagnosis of pulmonary diseases, especially from X-ray computed tomography (CT). Airway tree modeling on CT images can provide experts with three-dimensional measurements such as wall thickness, which can tremendously aid the diagnosis of pulmonary diseases like chronic obstructive pulmonary disease [1-4]. Many scholars have attempted various ways to model the lung airway tree, which can be split into two major categories: model-based approaches and deep learning approaches. The performance of a typical model-based approach usually depends on manual tuning of the model parameters, which is both an advantage and a disadvantage. The advantage is that it does not require a large amount of training data, which is beneficial for small datasets such as those common in medical imaging. On the other hand, the apparent performance of model-based approaches may be a misconception [5,6]. In recent years, deep learning has achieved good results in the field of medical image processing, and many scholars have used UNet-based methods for medical image segmentation [7-11]. Among the variants of UNet, UNet 3+ [11] achieves relatively good results compared to the rest. Therefore, to further improve the accuracy of lung airway tree modeling, this study combines the Frangi filter [5] with UNet 3+ [11] to develop a dual-channel 3D UNet 3+. The Frangi filter is used to extract vessel-like features, which are then used as an additional input channel to guide the dual-channel UNet 3+ during training and testing.
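    The sketch below illustrates the dual-channel input construction under stated assumptions: scikit-image's frangi filter supplies the vesselness prior, which is stacked with the raw volume as a second channel. The segmentation network itself is a hypothetical placeholder, not the authors' implementation.

```python
# Minimal sketch of a dual-channel input with a vesselness prior, assuming
# scikit-image's frangi filter (which supports 3D volumes in recent versions).
# "UNet3Plus3D" is a hypothetical placeholder, not the authors' implementation.
import numpy as np
from skimage.filters import frangi

ct_volume = np.random.rand(64, 128, 128).astype(np.float32)  # stand-in for a CT scan

# The Frangi filter enhances tubular structures; sigmas control the structure scale.
vesselness = frangi(ct_volume, sigmas=range(1, 6), black_ridges=False)

# Channel-first stacking: (channels, depth, height, width).
dual_channel = np.stack([ct_volume, vesselness.astype(np.float32)], axis=0)

# model = UNet3Plus3D(in_channels=2, out_channels=1)    # hypothetical model class
# logits = model(torch.from_numpy(dual_channel)[None])  # add a batch dimension
```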
    A Self-supervised Riemannian GNN with Time Varying Curvature for Temporal Graph Learning. (arXiv:2208.14073v1 [cs.LG])
    Representation learning on temporal graphs has drawn considerable research attention owing to its fundamental importance in a wide spectrum of real-world applications. Though a number of studies succeed in obtaining time-dependent representations, the field still faces significant challenges. On the one hand, most of the existing methods restrict the embedding space to a certain curvature. However, the underlying geometry in fact shifts over time among the positive-curvature hyperspherical, zero-curvature Euclidean, and negative-curvature hyperbolic spaces. On the other hand, these methods usually require abundant labels to learn temporal representations, which notably limits their use on the unlabeled graphs common in real applications. To bridge this gap, we make the first attempt to study the problem of self-supervised temporal graph representation learning in the general Riemannian space, supporting a time-varying curvature that shifts among hyperspherical, Euclidean, and hyperbolic spaces. In this paper, we present a novel self-supervised Riemannian graph neural network (SelfRGNN). Specifically, we design a curvature-varying Riemannian GNN with a theoretically grounded time encoding, and formulate a functional curvature over time to model the evolution among the positive, zero, and negative curvature spaces. To enable self-supervised learning, we propose a novel reweighting self-contrastive approach, exploring the Riemannian space itself without augmentation, and propose an edge-based self-supervised curvature learning with the Ricci curvature. Extensive experiments show the superiority of SelfRGNN; moreover, a case study shows the time-varying curvature of temporal graphs in practice.
    A Survey on Deep Graph Generation: Methods and Applications. (arXiv:2203.06714v2 [cs.LG] UPDATED)
    Graphs are ubiquitous in encoding relational information of real-world objects in many domains. Graph generation, whose purpose is to generate new graphs from a distribution similar to the observed graphs, has received increasing attention thanks to recent advances in deep learning models. In this paper, we conduct a comprehensive review of the existing literature on graph generation, from a variety of emerging methods to its wide application areas. Specifically, we first formulate the problem of deep graph generation and discuss its differences from several related graph learning tasks. Secondly, we divide the state-of-the-art methods into three categories based on model architectures and summarize their generation strategies. Thirdly, we introduce three key application areas of deep graph generation. Lastly, we highlight challenges and opportunities for the future study of deep graph generation.
    Symmetric Pruning in Quantum Neural Networks. (arXiv:2208.14057v1 [quant-ph])
    Many fundamental properties of a quantum system are captured by its Hamiltonian and ground state. Despite the significance of ground state preparation (GSP), this task is classically intractable for large-scale Hamiltonians. Quantum neural networks (QNNs), which exert the power of modern quantum machines, have emerged as a leading protocol to conquer this issue. As such, how to enhance the performance of QNNs has become a crucial topic in GSP. Empirical evidence shows that QNNs with handcrafted symmetric ansatzes generally experience better trainability than those with asymmetric ansatzes, but theoretical explanations have been lacking. To fill this knowledge gap, here we propose the effective quantum neural tangent kernel (EQNTK) and connect this concept with over-parameterization theory to quantify the convergence of QNNs towards the global optima. We uncover that the advantage of symmetric ansatzes is attributable to their large EQNTK value with low effective dimension, which requires few parameters and little quantum circuit depth to reach the over-parameterization regime, permitting a benign loss landscape and fast convergence. Guided by EQNTK, we further devise a symmetric pruning (SP) scheme to automatically tailor a symmetric ansatz from an over-parameterized and asymmetric one to greatly improve the performance of QNNs when the explicit symmetry information of the Hamiltonian is unavailable. Extensive numerical simulations are conducted to validate the analytical results of EQNTK and the effectiveness of SP.
    Fast Bayesian Optimization of Needle-in-a-Haystack Problems using Zooming Memory-Based Initialization. (arXiv:2208.13771v1 [cs.LG])
    Needle-in-a-Haystack problems exist across a wide range of applications including rare disease prediction, ecological resource management, fraud detection, and material property optimization. A Needle-in-a-Haystack problem arises when there is an extreme imbalance of optimum conditions relative to the size of the dataset. For example, only 0.82% of the 146k total materials in the open-access Materials Project database have a negative Poisson's ratio. However, current state-of-the-art optimization algorithms are not designed to find solutions to these challenging multidimensional Needle-in-a-Haystack problems, resulting in slow convergence to a global optimum or pigeonholing into a local minimum. In this paper, we present a Zooming Memory-Based Initialization algorithm, named ZoMBI, that builds on conventional Bayesian optimization principles to quickly and efficiently optimize Needle-in-a-Haystack problems in both less time and fewer experiments by addressing the common convergence and pigeonholing issues. ZoMBI actively extracts knowledge from the previously best-performing evaluated experiments to iteratively zoom the sampling search bounds in towards the global optimum "needle", and then prunes the memory of low-performing historical experiments to accelerate compute times. We validate the algorithm's performance on two real-world 5-dimensional Needle-in-a-Haystack material property optimization datasets: discovery of auxetic Poisson's ratio materials and discovery of high thermoelectric figure of merit materials. The ZoMBI algorithm demonstrates compute-time speed-ups of 400x compared to traditional Bayesian optimization as well as efficiently discovering materials in under 100 experiments that are up to 3x more highly optimized than those discovered by current state-of-the-art algorithms.
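    The zoom-and-prune mechanic can be illustrated in a few lines of code. The sketch below is a simplified stand-in, not the published algorithm: random sampling replaces the inner Bayesian-optimization loop, and the memory size and round counts are illustrative assumptions.

```python
# Minimal sketch of memory-based "zooming": after each round, shrink the
# search bounds to the bounding box of the best points retained in memory.
import numpy as np

def objective(x):                        # toy needle-in-a-haystack surrogate
    return -np.exp(-50 * np.sum((x - 0.73) ** 2))

rng = np.random.default_rng(1)
lower, upper = np.zeros(5), np.ones(5)
memory_x, memory_y = [], []

for _ in range(4):
    # Stand-in for the inner Bayesian-optimization loop: sample within bounds.
    X = rng.uniform(lower, upper, size=(25, 5))
    y = np.array([objective(x) for x in X])
    memory_x.append(X)
    memory_y.append(y)

    # Prune: keep only the m best evaluations seen so far.
    all_x, all_y = np.vstack(memory_x), np.concatenate(memory_y)
    best = np.argsort(all_y)[:5]
    memory_x, memory_y = [all_x[best]], [all_y[best]]

    # Zoom: new bounds = bounding box of the retained best points.
    lower, upper = all_x[best].min(axis=0), all_x[best].max(axis=0)

print("best found:", all_x[np.argmin(all_y)])   # approaches the 0.73 "needle"
```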
    Adversarial Scratches: Deployable Attacks to CNN Classifiers. (arXiv:2204.09397v2 [cs.LG] UPDATED)
    A growing body of work has shown that deep neural networks are susceptible to adversarial examples. These take the form of small perturbations applied to the model's input which lead to incorrect predictions. Unfortunately, most of the literature focuses on visually imperceptible perturbations applied to digital images, which are often, by design, impossible to deploy against physical targets. We present Adversarial Scratches: a novel L0 black-box attack, which takes the form of scratches in images and possesses much greater deployability than other state-of-the-art attacks. Adversarial Scratches leverage B\'ezier curves to reduce the dimension of the search space and possibly constrain the attack to a specific location. We test Adversarial Scratches in several scenarios, including a publicly available API and images of traffic signs. Results show that our attack often achieves a higher fooling rate than other deployable state-of-the-art methods, while requiring significantly fewer queries and modifying very few pixels.
    Expressions Causing Differences in Emotion Recognition in Social Networking Service Documents. (arXiv:2208.14244v1 [cs.CL])
    It is often difficult to correctly infer a writer's emotion from text exchanged online, and differences in recognition between writers and readers can be problematic. In this paper, we propose a new framework for detecting sentences that create differences in emotion recognition between the writer and the reader, and for detecting the kinds of expressions that cause such differences. The proposed framework consists of a bidirectional encoder representations from transformers (BERT)-based detector that detects sentences causing differences in emotion recognition, and an analysis that acquires expressions that characteristically appear in such sentences. The detector, based on a Japanese social networking service (SNS) document dataset with emotion labels annotated by both the writer and three readers, detected "hidden-anger sentences" with AUC = 0.772; these sentences gave rise to differences in the recognition of anger. Because SNS documents contain many sentences whose meaning is extremely difficult to interpret, analyzing the sentences flagged by this detector yielded several expressions that appear characteristically in hidden-anger sentences. The detected sentences and expressions do not convey anger explicitly, making it difficult to infer the writer's anger, but once the implicit anger is pointed out, it becomes possible to guess why the writer is angry. Put into practical use, this framework could help mitigate problems arising from such misunderstandings.
    Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization. (arXiv:2103.17182v5 [cs.LG] UPDATED)
    It is well known that stochastic gradient noise (SGN) acts as implicit regularization for deep learning and is essential to both the optimization and generalization of deep networks. Some works have attempted to artificially simulate SGN by injecting random noise. However, injected simple random noise cannot work as well as SGN, which is anisotropic and parameter-dependent. To simulate SGN at low computational cost and without changing the learning rate or batch size, we propose the Positive-Negative Momentum (PNM) approach, a powerful alternative to conventional momentum in classic optimizers. The PNM method maintains two approximately independent momentum terms, so we can control the magnitude of SGN explicitly by adjusting the momentum difference. We theoretically prove the convergence guarantee and the generalization advantage of PNM over Stochastic Gradient Descent (SGD). By incorporating PNM into two conventional optimizers, SGD with momentum and Adam, our extensive experiments empirically verify the significant advantage of the PNM-based variants over the corresponding conventional momentum-based optimizers.
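    The sketch below illustrates the two-buffer idea on a toy quadratic. It is a minimal sketch consistent with the description above, not the paper's exact update rule: the combination coefficients, normalization, and hyper-parameter values are illustrative assumptions.

```python
# Minimal sketch of positive-negative momentum: two buffers are updated on
# alternating mini-batches, and the descent direction amplifies their
# difference. The paper's exact scaling/normalization may differ.
import numpy as np

def pnm_step(w, grad, buffers, t, lr=0.1, beta=0.9, beta0=1.0):
    cur, other = buffers[t % 2], buffers[(t + 1) % 2]
    cur *= beta
    cur += (1 - beta) * grad                 # update one buffer per parity
    # Positive-negative combination: the difference term injects anisotropic,
    # gradient-dependent noise whose magnitude is controlled by beta0.
    direction = (1 + beta0) * cur - beta0 * other
    return w - lr * direction

rng = np.random.default_rng(0)
w = np.array([3.0, -2.0])
buffers = [np.zeros(2), np.zeros(2)]
for t in range(200):
    noise = rng.normal(scale=0.1, size=2)    # stand-in for mini-batch noise
    grad = 2 * w + noise                     # gradient of ||w||^2 plus noise
    w = pnm_step(w, grad, buffers, t)
print(w)                                     # should approach the origin
```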
    Towards Adversarial Purification using Denoising AutoEncoders. (arXiv:2208.13838v1 [cs.LG])
    With the rapid advancement and increased use of deep learning models in image identification, security becomes a major concern for their deployment in safety-critical systems. Because the accuracy and robustness of deep learning models are primarily attributable to the purity of the training samples, deep learning architectures are often susceptible to adversarial attacks. Adversarial examples are typically obtained by applying subtle perturbations to normal images; these are mostly imperceptible to humans but can seriously confuse state-of-the-art machine learning models. We propose a framework, named APuDAE, that leverages Denoising AutoEncoders (DAEs) to purify such samples in an adaptive way and thus improve the classification accuracy of the attacked target classifier networks. We also show how using DAEs adaptively, rather than directly, further improves classification accuracy and is more robust to adaptive attacks designed to fool them. We demonstrate our results on the MNIST, CIFAR-10, and ImageNet datasets and show how our framework (APuDAE) provides performance comparable to, and in most cases better than, the baseline methods in purifying adversarial examples. We also design an adaptive attack specifically targeting our purification model and demonstrate the robustness of our defense against it.
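    A bare-bones version of the purification step is sketched below: route the input through a trained DAE before classification. This is a minimal sketch under stated assumptions; the adaptive weighting that gives APuDAE its name is omitted, and both networks are untrained toy stand-ins.

```python
# Minimal sketch of DAE-based purification: project a (possibly adversarial)
# input back toward the data manifold before classifying it. "dae" and "clf"
# would be pre-trained in practice; here they are untrained toy stand-ins.
import torch
import torch.nn as nn

dae = nn.Sequential(                        # toy DAE for 28x28 images (e.g. MNIST)
    nn.Flatten(), nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Sigmoid(), nn.Unflatten(1, (1, 28, 28)),
)
clf = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))  # toy classifier

def purified_predict(x):
    with torch.no_grad():
        x_clean = dae(x)                    # purification step
        return clf(x_clean).argmax(dim=1)

x_adv = torch.rand(8, 1, 28, 28)            # stand-in for attacked inputs
print(purified_predict(x_adv))
```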
    Differentiable Programming for Earth System Modeling. (arXiv:2208.13825v1 [cs.LG])
    Earth System Models (ESMs) are the primary tools for investigating future Earth system states at time scales from decades to centuries, especially in response to anthropogenic greenhouse gas release. State-of-the-art ESMs can reproduce the observational global mean temperature anomalies of the last 150 years. Nevertheless, ESMs need further improvements, most importantly regarding (i) the large spread in their estimates of climate sensitivity, i.e., the temperature response to increases in atmospheric greenhouse gases, (ii) the modeled spatial patterns of key variables such as temperature and precipitation, (iii) their representation of extreme weather events, and (iv) their representation of multistable Earth system components and their ability to predict associated abrupt transitions. Here, we argue that making ESMs automatically differentiable has huge potential to advance ESMs, especially with respect to these key shortcomings. First, automatic differentiability would allow objective calibration of ESMs, i.e., the selection of optimal values with respect to a cost function for a large number of free parameters, which are currently tuned mostly manually. Second, recent advances in Machine Learning (ML) and in the amount, accuracy, and resolution of observational data promise to be helpful with at least some of the above aspects because ML may be used to incorporate additional information from observations into ESMs. Automatic differentiability is an essential ingredient in the construction of such hybrid models, combining process-based ESMs with ML components. We document recent work showcasing the potential of automatic differentiation for a new generation of substantially improved, data-informed ESMs.
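    To make the calibration argument concrete, the sketch below uses automatic differentiation to tune a free parameter of a toy zero-dimensional energy-balance model against synthetic observations. The model, cost function, and data are illustrative assumptions, not part of the paper.

```python
# Minimal sketch of gradient-based calibration via automatic differentiation,
# using a toy linearized energy-balance model in JAX. All values are synthetic.
import jax
import jax.numpy as jnp

def toy_ebm(params, forcing):
    # Equilibrium temperature anomaly: lambda_ * T = forcing  =>  T = forcing / lambda_
    return forcing / params["lambda_"]

obs_forcing = jnp.linspace(0.5, 4.0, 20)
obs_temp = obs_forcing / 1.2               # synthetic "observations" (true lambda = 1.2)

def cost(params):
    return jnp.mean((toy_ebm(params, obs_forcing) - obs_temp) ** 2)

params = {"lambda_": 0.8}
grad_fn = jax.grad(cost)                   # objective gradient w.r.t. free parameters
for _ in range(500):
    g = grad_fn(params)
    params = {"lambda_": params["lambda_"] - 0.05 * g["lambda_"]}
print(params)                              # converges toward lambda_ ~ 1.2
```

    The same pattern scales, in principle, from one scalar parameter to the large parameter vectors of a differentiable ESM; that is precisely the objective-calibration workflow the paper advocates.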
    Persistence Initialization: A novel adaptation of the Transformer architecture for Time Series Forecasting. (arXiv:2208.14236v1 [cs.LG])
    Time series forecasting is an important problem with many real-world applications. Ensembles of deep neural networks have recently achieved impressive forecasting accuracy, but such large ensembles are impractical in many real-world settings. Transformer models have been successfully applied to a diverse set of challenging problems. We propose a novel adaptation of the original Transformer architecture for time series forecasting, called Persistence Initialization. The model is initialized as a naive persistence model by using a multiplicative gating mechanism combined with a residual skip connection. We use a decoder Transformer with ReZero normalization and Rotary positional encodings, but the adaptation is applicable to any auto-regressive neural network model. We evaluate our proposed architecture on the challenging M4 dataset, achieving competitive performance compared to ensemble-based methods. We also compare against recently proposed Transformer models for time series forecasting, showing superior performance on the M4 dataset. Extensive ablation studies show that Persistence Initialization leads to better performance and faster convergence. As the size of the model increases, only the models with our proposed adaptation gain in performance. An additional ablation study on the choice of normalization and positional encoding finds both Rotary encodings and ReZero normalization to be essential for good forecasting performance.
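    The gating-plus-skip construction can be written down in a few lines. Below is a minimal sketch under stated assumptions: the backbone is an arbitrary stand-in for the decoder Transformer, and a single zero-initialized scalar gate makes the untrained model output pure persistence.

```python
# Minimal sketch of persistence initialization: a zero-initialized gate scales
# the model's contribution on top of a last-value (persistence) skip connection.
import torch
import torch.nn as nn

class PersistenceInit(nn.Module):
    def __init__(self, backbone, horizon):
        super().__init__()
        self.backbone = backbone
        self.gate = nn.Parameter(torch.zeros(1))   # ReZero-style zero init
        self.horizon = horizon

    def forward(self, x):                          # x: (batch, time)
        persistence = x[:, -1:].expand(-1, self.horizon)  # repeat last value
        return persistence + self.gate * self.backbone(x)

backbone = nn.Sequential(nn.Linear(24, 64), nn.ReLU(), nn.Linear(64, 12))
model = PersistenceInit(backbone, horizon=12)
x = torch.randn(4, 24)
print(torch.allclose(model(x), x[:, -1:].expand(-1, 12)))  # True before training
```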
    MRL: Learning to Mix with Attention and Convolutions. (arXiv:2208.13975v1 [cs.CV])
    In this paper, we present a new neural architectural block for the vision domain, named Mixing Regionally and Locally (MRL), developed with the aim of effectively and efficiently mixing the provided input features. We bifurcate the input feature-mixing task into mixing at a regional scale and a local scale. To achieve an efficient mix, we exploit the domain-wide receptive field provided by self-attention for regional-scale mixing and convolutional kernels restricted to the local scale for local-scale mixing. More specifically, our proposed method mixes regional features associated with local features within a defined region, followed by a local-scale feature mix augmented by regional features. Experiments show that this hybridization of self-attention and convolution brings improved capacity, generalization (the right inductive bias), and efficiency. Under similar network settings, MRL outperforms or is on par with its counterparts in classification, object detection, and segmentation tasks. We also show that our MRL-based network architecture achieves state-of-the-art performance on H&E histology datasets, with Dice scores of 0.843, 0.855, and 0.892 on the Kumar, CoNSep, and CPM-17 datasets, respectively, while highlighting the versatility of the MRL framework by incorporating layers like group convolutions to improve dataset-specific generalization.  ( 2 min )
    Using Taylor-Approximated Gradients to Improve the Frank-Wolfe Method for Empirical Risk Minimization. (arXiv:2208.13933v1 [cs.LG])
    The Frank-Wolfe method has become increasingly useful in statistical and machine learning applications, due to the structure-inducing properties of the iterates, and especially in settings where linear minimization over the feasible set is more computationally efficient than projection. In the setting of Empirical Risk Minimization -- one of the fundamental optimization problems in statistical and machine learning -- the computational effectiveness of Frank-Wolfe methods typically grows linearly in the number of data observations $n$. This is in stark contrast to the case for typical stochastic projection methods. In order to reduce this dependence on $n$, we look to second-order smoothness of typical smooth loss functions (least squares loss and logistic loss, for example) and we propose amending the Frank-Wolfe method with Taylor series-approximated gradients, including variants for both deterministic and stochastic settings. Compared with current state-of-the-art methods in the regime where the optimality tolerance $\varepsilon$ is sufficiently small, our methods are able to simultaneously reduce the dependence on large $n$ while obtaining optimal convergence rates of Frank-Wolfe methods, in both the convex and non-convex settings. We also propose a novel adaptive step-size approach for which we have computational guarantees. Last of all, we present computational experiments which show that our methods exhibit very significant speed-ups over existing methods on real-world datasets for both convex and non-convex binary classification problems.  ( 3 min )
    Data Isotopes for Data Provenance in DNNs. (arXiv:2208.13893v1 [cs.CR])
    Today, creators of data-hungry deep neural networks (DNNs) scour the Internet for training fodder, leaving users with little control over or knowledge of when their data is appropriated for model training. To empower users to counteract unwanted data use, we design, implement and evaluate a practical system that enables users to detect if their data was used to train a DNN model. We show how users can create special data points we call isotopes, which introduce "spurious features" into DNNs during training. With only query access to a trained model, and no knowledge of the model training process or control of the data labels, a user can apply statistical hypothesis testing to detect if a model has learned the spurious features associated with their isotopes by training on the user's data. This effectively turns DNNs' vulnerability to memorization and spurious correlations into a tool for data provenance. Our results confirm efficacy in multiple settings, detecting and distinguishing between hundreds of isotopes with high accuracy. We further show that our system works on public ML-as-a-service platforms and larger models such as ImageNet, can use physical objects instead of digital marks, and remains generally robust against several adaptive countermeasures.  ( 2 min )
    A Deep Neural Networks ensemble workflow from hyperparameter search to inference leveraging GPU clusters. (arXiv:2208.14046v1 [cs.LG])
    Automated Machine Learning with ensembling (AutoML with ensembling) seeks to automatically build ensembles of Deep Neural Networks (DNNs) to deliver high-quality predictions. Ensembles of DNNs are well known to avoid over-fitting, but they are memory- and time-consuming. Ideally, an AutoML system would therefore produce, in a single run, ensembles with different trade-offs between accuracy and inference speed. While previous work on AutoML focuses on searching for the single best model to maximize its generalization ability, we instead propose a new AutoML approach that builds a larger library of accurate and diverse individual models and then constructs ensembles from it. First, our extensive benchmarks show that asynchronous Hyperband is an efficient and robust way to build a large number of diverse models for combination. Then, a new ensemble selection method based on a multi-objective greedy algorithm is proposed to generate accurate ensembles while controlling their computing cost. Finally, we propose a novel algorithm to optimize the inference of the DNN ensemble on a GPU cluster based on allocation optimization. The resulting AutoML-with-ensembling method shows robust results on two datasets, using GPU clusters efficiently during both the training and inference phases.  ( 3 min )
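    Greedy ensemble selection with replacement is a classic building block for this kind of library-then-ensemble pipeline. The sketch below is a single-objective simplification (validation accuracy only) of the multi-objective selection described above; array shapes and the number of rounds are illustrative assumptions.

```python
# Minimal sketch of greedy ensemble selection: repeatedly add, with
# replacement, the model whose inclusion most improves validation accuracy
# of the averaged prediction.
import numpy as np

def greedy_ensemble(val_probs, val_labels, n_rounds=10):
    # val_probs: (n_models, n_samples, n_classes) validation predictions.
    chosen, running_sum = [], np.zeros_like(val_probs[0])
    for _ in range(n_rounds):
        scores = []
        for m in range(len(val_probs)):
            avg = (running_sum + val_probs[m]) / (len(chosen) + 1)
            scores.append((avg.argmax(axis=1) == val_labels).mean())
        best = int(np.argmax(scores))
        chosen.append(best)
        running_sum += val_probs[best]
    return chosen                           # model indices, possibly repeated

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=(8, 200))   # 8 models, 200 samples, 3 classes
labels = rng.integers(0, 3, size=200)
print(greedy_ensemble(probs, labels))
```

    A multi-objective variant, as in the paper, would additionally penalize each candidate's inference cost when scoring it.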
    Exploring and Evaluating Personalized Models for Code Generation. (arXiv:2208.13928v1 [cs.SE])
    Large Transformer models have achieved state-of-the-art status for Natural Language Understanding tasks and are increasingly becoming the baseline model architecture for modeling source code. Transformers are usually pre-trained on large unsupervised corpora, learning token representations and transformations relevant to modeling generally available text, and are then fine-tuned on a particular downstream task of interest. While fine-tuning is a tried-and-true method for adapting a model to a new domain -- for example, question-answering on a given topic -- generalization remains an ongoing challenge. In this paper, we explore and evaluate transformer model fine-tuning for personalization. In the context of generating unit tests for Java methods, we evaluate learning to personalize to a specific software project using several personalization techniques. We consider three key approaches: (i) custom fine-tuning, which allows all the model parameters to be tuned; (ii) lightweight fine-tuning, which freezes most of the model's parameters, allowing tuning of the token embeddings and softmax layer only or the final layer alone; (iii) prefix tuning, which keeps model parameters frozen, but optimizes a small project-specific prefix vector. Each of these techniques offers a trade-off between total compute cost and predictive performance, which we evaluate by code- and task-specific metrics, training time, and total computational operations. We compare these fine-tuning strategies for code generation and discuss the potential generalization and cost benefits of each in various deployment scenarios.  ( 3 min )
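    Approach (ii) amounts to toggling requires_grad on a small subset of parameters. The sketch below shows one way to do this with a Hugging Face causal language model as a stand-in for the code model; the checkpoint name is an illustrative assumption, and the exact embedding attribute paths vary by architecture.

```python
# Minimal sketch of "lightweight fine-tuning": freeze most parameters of a
# pre-trained Transformer and train only the token embeddings and output head.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")   # stand-in for a code model

for param in model.parameters():
    param.requires_grad = False                        # freeze everything ...

for param in model.get_input_embeddings().parameters():
    param.requires_grad = True                         # ... except token embeddings
for param in model.get_output_embeddings().parameters():
    param.requires_grad = True                         # ... and the softmax layer

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")          # small fraction of the total
```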
    Prediction of Red Wine Quality Using One-dimensional Convolutional Neural Networks. (arXiv:2208.14008v1 [cs.LG])
    As an alcoholic beverage, wine has remained prevalent for thousands of years, and the quality assessment of wines is significant in wine production and trade. Scholars have proposed various deep learning and machine learning algorithms for wine quality prediction, such as support vector machines (SVM), random forests (RF), K-nearest neighbors (KNN), deep neural networks (DNN), and logistic regression (LR). However, these methods ignore the inner relationships among the physicochemical properties of the wine, for example, the correlations between pH value, fixed acidity, citric acid, and so on. To fill this gap, this paper conducts Pearson correlation analysis, PCA, and the Shapiro-Wilk test on those properties and incorporates a 1D-CNN architecture to capture the correlations among neighboring features. In addition, it implements dropout and batch normalization techniques to improve the robustness of the proposed model. Extensive experiments show that our method can outperform baseline approaches in wine quality prediction, and ablation experiments demonstrate the effectiveness of incorporating the 1D-CNN module, dropout, and normalization techniques.  ( 2 min )
    Virtual impactor-based label-free bio-aerosol detection using holography and deep learning. (arXiv:2208.13979v1 [physics.app-ph])
    Exposure to bio-aerosols such as mold spores and pollen can lead to adverse health effects. There is a need for a portable and cost-effective device for long-term monitoring and quantification of various bio-aerosols. To address this need, we present a mobile and cost-effective label-free bio-aerosol sensor that takes holographic images of flowing particulate matter concentrated by a virtual impactor, which selectively slows down and guides particles larger than ~6 microns to fly through an imaging window. The flowing particles are illuminated by a pulsed laser diode, casting their inline holograms on a CMOS image sensor in a lens-free mobile imaging device. The illumination contains three short pulses with a negligible shift of the flowing particle within one pulse, and triplicate holograms of the same particle are recorded at a single frame before it exits the imaging field-of-view, revealing different perspectives of each particle. The particles within the virtual impactor are localized through a differential detection scheme, and a deep neural network classifies the aerosol type in a label-free manner, based on the acquired holographic images. We demonstrated the success of this mobile bio-aerosol detector with a virtual impactor using different types of pollen (i.e., bermuda, elm, oak, pine, sycamore, and wheat) and achieved a blind classification accuracy of 92.91%. This mobile and cost-effective device weighs ~700 g and can be used for label-free sensing and quantification of various bio-aerosols over extended periods since it is based on a cartridge-free virtual impactor that does not capture or immobilize particulate matter.  ( 3 min )
    EchoGNN: Explainable Ejection Fraction Estimation with Graph Neural Networks. (arXiv:2208.14003v1 [eess.IV])
    Ejection fraction (EF) is a key indicator of cardiac function, allowing identification of patients prone to heart dysfunctions such as heart failure. EF is estimated from cardiac ultrasound videos known as echocardiograms (echo) by manually tracing the left ventricle and estimating its volume on certain frames. These estimations exhibit high inter-observer variability due to the manual process and varying video quality. Such sources of inaccuracy and the need for rapid assessment necessitate reliable and explainable machine learning techniques. In this work, we introduce EchoGNN, a model based on graph neural networks (GNNs) to estimate EF from echo videos. Our model first infers a latent echo-graph from the frames of one or multiple echo cine series. It then estimates weights over the nodes and edges of this graph, indicating the importance of individual frames that aid EF estimation. A GNN regressor uses this weighted graph to predict EF. We show, qualitatively and quantitatively, that the learned graph weights provide explainability through identification of critical frames for EF estimation, which can be used to determine when human intervention is required. On the public EchoNet-Dynamic EF dataset, EchoGNN achieves EF prediction performance on par with the state of the art and provides explainability, which is crucial given the high inter-observer variability inherent in this task.  ( 3 min )
  • Open

    Addressing Census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements. (arXiv:2205.06129v2 [stat.ML] UPDATED)
    Prediction of individuals' race and ethnicity plays an important role in studies of racial disparity. Bayesian Improved Surname Geocoding (BISG), which relies on detailed Census information, has emerged as a leading methodology for this prediction task. Unfortunately, BISG suffers from two data problems. First, the Census often contains zero counts for minority groups in the locations where members of those groups reside. Second, many surnames -- especially those of minorities -- are missing from the Census data. We introduce a fully Bayesian Improved Surname Geocoding (fBISG) methodology that accounts for Census measurement error by extending the naive Bayesian inference of the BISG methodology. We also use additional data on last, first, and middle names taken from the voter files of six Southern states where self-reported race is available. Our empirical validation shows that the fBISG methodology and name supplements significantly improve the accuracy of race imputation, especially for racial minorities.
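    For readers unfamiliar with the underlying computation, the sketch below shows the naive-Bayes step that BISG performs; fBISG extends this with a measurement-error model for the Census counts, which is omitted here. All probabilities are illustrative stand-ins, not real Census figures.

```python
# Minimal sketch of the BISG naive-Bayes computation. All numbers are
# illustrative placeholders, not actual Census values.
import numpy as np

races = ["white", "black", "hispanic", "asian", "other"]

# P(race | surname): from the Census surname list (illustrative numbers).
p_race_given_surname = np.array([0.05, 0.02, 0.90, 0.01, 0.02])

# P(geography | race): share of each group living in the given Census block.
p_geo_given_race = np.array([0.0005, 0.0010, 0.0060, 0.0008, 0.0007])

# Bayes rule, assuming surname and geography are conditionally independent
# given race: P(race | surname, geo) is proportional to the product below.
posterior = p_race_given_surname * p_geo_given_race
posterior /= posterior.sum()
print(dict(zip(races, posterior.round(3))))
```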
    Implicit Regularization and Convergence for Weight Normalization. (arXiv:1911.07956v5 [cs.LG] UPDATED)
    Normalization methods such as batch [Ioffe and Szegedy, 2015], weight [Salimans and Kingma, 2016], instance [Ulyanov et al., 2016], and layer normalization [Ba et al., 2016] have been widely used in modern machine learning. Here, we study the weight normalization (WN) method [Salimans and Kingma, 2016] and a variant called reparametrized projected gradient descent (rPGD) for overparametrized least-squares regression. WN and rPGD reparametrize the weights with a scale g and a unit vector w, so the objective function becomes non-convex. We show that this non-convex formulation has beneficial regularization effects compared to gradient descent on the original objective. These methods adaptively regularize the weights and, for certain stepsizes of g and w, converge close to the minimum l2-norm solution even for initializations far from zero. This is different from the behavior of gradient descent, which converges to the minimum-norm solution only when started at a point in the range space of the feature matrix, and is thus more sensitive to initialization.
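    A minimal sketch of the reparametrization is given below, assuming a toy overparametrized least-squares problem: the weights are expressed as w = g * v/||v|| and gradient descent runs on (g, v) via the chain rule. Step sizes and dimensions are illustrative.

```python
# Minimal sketch of the weight-normalization reparametrization for
# least-squares: w = g * v / ||v||, with gradient descent on (g, v).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))            # overparametrized: more features than samples
y = rng.normal(size=20)

g, v = 1.0, rng.normal(size=50)
lr = 0.01
for _ in range(2000):
    w = g * v / np.linalg.norm(v)        # the reparametrized weights
    r = X @ w - y                        # residual
    grad_w = X.T @ r / len(y)            # gradient w.r.t. w
    # Chain rule through the reparametrization:
    g_grad = grad_w @ v / np.linalg.norm(v)
    v_grad = g * (grad_w - (grad_w @ v) * v / (v @ v)) / np.linalg.norm(v)
    g, v = g - lr * g_grad, v - lr * v_grad

print("train residual:", np.linalg.norm(X @ (g * v / np.linalg.norm(v)) - y))
```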
    Tree-based Subgroup Discovery In Electronic Health Records: Heterogeneity of Treatment Effects for DTG-containing Therapies. (arXiv:2208.14329v1 [stat.ME])
    The rich longitudinal individual-level data available from electronic health records (EHRs) can be used to examine treatment effect heterogeneity. However, estimating treatment effects using EHR data poses several challenges, including time-varying confounding, repeated and temporally non-aligned measurements of covariates, treatment assignments and outcomes, and loss to follow-up due to dropout. Here, we develop the Subgroup Discovery for Longitudinal Data (SDLD) algorithm, a tree-based algorithm for discovering subgroups with heterogeneous treatment effects using longitudinal data, which combines the generalized interaction tree algorithm, a general data-driven method for subgroup discovery, with longitudinal targeted maximum likelihood estimation. We apply the algorithm to EHR data to discover subgroups of people living with human immunodeficiency virus (HIV) who are at higher risk of weight gain when receiving dolutegravir-containing antiretroviral therapies (ARTs) versus non-dolutegravir-containing ARTs.
    A deep learning-based remaining useful life prediction approach for bearings. (arXiv:1812.03315v2 [cs.LG] UPDATED)
    In industrial applications, nearly half of motor failures are caused by the degradation of rolling element bearings (REBs). Therefore, accurately estimating the remaining useful life (RUL) of REBs is of crucial importance to ensure the reliability and safety of mechanical systems. Model-based approaches to this challenge are often limited by the complexity of mathematical modeling, while conventional data-driven approaches require massive effort to extract degradation features and construct a health index. In this paper, a novel online data-driven framework is proposed that exploits deep convolutional neural networks (CNNs) to predict the RUL of bearings. More concretely, the raw vibrations of training bearings are first processed using the Hilbert-Huang transform (HHT), and a novel nonlinear degradation indicator is constructed as the label for learning. The CNN is then employed to identify the hidden pattern between the extracted degradation indicator and the vibration of the training bearings, which makes it possible to estimate the degradation of the test bearings automatically. Finally, the test bearings' RULs are predicted using an $\epsilon$-support vector regression model. The superior performance of the proposed RUL estimation framework, compared with state-of-the-art approaches, is demonstrated through experimental results. The generality of the proposed CNN model is also validated by transferring it to bearings operating under different conditions.
    AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels. (arXiv:2208.14362v1 [cs.LG])
    Weak supervision (WS) is a powerful method to build labeled datasets for training supervised models in the face of little-to-no labeled data. It replaces hand-labeling data with aggregating multiple noisy-but-cheap label estimates expressed by labeling functions (LFs). While it has been used successfully in many domains, weak supervision's application scope is limited by the difficulty of constructing labeling functions for domains with complex or high-dimensional features. To address this, a handful of methods have proposed automating the LF design process using a small set of ground truth labels. In this work, we introduce AutoWS-Bench-101: a framework for evaluating automated WS (AutoWS) techniques in challenging WS settings -- a set of diverse application domains on which it has been previously difficult or impossible to apply traditional WS techniques. While AutoWS is a promising direction toward expanding the application-scope of WS, the emergence of powerful methods such as zero-shot foundation models reveals the need to understand how AutoWS techniques compare or cooperate with modern zero-shot or few-shot learners. This informs the central question of AutoWS-Bench-101: given an initial set of 100 labels for each task, we ask whether a practitioner should use an AutoWS method to generate additional labels or use some simpler baseline, such as zero-shot predictions from a foundation model or supervised learning. We observe that in many settings, it is necessary for AutoWS methods to incorporate signal from foundation models if they are to outperform simple few-shot baselines, and AutoWS-Bench-101 promotes future research in this direction. We conclude with a thorough ablation study of AutoWS methods.
    FDB: Fraud Dataset Benchmark. (arXiv:2208.14417v1 [cs.LG])
    Standardized datasets and benchmarks have spurred innovations in computer vision, natural language processing, and multi-modal and tabular settings. We note that, compared to other well-researched fields, fraud detection has numerous differences, including high class imbalance, diverse feature types, frequently changing fraud patterns, and the adversarial nature of the problem. Due to these differences, modeling approaches designed for other classification tasks may not work well for fraud detection. We introduce the Fraud Dataset Benchmark (FDB), a compilation of publicly available datasets catered to fraud detection. FDB comprises a variety of fraud-related tasks, ranging from identifying fraudulent card-not-present transactions, detecting bot attacks, and classifying malicious URLs, to predicting the risk of loans and content moderation. The Python-based FDB library provides a consistent API for data loading with standardized training and testing splits. For reference, we also provide baseline evaluations of different modeling approaches on FDB. Considering the increasing popularity of Automated Machine Learning (AutoML) for various research and business problems, we used AutoML frameworks for our baseline evaluations. For fraud prevention, organizations that operate with limited resources and lack ML expertise often hire a team of investigators and rely on blocklists and manual rules, all of which are inefficient and do not scale well. Such organizations can benefit from AutoML solutions that are easy to deploy in production and meet the bar of fraud prevention requirements. We hope that FDB helps in the development of customized fraud detection techniques catered to different fraud modus operandi (MOs), as well as in the improvement of AutoML systems that can work well for all datasets in the benchmark.
    Optimal Rates for Distributed Learning with Random Features. (arXiv:1906.03155v4 [cs.LG] UPDATED)
    In recent studies, the generalization properties of distributed learning and random features were established under the assumption that the target concept lies in the hypothesis space. However, this strict condition does not hold in the more common non-attainable case. In this paper, using refined proof techniques, we first extend the optimal rates for distributed learning with random features to the non-attainable case. Then, we reduce the number of required random features via a data-dependent generating strategy and improve the allowed number of partitions with additional unlabeled data. Theoretical analysis shows these techniques remarkably reduce computational cost while preserving the optimal generalization accuracy under standard assumptions. Finally, we conduct several experiments on both simulated and real-world datasets, and the empirical results validate our theoretical findings.
    Principal Component Analysis based frameworks for efficient missing data imputation algorithms. (arXiv:2205.15150v2 [cs.LG] UPDATED)
    Missing data is a commonly occurring problem in practice. Many imputation methods have been developed to fill in the missing entries, but not all of them scale to high-dimensional data, especially the multiple imputation techniques. Meanwhile, data nowadays tend to be high-dimensional. Therefore, in this work, we propose Principal Component Analysis Imputation (PCAI), a simple but versatile framework based on Principal Component Analysis (PCA) that speeds up the imputation process and alleviates the memory issues of many available imputation techniques, without sacrificing imputation quality in terms of MSE. In addition, the framework can be used even when some or all of the missing features are categorical, or when the number of missing features is large. Next, we introduce PCA Imputation - Classification (PIC), an application of PCAI to classification problems with some adjustments. We validate our approach by experiments on various scenarios, showing that PCAI and PIC work with various imputation algorithms, including state-of-the-art ones, and improve imputation speed significantly while achieving competitive mean squared error/classification accuracy compared to direct imputation (i.e., imputing directly on the missing data).
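    The core mechanic is easy to sketch: compress the fully observed block with PCA, impute the missing block against the compressed representation, and read the filled columns back out. Below is a minimal sketch under stated assumptions; the dimensions and the choice of KNN imputation are illustrative, not the paper's prescribed settings.

```python
# Minimal sketch of the PCAI idea: PCA-compress the fully observed features,
# then impute the missing features against the compressed representation.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))
X[:, :10][rng.random((500, 10)) < 0.2] = np.nan   # missing values in first 10 columns

missing_cols = np.isnan(X).any(axis=0)
X_obs, X_miss = X[:, ~missing_cols], X[:, missing_cols]

# Reduce only the fully observed block; imputation then runs in lower dimension.
Z = PCA(n_components=15).fit_transform(X_obs)
imputed = KNNImputer(n_neighbors=5).fit_transform(np.hstack([Z, X_miss]))
X_miss_filled = imputed[:, Z.shape[1]:]           # recover the imputed columns
```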
    Block Mean Approximation for Efficient Second Order Optimization. (arXiv:1804.05484v4 [cs.LG] UPDATED)
    Advanced optimization algorithms such as Newton's method and AdaGrad benefit from second-order derivatives or second-order statistics to achieve better descent directions and faster convergence rates. At their heart, such algorithms need to compute the inverse or inverse square root of a matrix whose size is quadratic in the dimensionality of the search space. For high-dimensional search spaces, the matrix inversion or inversion of the square root becomes prohibitive, which in turn demands approximate methods. In this work, we propose a new matrix approximation method that divides a matrix into blocks and represents each block by one or two numbers. The method allows efficient computation of the matrix inverse and inverse square root. We apply our method to AdaGrad for training deep neural networks. Experiments show encouraging results compared to the diagonal approximation.
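    The compression step is illustrated below on a small symmetric matrix: each block is replaced by its mean, so a block can be stored as a single number. This is a minimal sketch of the representation only; the paper's fast inverse and inverse-square-root computations built on this structure are not reproduced here.

```python
# Minimal sketch of block-mean matrix approximation: partition a matrix into
# blocks and replace each block by its mean (one number per block).
import numpy as np

def block_mean_approx(A, block):
    n = A.shape[0]
    B = np.empty_like(A)
    for i in range(0, n, block):
        for j in range(0, n, block):
            B[i:i + block, j:j + block] = A[i:i + block, j:j + block].mean()
    return B

rng = np.random.default_rng(0)
G = rng.normal(size=(8, 100))
A = G @ G.T / 100 + np.eye(8)            # a small SPD "second-moment" matrix
B = block_mean_approx(A, block=4)        # each 4x4 block becomes one number
print(np.linalg.norm(A - B) / np.linalg.norm(A))  # relative approximation error
```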
    Super-model ecosystem: A domain-adaptation perspective. (arXiv:2208.14092v1 [cs.LG])
    This paper attempts to establish the theoretical foundation for the emerging super-model paradigm via domain adaptation, where one first trains a very large-scale model, {\it i.e.}, a super model (or foundation model in some other papers), on a large amount of data and then adapts it to various specific domains. Super-model paradigms help reduce computational and data costs and carbon emissions, which is critical to the AI industry, especially to the enormous number of small and medium-sized enterprises. We model the super-model paradigm as a two-stage diffusion process: (1) in the pre-training stage, the model parameter diffuses from a random initialization and converges to a steady distribution; and (2) in the fine-tuning stage, the model parameter is transported to another steady distribution. Both training stages can be mathematically modeled by the Ornstein-Uhlenbeck process, which converges to two Maxwell-Boltzmann distributions, respectively, each of which characterizes the corresponding convergent model. An $\mathcal O(1/\sqrt{N})$ generalization bound is then established via the PAC-Bayesian framework. The theory finds that the generalization error of the fine-tuning stage is dominant in domain adaptation. In addition, our theory suggests that generalization is determined by a new measure that characterizes the domain discrepancy between the source and target domains, based on the covariance matrices and the shift of the converged local minimum.
    Convergence Rates for Regularized Optimal Transport via Quantization. (arXiv:2208.14391v1 [math.OC])
    We study the convergence of divergence-regularized optimal transport as the regularization parameter vanishes. Sharp rates for general divergences including relative entropy or $L^{p}$ regularization, general transport costs and multi-marginal problems are obtained. A novel methodology using quantization and martingale couplings is suitable for non-compact marginals and achieves, in particular, the sharp leading-order term of entropically regularized 2-Wasserstein distance for all marginals with finite $(2+\delta)$-moment.
    Modelling variability in vibration-based PBSHM via a generalised population form. (arXiv:2203.07115v2 [cs.LG] UPDATED)
    Structural health monitoring (SHM) has been an active research area for the last three decades and has accumulated a number of critical advances over that period, as can be seen in the literature. However, SHM still faces challenges because of the paucity of damage-state data, operational and environmental fluctuations, repeatability issues, and changes in boundary conditions. These issues present as inconsistencies in the captured features and can have a huge impact on the practical implementation, but more critically, on the generalisation of the technology. Population-based SHM has been designed to address some of these concerns by modelling and transferring missing information using data collected from groups of similar structures. In this work, vibration data were collected from four healthy, nominally-identical, full-scale composite helicopter blades. Manufacturing differences (e.g., slight differences in geometry and/or material properties) among the blades presented as variability in their structural dynamics, which can be very problematic for SHM based on machine learning from vibration data. This work aims to address this variability by defining a general model for the frequency response functions of the blades, called a form, using mixtures of Gaussian processes.
    Weight-variant Latent Causal Models. (arXiv:2208.14153v1 [cs.LG])
    Causal representation learning exposes latent high-level causal variables behind low-level observations, which has enormous potential for a set of downstream tasks of interest. Despite this, identifying the true latent causal representation from observed data is a great challenge. In this work we focus on identifying latent causal variables. To this end, we analyze three intrinsic properties of the latent space: transitivity, permutation, and scaling. We show that transitivity severely hinders the identifiability of latent causal variables, while permutation and scaling guide the direction in which latent causal variables can be identified. To break the transitivity, we assume the underlying latent causal relations to be linear Gaussian models, in which the weights, mean, and variance of the Gaussian noise are modulated by an additionally observed variable. Under these assumptions, we theoretically show that the latent causal variables are identifiable up to trivial permutation and scaling. Building on this theoretical result, we propose a novel method, termed Structural caUsAl Variational autoEncoder, which directly learns the latent causal variables, together with the mapping from the latent causal variables to the observed ones. Experimental results on synthetic and real data demonstrate the identifiability result and the ability of the proposed method to learn latent causal variables.
    Online Active Regression. (arXiv:2207.05945v2 [cs.LG] UPDATED)
    Active regression considers a linear regression problem where the learner receives a large number of data points but can only observe a small number of labels. Since online algorithms can deal with incremental training data and take advantage of low computational cost, we consider an online extension of the active regression problem: the learner receives data points one by one and immediately decides whether it should collect the corresponding labels. The goal is to efficiently maintain the regression of received data points with a small budget of label queries. We propose novel algorithms for this problem under $\ell_p$ loss where $p\in[1,2]$. To achieve a $(1+\epsilon)$-approximate solution, our proposed algorithms only require $\tilde{\mathcal{O}}(\epsilon^{-1} d \log(n\kappa))$ queries of labels, where $n$ is the number of data points and $\kappa$ is a quantity, called the condition number, of the data points. The numerical results verify our theoretical results and show that our methods have comparable performance with offline active regression algorithms.
    Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation. (arXiv:2111.10919v2 [cs.LG] UPDATED)
    We consider the offline reinforcement learning problem, where the aim is to learn a decision making policy from logged data. Offline RL -- particularly when coupled with (value) function approximation to allow for generalization in large or continuous state spaces -- is becoming increasingly relevant in practice, because it avoids costly and time-consuming online data collection and is well suited to safety-critical domains. Existing sample complexity guarantees for offline value function approximation methods typically require both (1) distributional assumptions (i.e., good coverage) and (2) representational assumptions (i.e., ability to represent some or all $Q$-value functions) stronger than what is required for supervised learning. However, the necessity of these conditions and the fundamental limits of offline RL are not well understood in spite of decades of research. This led Chen and Jiang (2019) to conjecture that concentrability (the most standard notion of coverage) and realizability (the weakest representation condition) alone are not sufficient for sample-efficient offline RL. We resolve this conjecture in the positive by proving that in general, even if both concentrability and realizability are satisfied, any algorithm requires sample complexity polynomial in the size of the state space to learn a non-trivial policy. Our results show that sample-efficient offline reinforcement learning requires either restrictive coverage conditions or representation conditions that go beyond supervised learning, and highlight a phenomenon called over-coverage which serves as a fundamental barrier for offline value function approximation methods. A consequence of our results for reinforcement learning with linear function approximation is that the separation between online and offline RL can be arbitrarily large, even in constant dimension.
    Identifying Latent Causal Content for Multi-Source Domain Adaptation. (arXiv:2208.14161v1 [cs.LG])
    Multi-source domain adaptation (MSDA) learns to predict the labels of target-domain data in the setting where data from multiple source domains are labelled and data from the target domain are unlabeled. To handle this problem, most methods focus on learning invariant representations across domains. However, their success relies heavily on the assumption that the label distribution remains unchanged across domains. To mitigate this, we propose a new assumption, latent covariate shift, under which the marginal distribution of the latent content variable changes across domains while the conditional distribution of the label given the latent content remains invariant. We introduce a latent style variable to complement the latent content variable, forming a latent causal graph as the data- and label-generating process. We show that although the latent style variable is unidentifiable due to the transitivity property of the latent space, the latent content variable can be identified up to simple scaling under some mild conditions. This motivates a novel method for MSDA which learns the invariant label distribution conditional on the latent content variable, instead of learning invariant representations. Empirical evaluation on simulated and real data demonstrates the effectiveness of the proposed method compared with many state-of-the-art methods based on invariant representations.
    A General End-to-end Diagnosis Framework for Manufacturing Systems. (arXiv:1901.02057v2 [cs.LG] UPDATED)
    The manufacturing sector is envisioned to be heavily influenced by artificial intelligence-based technologies, given the extraordinary increases in computational power and data volumes. A central challenge in the manufacturing sector is the need for a general framework that ensures satisfactory diagnosis and monitoring performance across different manufacturing applications. Here we propose a general data-driven, end-to-end framework for the monitoring of manufacturing systems. This framework, derived from deep learning techniques, evaluates fused sensory measurements to detect and even predict faults and wearing conditions. This work exploits the predictive power of deep learning to automatically extract hidden degradation features from noisy, time-course data. We have evaluated the proposed framework on ten representative datasets drawn from a wide variety of manufacturing applications. Results reveal that the framework performs well on the examined benchmark applications and can be applied in diverse contexts, indicating its potential use as a critical cornerstone in smart manufacturing.
    Dynamic Network Sampling for Community Detection. (arXiv:2208.13921v1 [cs.SI])
    We propose a dynamic network sampling scheme to optimize block recovery for the stochastic blockmodel (SBM) in the case where it is prohibitively expensive to observe the entire graph. Theoretically, we provide justification for our proposed Chernoff-optimal dynamic sampling scheme via the Chernoff information. Practically, we evaluate the performance, in terms of block recovery, of our method on several real datasets from different domains. Both theoretical and practical results suggest that our method can identify the vertices that have the most impact on block structure, so that one can check only for edges between them, saving significant resources while still recovering the block structure.
    Conjugate Natural Selection. (arXiv:2208.13898v1 [cs.LG])
    We prove that natural gradient descent, with respect to the parameters of a machine learning policy, admits a conjugate dynamical description consistent with evolution by natural selection. We characterize these conjugate dynamics as a locally optimal fit to the continuous-time replicator dynamics, and show that the Price equation applies to equivalence classes of functions belonging to a Hilbert space generated by the policy's architecture and parameters. We posit that "conjugate natural selection" intuitively explains the empirical effectiveness of natural gradient descent, while developing a useful analytic approach to the dynamics of machine learning.
    A Non-Classical Parameterization for Density Estimation Using Sample Moments. (arXiv:2201.04786v3 [stat.ML] UPDATED)
    Moment methods are an important means of density estimation, but they are generally strongly dependent on the choice of feasible functions, which severely affects performance. We propose a non-classical parameterization for density estimation using sample moments, which does not require the choice of such functions. The parameterization is induced by the Kullback-Leibler distance, and its solution, which is proved to exist and to be unique subject to a simple prior that does not depend on the data, can be obtained by convex optimization. Simulation results show the performance of the proposed estimator in estimating multi-modal densities which are mixtures of different types of functions.
    Reducing Certified Regression to Certified Classification. (arXiv:2208.13904v1 [cs.LG])
    Adversarial training instances can severely distort a model's behavior. This work investigates certified regression defenses, which provide guaranteed limits on how much a regressor's prediction may change under a training-set attack. Our key insight is that certified regression reduces to certified classification when using the median as a model's primary decision function. Coupling our reduction with existing certified classifiers, we propose six new provably-robust regressors. To the best of our knowledge, this is the first work that certifies the robustness of individual regression predictions without any assumptions about the data distribution and model architecture. We also show that existing state-of-the-art certified classifiers often make overly-pessimistic assumptions that can degrade their provable guarantees. We introduce a tighter analysis of model robustness, which in many cases results in significantly improved certified guarantees. Lastly, we empirically demonstrate our approaches' effectiveness on both regression and classification data, where the accuracy of up to 50% of test predictions can be guaranteed under 1% training-set corruption and up to 30% of predictions under 4% corruption. Our source code is available at https://github.com/ZaydH/certified-regression.
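    To illustrate the intuition behind the median reduction (a toy sketch, not the authors' code): if predictions come from sub-models trained on disjoint partitions of the training set, corrupting a few partitions perturbs only a few predictions, and the median output moves only if enough sub-models are pushed past it.

        import numpy as np

        # Five hypothetical sub-models, each trained on a disjoint data partition;
        # one partition has been poisoned, skewing that sub-model's output.
        preds = np.array([1.9, 2.0, 2.1, 2.0, 50.0])

        # Using the median as the primary decision function: the single
        # corrupted sub-model cannot move it past its honest neighbors.
        print(np.median(preds))  # 2.0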
    SpeqNets: Sparsity-aware Permutation-equivariant Graph Networks. (arXiv:2203.13913v3 [cs.LG] UPDATED)
    While (message-passing) graph neural networks have clear limitations in approximating permutation-equivariant functions over graphs or general relational data, more expressive, higher-order graph neural networks do not scale to large graphs. They either operate on $k$-order tensors or consider all $k$-node subgraphs, implying an exponential dependence on $k$ in memory requirements, and do not adapt to the sparsity of the graph. By introducing new heuristics for the graph isomorphism problem, we devise a class of universal, permutation-equivariant graph networks, which, unlike previous architectures, offer a fine-grained control between expressivity and scalability and adapt to the sparsity of the graph. These architectures lead to vastly reduced computation times compared to standard higher-order graph networks in the supervised node- and graph-level classification and regression regime while significantly improving over standard graph neural network and graph kernel architectures in terms of predictive performance.
    The case for fully Bayesian optimisation in small-sample trials. (arXiv:2208.13960v1 [cs.LG])
    While sample efficiency is the main motive for the use of Bayesian optimisation when black-box functions are expensive to evaluate, the standard approach based on type II maximum likelihood (ML-II) may fail and result in disappointing performance in small-sample trials. The paper provides three compelling reasons to adopt fully Bayesian optimisation (FBO) as an alternative. First, failures of ML-II are more commonplace than implied by existing studies, which use contrived settings. Second, FBO is more robust than ML-II, and the price of robustness is almost trivial. Third, FBO has become simple to implement and fast enough to be practical. The paper supports the argument using relevant experiments, which reflect the current practice regarding models, algorithms, and software platforms. Since the benefits seem to outweigh the costs, researchers should consider adopting FBO for their applications so that they can guard against potential failures that end up wasting precious research resources.
    A Diffusion Model Predicts 3D Shapes from 2D Microscopy Images. (arXiv:2208.14125v1 [cs.CV])
    Diffusion models are a class of generative models, showing superior performance as compared to other generative models in creating realistic images when trained on natural image datasets. We introduce DISPR, a diffusion-based model for solving the inverse problem of three-dimensional (3D) cell shape prediction from two-dimensional (2D) single cell microscopy images. Using the 2D microscopy image as a prior, DISPR is conditioned to predict realistic 3D shape reconstructions. To showcase the applicability of DISPR as a data augmentation tool in a feature-based single cell classification task, we extract morphological features from the cells grouped into six highly imbalanced classes. Adding features from predictions of DISPR to the three minority classes improved the macro F1 score from $F1_\text{macro} = 55.2 \pm 4.6\%$ to $F1_\text{macro} = 72.2 \pm 4.9\%$. With our method being the first to employ a diffusion-based model in this context, we demonstrate that diffusion models can be applied to inverse problems in 3D, and that they learn to reconstruct 3D shapes with realistic morphological features from 2D microscopy images.
    Finite Sample Identification of Bilinear Dynamical Systems. (arXiv:2208.13915v1 [cs.LG])
    Bilinear dynamical systems are ubiquitous in many different domains and they can also be used to approximate more general control-affine systems. This motivates the problem of learning bilinear systems from a single trajectory of the system's states and inputs. Under a mild marginal mean-square stability assumption, we identify how much data is needed to estimate the unknown bilinear system up to a desired accuracy with high probability. Our sample complexity and statistical error rates are optimal in terms of the trajectory length, the dimensionality of the system and the input size. Our proof technique relies on an application of the martingale small-ball condition. This enables us to correctly capture the properties of the problem; specifically, our error rates do not deteriorate with increasing instability. Finally, we show that numerical experiments are well-aligned with our theoretical results.

  • Open

    [P] Introducing Eqxvision: A library for computer vision models built using Equinox
    About: Equinox blends the ease of a PyTorch-like API with the functionality of JAX. Eqxvision is intended to be an extension of Equinox catered to computer vision research. Github, Documentation With Eqxvision, the idea is to support a torchvision-like API for various computer vision models. As a one-stop solution, this cuts down the need to look for an ideal (and working) GitHub implementation of your favourite network. It started off as a hobby project for me to get familiar with JAX; with an existing understanding of PyTorch, Equinox served as an excellent starting point. Usage As simple as model = eqxvision.models.resnet50(pretrained=True) Replace resnet50 with your favourite classification model. At the moment, the library supports almost all of the image classification models along with their pre-trained weights ported from torchvision. Check out the Torchvision vs. Eqxvision ImageNet-1k accuracy comparison here. What's Next? I plan to expand by introducing CV models from the domains of segmentation, object detection ... Introducing some advanced tutorials to cover API usage and different problems in computer vision. Some basic tutorials are listed here. Acknowledgements Working on Eqxvision and Equinox, I got the chance to learn a lot from u/patrickkidger. Thanks! Also, a shout-out to torchvision and timm for doing much of the heavy lifting. submitted by /u/PaganPasta [link] [comments]  ( 90 min )
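    A minimal sketch of the usage pattern described above, assuming the eqxvision.models.resnet50 entry point shown in the post; the exact call signature (e.g., whether a PRNG key is required) may differ between versions:

        import jax
        import jax.numpy as jnp
        import eqxvision

        # Load a ResNet-50 with ImageNet weights ported from torchvision,
        # following the usage shown in the post.
        model = eqxvision.models.resnet50(pretrained=True)

        # A dummy 224x224 RGB image, channels-first as in torchvision.
        x = jnp.zeros((3, 224, 224))

        # Equinox modules are plain callables; some layers take a PRNG key
        # (e.g. for dropout), so one is passed explicitly here.
        logits = model(x, key=jax.random.PRNGKey(0))
        print(logits.shape)  # expected: (1000,) ImageNet-1k logits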
    "[R]" Collaborative Learning: averaging and pruning difference? "[R]"
    Hi, I am reading about the concept of collaborative learning, and I came to know that it is also called federated learning, but the following link describes averaging (federated learning or collaborative learning): "When locally trained, the updated models are sent back to the central server via encrypted communication channels. Well, here it's worth noting that the server doesn't get any real data, but only trained parameters of the model (e.g., weights in neural networks). The updates from all clients are averaged and aggregated into a single shared model, improving its accuracy." But the following text uses pruning: "The training method implicit in the framework could be described as a semi-supervised and halfway model pruning collaborative learning method, denoted by SSHFMP." I got the above text from this paper (NetSpirit: A Smart Collaborative Learning Framework for DDoS Attack Detection): NetSprint. Both the paper and the web article focus on collaborative learning. Somebody, please guide me on why the link https://www.altexsoft.com/blog/federated-learning/ uses the concept of averaging and why the research paper uses the concept of pruning. Zulfi. submitted by /u/Snoo20972 [link] [comments]  ( 89 min )
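    For concreteness, here is a hypothetical sketch of the averaging step the article describes (FedAvg-style weighted averaging of client parameters); it is illustrative only and not taken from either linked source:

        import numpy as np

        def fedavg(client_weights, client_sizes):
            """Weighted average of client model parameters (FedAvg-style).

            client_weights: one list of numpy arrays (layers) per client
            client_sizes:   number of local training examples per client
            """
            total = sum(client_sizes)
            num_layers = len(client_weights[0])
            averaged = []
            for layer in range(num_layers):
                # Each client's contribution is weighted by its data share.
                layer_avg = sum(
                    (n / total) * w[layer]
                    for w, n in zip(client_weights, client_sizes)
                )
                averaged.append(layer_avg)
            return averaged

        # Two toy clients, each with one weight matrix and one bias vector.
        client_a = [np.ones((2, 2)), np.zeros(2)]
        client_b = [3 * np.ones((2, 2)), np.ones(2)]
        global_model = fedavg([client_a, client_b], client_sizes=[100, 300])
        print(global_model[0])  # weighted toward client_b, which has more data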
    "[R]" Problem with Collaborative Learning: How does pruning help in Collaborative Learning?"[R]"
    Hi, I am having trouble understanding the following text about collaborative learning: "Fine-grained pruning usually refers to weight pruning, which forces a portion of each layer's weights to zero according to the L1 or L2 norm, without affecting the structure of the model. In collaborative learning, before the server sends the parameters to the participant, a certain compression algorithm (e.g., gzip) will be applied to the pruned weights. The higher the pruning ratio, the more zeros are set, and the smaller the size of the compressed parameters." Following is the link to the paper (NetSpirit: A Smart Collaborative Learning Framework for DDoS Attack Detection): http://www.thucsnet.com/wp-content/papers/ke_network2021.pdf Somebody, please guide me. Zulfi. submitted by /u/Snoo20972 [link] [comments]  ( 90 min )
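    The quoted mechanism can be checked empirically: magnitude pruning zeroes out the smallest weights, and long runs of zeros compress well, so a higher pruning ratio yields smaller compressed parameters. A self-contained sketch (not from the paper):

        import gzip
        import numpy as np

        def prune_by_magnitude(weights, ratio):
            """Set the smallest `ratio` fraction of weights (by |w|) to zero."""
            threshold = np.quantile(np.abs(weights).ravel(), ratio)
            return np.where(np.abs(weights) < threshold, 0.0, weights)

        rng = np.random.default_rng(0)
        w = rng.normal(size=(256, 256)).astype(np.float32)

        # More zeros -> better gzip compression, as the paper states.
        for ratio in (0.0, 0.5, 0.9):
            pruned = prune_by_magnitude(w, ratio)
            compressed = gzip.compress(pruned.tobytes())
            print(f"pruning ratio {ratio:.0%}: {len(compressed)} bytes after gzip")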
    [D] Would you work as a ML Engineer, even if you prefer research?
    Throwaway because reasons. I always thought of myself as a researcher kind of person. However, not having a CS background (but a very recent PhD with a "new" ML method nonetheless) and being from Europe, I do not have a track record of publication. It is actually rather puny tbh. Recently, a recruiter from Meta reached out to offer me a role as an ML engineer. I was happy at first, but am a little disappointed now, since it is not research. On the other hand, working for Meta could help me improve my publication record, I guess, but I am not at all sure what to do. What would you do? tldr; I wanted to be a researcher but have almost no publications, and Meta offered me a role as ML Engineer. What to do? View Poll submitted by /u/burn-with-fire [link] [comments]  ( 91 min )
    [R] Flight Trajectory Planning using DL or ML
    Hello, I am doing research on flight trajectory planning. I am currently gathering research papers, but I wanted to post this discussion in case anyone has insights on previous work (code preferable but not necessary). submitted by /u/Environmental-Fee753 [link] [comments]  ( 88 min )
    [D] AutoML thoughts
    AutoML discussion Hi everyone, Recently I came across AutoML platforms/tools (e.g. PyCaret, H2OAutoML), and I was blown away by the number of such tools, so I wondered what the community thinks about them. When would one use a tool like that? Are there any tools that are considered "standard" for someone to know? I would really like to know people's experience with them and thoughts about them. submitted by /u/SpiritofPleasure [link] [comments]  ( 91 min )
    Web scraping and AI analyze [P]
    Last night I watched the movie "Snowden" and got inspiration from it. I thought about making a project where I write a program that scrapes the web and saves data. That data should tell me about news/gossip in my local area, but that's after an AI "combs through" all that data. This is my first AI project and I have been very interested in AI for the last few months. What software should I use? What programming language? Literally, I don't know how to start. I realized that, in order to do this, I need to know: how to make a web scraper; how to store all this data (databases); how to implement AI (this is the most unknown part to me, so if you have any advice, feel free to write). Would you add something? I know that the programs from the movie Snowden were based on real events, and I know they were made by hundreds of people, so that's why I am trying to make it for my local area only. Even if I don't succeed, it is precious experience that I'll gain going through this project. submitted by /u/wannacry011 [link] [comments]  ( 90 min )
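    As a starting point for the question above, a minimal sketch of the scraping-and-storage part in Python (requests + BeautifulSoup + SQLite); the URL is a placeholder, and the AI analysis step is left out:

        import sqlite3
        import requests
        from bs4 import BeautifulSoup

        # Fetch a page of local news (hypothetical URL) and store headlines.
        URL = "https://example.com/local-news"  # placeholder, not a real source

        response = requests.get(URL, timeout=10)
        soup = BeautifulSoup(response.text, "html.parser")
        headlines = [h.get_text(strip=True) for h in soup.find_all("h2")]

        # A small SQLite database is enough to accumulate scraped text.
        conn = sqlite3.connect("news.db")
        conn.execute("CREATE TABLE IF NOT EXISTS headlines (text TEXT)")
        conn.executemany("INSERT INTO headlines VALUES (?)",
                         [(h,) for h in headlines])
        conn.commit()
        conn.close()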
    [D] LSTM predicts same value for different inputs
    Hi guys, I'm working on a time-series problem (with ~240 datapoints) and every LSTM I train keeps predicting the same value for different inputs. I tried every suggested solution on StackOverflow and similar pages (changing activation functions, dropout, lags, layers, number of nodes), but it still keeps predicting the same value (with a few models I got a repetitive pattern predicted with three different values). The last possibility I read is that I have too little data, which makes sense. Still, I wanted to check what you guys think - thanks in advance. submitted by /u/PrinceDome [link] [comments]  ( 90 min )
    [R] Designing DNA sequences that control gene expression using generative deep learning
    Happy to share that our study on designing DNA sequences to control gene expression using generative deep learning is just out! https://www.nature.com/articles/s41467-022-32818-8 This is a continuation of our previous work, where we learned to 'read' regulatory DNA using deep neural nets, accurately predicting gene expression levels in multiple organisms and finding predictive regulatory grammar across whole gene regulatory regions. https://www.nature.com/articles/s41467-020-19921-4 Here we combine the predictive models with advanced generative models in an architecture termed ExpressionGAN that can be used to 'write' (design) de novo regulatory DNA with target gene expression levels. submitted by /u/janimezzz [link] [comments]  ( 108 min )
    [D] I've done a little experiment on the effect of initial conditions on the 0 hidden layers MLP trained on XOR, results are a bit confusing.
    As we know, the 2-input, 1-output MLP (no hidden layer) cannot model the XOR operation, which is (I've been told) the reason why multi-layer MLPs/NNs were created, as single-layer MLPs cannot model everything. It is fair to say that, given a set of data points, a single-layer MLP trained on a problem like XOR is very sensitive to initial conditions, and those initial conditions will dictate what sort of "XOR" MLP we're going to get. So I've set up an experiment: I vary the initial conditions and fix the bias of a single-layer MLP, then train it on a small set of randomly generated data using gradient descent with eta=0.1, to see which XOR test will fail. I obtained this small animation: Results https://reddit.com/link/x1k0nm/video/gx8k1cctcvk91/player The failures I've defined are as such: Symmetric OR Test: 1 XOR 0 != 0 XOR 1; Xclusive Test: 1 XOR 1 = 1; Negative OR Test: 0 XOR 0 = 1. Note: "More than 1 Failure" includes failing 1 XOR 0 = 1 and 0 XOR 1 = 1. I have four observations/questions: Why is it not failing the Negative OR Test? What's causing the discontinuity at 0.5? Why is the result not symmetric with respect to the bias? It should be... if I multiply the weights by -1, I should get the same result, I assume. Why does it appear "fractal"? What's happening in the negative-negative region could be modeled further, though the data takes a bit long to get (it took me over an hour and 15 minutes on a powerful CPU while running in parallel on 4 cores in Julia). submitted by /u/qfl3x [link] [comments]  ( 109 min )
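    For readers who want to reproduce the basic setup, a minimal sketch (in Python rather than the poster's Julia) of training a single-layer network on XOR with gradient descent; the random initial weights play the role of the varied initial conditions:

        import numpy as np

        # XOR inputs and targets; a single-layer (no hidden layer) network
        # cannot fit this, since XOR is not linearly separable.
        X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
        y = np.array([0.0, 1.0, 1.0, 0.0])

        rng = np.random.default_rng(0)
        w = rng.normal(size=2)   # the initial conditions being varied
        b = 0.0                  # fixed bias, as in the experiment
        eta = 0.1

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        for _ in range(10_000):
            pred = sigmoid(X @ w + b)
            # Gradient of the squared error w.r.t. w (constant factor
            # absorbed into eta); the bias is deliberately left untrained.
            grad_w = X.T @ ((pred - y) * pred * (1 - pred))
            w -= eta * grad_w

        # At least one of the four XOR cases always fails.
        print(np.round(sigmoid(X @ w + b), 2))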
    [D] Why are ensemble models like XGBoost rarely used as primary methods or baselines in applied papers?
    Hi, I'm curious about this: I've read papers in applied fields like biology, and they seldom use ensemble models as a baseline or primary model. Still, these models have proven to have excellent performance in competitions, so why? submitted by /u/PaintTop5512 [link] [comments]  ( 95 min )
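    For context on how little code such a baseline costs, a hedged sketch of a five-fold cross-validated XGBoost baseline on synthetic data (the parameter choices are illustrative, not recommendations):

        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score
        from xgboost import XGBClassifier

        # A gradient-boosted baseline takes only a few lines, which is part
        # of the argument for reporting one alongside any bespoke model.
        X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
        baseline = XGBClassifier(n_estimators=200, max_depth=4,
                                 eval_metric="logloss")
        print(cross_val_score(baseline, X, y, cv=5).mean())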
    [P] Simple Implementation of Pix2Seq in PyTorch - Object Detection w/ Transformers
    ​ My Implementation of Pix2Seq in PyTorch | Image By Me Object Detection with Transformers and Language Modeling! I've implemented Pix2Seq paper and written an in-depth tutorial on it. Link to my Towards AI blog Link to the project on my GitHub. Link to the Colab Notebook submitted by /u/moein-shariatnia [link] [comments]  ( 88 min )
    Understanding research paper on extraction of form like documents [D]
    Below is an excerpt from a paper titled "Representation Learning for Information Extraction from Form-like Documents": "As shown in Figure 2, we represent the position of a candidate and each of its neighbors using the 2-D Cartesian coordinates of the centroids of their respective bounding boxes. These coordinates are normalized by dividing by the corresponding page dimensions so that the features are independent of the pixel resolution of the input documents." Below is the image they refer to: https://preview.redd.it/vzkh11h25uk91.png?width=1182&format=png&auto=webp&s=de776e50c89c8699f115ed36c8c4931d4d27569f I am trying to wrap my head around how normalizing helps in making the feature independent of the pixel resolution of the input document. Can someone help explain this? #documentintelligence submitted by /u/fountainhop [link] [comments]  ( 89 min )
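    A small sketch of the normalization in question: dividing centroid coordinates by the page dimensions turns pixel positions into fractions of the page, so the same document scanned at two different resolutions yields identical features.

        def normalized_centroid(bbox, page_width, page_height):
            """Centroid of a bounding box, normalized to [0, 1] by page size.

            bbox = (x_min, y_min, x_max, y_max) in pixels. Dividing by the
            page dimensions makes the feature identical for the same document
            scanned at 150 dpi or 300 dpi: positions become page fractions.
            """
            x_min, y_min, x_max, y_max = bbox
            cx = (x_min + x_max) / 2.0
            cy = (y_min + y_max) / 2.0
            return cx / page_width, cy / page_height

        # The same document scanned at two resolutions: the feature agrees.
        print(normalized_centroid((100, 200, 300, 240), 1000, 800))   # (0.2, 0.275)
        print(normalized_centroid((200, 400, 600, 480), 2000, 1600))  # (0.2, 0.275)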
    [D] Fusion of embeddings of different dimensions
    I'm trying to fuse two embeddings (from pretrained models) of different dimensions, 1024 and 2048. What's the right way of going about this? Would it be okay for instance, to pass the 2048-dimensional embedding through a Linear layer with output size 1024, and then proceed with standard fusion strategies? Or is there a better way? submitted by /u/fullgoopy_alchemist [link] [comments]  ( 89 min )
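    Yes, projecting the larger embedding down with a Linear layer and then fusing is a common pattern. A minimal PyTorch sketch of that approach (concatenation as the fusion strategy; addition or gating are alternatives):

        import torch
        import torch.nn as nn

        class Fusion(nn.Module):
            """Project the larger embedding down, then fuse by concatenation."""
            def __init__(self, dim_a=1024, dim_b=2048, proj_dim=1024):
                super().__init__()
                self.proj = nn.Linear(dim_b, proj_dim)

            def forward(self, emb_a, emb_b):
                emb_b = self.proj(emb_b)                   # 2048 -> 1024
                return torch.cat([emb_a, emb_b], dim=-1)   # or emb_a + emb_b

        fusion = Fusion()
        a = torch.randn(4, 1024)   # embedding from the first pretrained model
        b = torch.randn(4, 2048)   # embedding from the second pretrained model
        print(fusion(a, b).shape)  # torch.Size([4, 2048])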
    Source/tool for "clean" bibtex entries? [R]
    I'm finding myself spending what feels like a stupid amount of time cleaning up the Bibtex entries I get from arxiv/dblp/whatever so that the citations actually look how I want when references get generated. This includes removing extraneous fields, cleaning up conference names, force-capitalizing words like Bayesian, etc. But the thing that pains me more is I know there are thousands of other people out there wasting thousands of hours doing exactly the same thing, to get the citations to look essentially the same. Is there a tool for doing this automatically? I'm thinking something like MathpixSnip but for bibtex entries specifically, but something hard-coded could maybe work too? Or some other hacky way of going about this? Any input would be much appreciated! :-) submitted by /u/_-__-____ [link] [comments]  ( 89 min )
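    One semi-automatic route is to script the cleanup with the bibtexparser package: load the .bib file, whitelist the fields you want, and re-serialize. A rough sketch (the field whitelist and capitalization rule are examples, not a standard):

        import bibtexparser

        # Fields to keep; everything else (doi, url, month, ...) is dropped.
        KEEP = {"ENTRYTYPE", "ID", "title", "author", "booktitle",
                "journal", "year", "pages", "volume"}

        with open("refs.bib") as f:
            db = bibtexparser.loads(f.read())

        for entry in db.entries:
            # Remove extraneous fields in place.
            for field in list(entry):
                if field not in KEEP:
                    del entry[field]
            # Force-capitalize selected words so BibTeX preserves them.
            if "title" in entry:
                entry["title"] = entry["title"].replace("bayesian", "{Bayesian}")

        print(bibtexparser.dumps(db))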
    [D] How do you think Tesla's neural rendering for simulation works?
    I was just amazed at how realistic Tesla's simulation was during its "Tesla AI Day". Here's the clip I'm talking about: https://www.youtube.com/watch?v=-gBha9l-ZiA Anyone know what kind of neural rendering techniques could have been used? I feel like it can't be style transfer since there'll be some weird temporal artifacts as shown in this Two Minute Papers 3D style transfer video: https://www.youtube.com/watch?v=fcnjHmBcLNQ Also specifically how do you think they made the color distribution of the simulation match exactly the real images? submitted by /u/Snoo_1043 [link] [comments]  ( 91 min )
    [D] When dealing with text sequences containing more than one language, is using a multilingual model the only way to go?
    Hi. I'm currently trying to deal with text data that contains more than one language. I've resorted to using a multilingual language model in order to properly encode the sequence, but I'm wondering if this is the only way to go about it. I've thought of using multiple models to encode the sequence in each language (assuming they're using the same tokenization algorithm) and combining their representations to minimize the proportion of unknown tokens, but am not exactly sure if this is a viable or efficient approach. Are there any other ways that I haven't thought of? Any tips or pointers are appreciated, thanks! submitted by /u/Seankala [link] [comments]  ( 90 min )
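    For reference, the single-multilingual-model route is straightforward with a shared-vocabulary encoder such as XLM-R, which tokenizes mixed-language text without any cross-model alignment. A minimal sketch using the Hugging Face transformers API:

        import torch
        from transformers import AutoModel, AutoTokenizer

        # One multilingual encoder handles mixed-language text with a single
        # shared vocabulary, avoiding the cross-model alignment problem.
        name = "xlm-roberta-base"
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModel.from_pretrained(name)

        text = "The meeting is at 3pm. La reunion est a 15h."
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
        print(hidden.shape)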
  • Open

    DSC Weekly 30 August 2022 – Metaverse Misfires
    One of the central concepts of the metaverse is the notion that every person has a single sign-on controlled by the vendors, which can, in turn, be used to track users as they move from one virtual world to another. The post DSC Weekly 30 August 2022 – Metaverse Misfires appeared first on Data Science Central.  ( 22 min )
    Semantic Graph as the Next Step for Web Data Architecture
    Webby data architects and modelers–the spider-like ones who use intelligent graph design and a bit of glue or another sticky substance to achieve their objectives–are focused on making joinery much more efficient and scaling a lot more useful with the help of more contextualized data. The post Semantic Graph as the Next Step for Web Data Architecture appeared first on Data Science Central.  ( 20 min )
    How to Make Black-box Systems more Transparent
    This article is intended for users relying on machine learning solutions offered by third-party vendors. It applies to platforms, dashboards, traditional software, or even external pieces of code that are too time-consuming to modify. One of the goals is to turn such systems into explainable AI. The post How to Make Black-box Systems more Transparent appeared first on Data Science Central.  ( 23 min )
    A Model For Technical Debt In Machine Learning Systems
    Technical Debt describes what results when development teams take conscious actions to expedite the delivery of a piece of functionality or a project which later needs to be remediated via refactoring. The post A Model For Technical Debt In Machine Learning Systems appeared first on Data Science Central.  ( 21 min )
    How to Overcome Challenges in Digital Transformation
    Digital transformation isn’t just about technology.  It’s the holistic approach to developing better processes, improving employee engagement, and optimizing your business for long-term growth by adopting the proper technology.  As a complex endeavor, you should expect some resistance along the way and be prepared to overcome challenges.  What is digital transformation?  Let’s start with the basics.  A… Read More »How to Overcome Challenges in Digital Transformation The post How to Overcome Challenges in Digital Transformation appeared first on Data Science Central.  ( 23 min )
    Blockchain Technology: The Future of the Manufacturing Industry
    The potential of blockchain in the manufacturing industry to deliver business value is high. The use of blockchain technology increases visibility and security in every segment of the industry, including suppliers, sourcing, procurement, quality check, floor operations, and monitoring. In brief, it delivers a new business model in the industry. The blockchain industry is booming… Read More »Blockchain Technology: The Future of the Manufacturing Industry The post Blockchain Technology: The Future of the Manufacturing Industry appeared first on Data Science Central.  ( 20 min )
    We Live in a Bayesian World – Part 2
    Those darn humans have a few learning impediments that impact our ability to effectively learn from integrating a historical perspective with new evidence or data. Let’s explore some impediments that hinder our ability to learn more effectively. The post We Live in a Bayesian World – Part 2 appeared first on Data Science Central.  ( 21 min )
  • Open

    I'm building an AI app that generates workout plans and diets based on your data (work in progress)
    submitted by /u/SickBruhh [link] [comments]  ( 87 min )
    Weekly China AI News: China Vows to Support 10 AI Implementations; Baidu Unveils Its First Quantum Computer; Chinese AI Champion Loses Momentum with 1st Revenue Decline in 5 Years
    submitted by /u/trcytony [link] [comments]  ( 87 min )
    Are there any AIs for this yet?
    There are a lot of text-to-image AIs now, but I am wondering if there are any AIs that take an image and text as input. E.g., let's say you take a picture of yourself. You input that image, and then you also write "Holding a banana", and it would essentially photoshop a banana into your hand. Is this a thing yet? submitted by /u/ResearchFew9041 [link] [comments]  ( 87 min )
    Hi.
    Hi. I have a strange question, a request. Is there an artificial intelligence site where I can generate daily events that can happen to a person throughout his life? It doesn't matter if it's paid or free. I don't know if there is an artificial intelligence that can do this, but I wanted to ask anyway. I'm talking about any event in daily life. For example, the character may get splashed with water while walking, his dog may die, his lover may cheat, or he may spill his food on himself. So there is no limit; I want any daily event. Maybe it would be even better if I could categorize it. For example, could I write what will happen to a cook? Waiting for your ideas. Sorry for my bad English. submitted by /u/yungfrxzn [link] [comments]  ( 88 min )
    LLMs have not learned our language — we’re trying to learn theirs
    submitted by /u/bendee983 [link] [comments]  ( 87 min )
    Results of implementing a Nvidia paper
    submitted by /u/Rindsroulade [link] [comments]  ( 87 min )
    Knowledge graphs
    Any knowledge graph resources or links!? Also curious to know their real-world applications. I'm aspiring to found a startup narrowing in on KGs. submitted by /u/rexstiener [link] [comments]  ( 91 min )
    Apocalyptic Dieselpunk Sedan
    submitted by /u/widgia [link] [comments]  ( 87 min )
    8 Excellent Python Courses on Udemy (2022)
    submitted by /u/Jan_Prince [link] [comments]  ( 99 min )
    The Best Machine Learning Courses on Udemy (2022)
    submitted by /u/Jan_Prince [link] [comments]  ( 103 min )
    Goofy AI text completion websites
    OK, this is a trivial request but I actually find the crazy responses you get from AI text production to be hilarious. As an example, this video from Kyle Hill had me laughing my ass off: https://www.youtube.com/watch?v=cKfnjxMS2RM   So can anyone recommend a free website that uses AI that can churn out nonsense like this? It's almost better if it isn't actually very good at sounding human. I just want something that can give me a laugh when I give it a little bit of text to work off.   Thanks in advance to anyone who can help me. submitted by /u/ClerkMark [link] [comments]  ( 87 min )
    AI for managing files?
    Are there any AIs that can help organize and manage files? I don't have an idea of how it would work, though, because file organization is subjective. But sometimes I think about all these cluttered files that I expect will be useful, yet never find except by accident later. Maybe at least a local search engine: the amount of data we have locally is now huge, running to many terabytes, yet the search system is very primitive. submitted by /u/Mister-Khalifa [link] [comments]  ( 87 min )
    A Complete Roadmap for learning Machine Learning with many valuable resources + staying up-to-date with the news. Intended for anyone having zero or a small background in programming, maths, and machine learning.
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 87 min )
    Stable Diffusion Is the Most Important AI Art Model Ever
    submitted by /u/girl_meets_tech [link] [comments]  ( 88 min )
    Marble statue with shadow veins
    submitted by /u/acropanthus [link] [comments]  ( 86 min )
    🧮 Does this exist?
    I am a classroom teacher. I would LOVE an app where my students could enter a math problem (or take a photo of it) and out comes a generated step-by-step explainer video. Think of Khan Academy style explainer videos for the exact problem that you’re trying to solve. Does anyone know if this exists? Thanks! submitted by /u/elliottzelinskas [link] [comments]  ( 87 min )
    PyTorch introduces ‘nvFuser’: a Deep Learning Compiler for NVIDIA GPUs that automatically just-in-time compiles fast and flexible kernels to reliably accelerate users’ networks
    The PyTorch team recently released a Deep Learning Compiler for NVIDIA GPUs called nvFuser. This compiler automatically creates quick, adaptable kernels, speeding up user networks. Creating quick bespoke “fusion” kernels at runtime also significantly accelerates deep learning networks running on Volta and later CUDA accelerators. The new and updated compiler, nvFuser, supports a variety of network architectures as well as applications with dynamic inputs of different shapes and strides and has been specially created to address the particular needs of the PyTorch community. In order to optimize and accelerate PyTorch operations, nvFuser uses graphical representations. Users’ PyTorch operations are not directly accessible as a complete program that a system like nvFuser can optimize because PyTorch uses an eager execution approach. As a result, there is a need for intermediary systems that can translate user programs into a format that nvFuser can optimize. These more advanced methods send the captured operations to nvFuser, which can subsequently tailor the user’s script execution for NVIDIA GPUs. Continue reading |Github link | Reference article submitted by /u/ai-lover [link] [comments]  ( 95 min )
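    A rough sketch of exercising nvFuser from TorchScript: script a chain of pointwise operations (the pattern fusion compilers combine into one kernel) and run a few warm-up iterations so the JIT can profile and fuse. The "fuser2" selector is, to my understanding, the name under which nvFuser is exposed in recent PyTorch releases, but the exact mechanism varies by version:

        import torch

        def pointwise_block(x):
            # A chain of elementwise ops: the kind of pattern a fusion
            # compiler can combine into a single GPU kernel.
            return torch.sigmoid(x) * torch.tanh(x) + x.relu()

        scripted = torch.jit.script(pointwise_block)

        # Requires a CUDA-capable GPU (nvFuser targets NVIDIA hardware).
        x = torch.randn(1024, 1024, device="cuda")
        with torch.jit.fuser("fuser2"):   # select nvFuser for TorchScript
            for _ in range(3):            # warm-up runs let the JIT fuse
                out = scripted(x)
        print(out.shape)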
    Leave the legal requirements aside. What's the impact of explainable AI on the internal data science team's ability to innovate?
    submitted by /u/Thuwarakesh [link] [comments]  ( 87 min )
  • Open

    NVIDIA and VMware CEOs Discuss New Era of Enterprise Computing
    Reinventing enterprise computing for the modern era, VMware CEO Raghu Raghuram Tuesday announced the availability of the VMware vSphere 8 enterprise workload platform running on NVIDIA DPUs, or data processing units, an initiative formerly known as Project Monterey. Placing the announcement in context, Raghuram and NVIDIA founder and CEO Jensen Huang discussed how running VMware Read article > The post NVIDIA and VMware CEOs Discuss New Era of Enterprise Computing appeared first on NVIDIA Blog.  ( 7 min )
  • Open

    Everything You Need To Know About Modelling Click Stream Data[Article]
    submitted by /u/JoshuaDaD [link] [comments]  ( 87 min )
    DARPA reported research that used brain implants to conduct transfer learning like artificial neural networks.
    SkewsMe.com/cyborgs is the latest draft of my paper which was credited in 2005 as the basis for the Wikipedia article about brain implants after being ordered to stand down about cyborgs in 1997. submitted by /u/TheSkewsMe [link] [comments]  ( 87 min )
  • Open

    The Best Machine Learning Courses on Udemy (2022)
    submitted by /u/Jan_Prince [link] [comments]  ( 113 min )
    Which papers are milestones in Multi Agent (Deep) Reinforcement Learning?
    I know of "Emergent Tool Use From Multi-Agent Autocurricula" from OpenAI; I am wondering about other candidates. submitted by /u/Lostefra [link] [comments]  ( 87 min )
    Neural language processing IRL with Zenodia Charpy & Eric Johnson
    submitted by /u/asc2450 [link] [comments]  ( 87 min )
    If you were interviewing a candidate who claimed to have used RL to a significant degree, what knowledge would you test that candidate on ?
    submitted by /u/rower22 [link] [comments]  ( 103 min )
    Deepmind's Player of Games
    Has anyone seen an implementation of this anywhere? I've been looking for a drill down beyond the paper on youtube/GitHub but can't seem to find anything. Has anyone else had any luck? submitted by /u/atomicburn125 [link] [comments]  ( 87 min )
    Very Basic question of Iterative Policy Evaluation of Sutton Barto
    I have a little knowledge of RL. Years ago I completed the University of Alberta course on Coursera. But time passes, I've forgotten almost everything, and I'm studying Sutton & Barto a second time: sorry if the question looks stupid. I'm refreshing the Dynamic Programming chapter. I have seen this pseudocode: https://preview.redd.it/dqnq46es2tk91.png?width=1018&format=png&auto=webp&s=ec8f8b27a8c3715661ac383928100c7fd254722c To me it looks wrong. There is maybe something very simple I'm missing, but I cannot understand how this can be correct. For me the correct version is as above but with 2 lines of difference: Δ <- +inf .... Δ <- min(Δ, |v-V(s)|) submitted by /u/Moist_Turnip [link] [comments]  ( 88 min )
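    For reference, a minimal sketch of iterative policy evaluation as given in the book (not the poster's proposed variant): Δ is reset to 0 at the start of each sweep and tracks the maximum change over states, so the loop stops only when every state's value has stabilized; initializing Δ to +inf and taking the min would track the smallest change and could halt while other states are still far from converged.

        import numpy as np

        def policy_evaluation(P, R, policy, gamma=0.9, theta=1e-8):
            """Iterative policy evaluation (Sutton & Barto, Ch. 4).

            P[s, a, s2]: transition probabilities; R[s, a]: expected reward;
            policy[s, a]: action probabilities of the policy to evaluate.
            """
            n_states = P.shape[0]
            V = np.zeros(n_states)
            while True:
                delta = 0.0                      # reset each sweep (not +inf)
                for s in range(n_states):
                    v = V[s]
                    V[s] = sum(
                        policy[s, a] * (R[s, a] + gamma * P[s, a] @ V)
                        for a in range(P.shape[1])
                    )
                    delta = max(delta, abs(v - V[s]))  # max, not min
                if delta < theta:      # stop once the largest change is tiny
                    return V

        # Tiny 2-state, 2-action MDP as a smoke test.
        P = np.array([[[0.9, 0.1], [0.1, 0.9]],
                      [[0.8, 0.2], [0.2, 0.8]]])
        R = np.array([[1.0, 0.0],
                      [0.0, 1.0]])
        policy = np.full((2, 2), 0.5)   # uniform random policy
        print(policy_evaluation(P, R, policy))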
  • Open

    Create a batch recommendation pipeline using Amazon Personalize with no code
    With personalized content more likely to drive customer engagement, businesses continuously seek to provide tailored content based on their customer’s profile and behavior. Recommendation systems in particular seek to predict the preference an end-user would give to an item. Some common use cases include product recommendations on online retail stores, personalizing newsletters, generating music playlist […]  ( 9 min )
  • Open

    AI that can learn the patterns of human language
    On its own, a new machine-learning model discovers linguistic rules that often match up with those created by human experts.  ( 9 min )
  • Open

    Modal logic posts
    Blog posts about modal logic: Typesetting modal logic; Modal and temporal logic for security; Word problems, modal logic, and regular expressions; Axioms for S1 through S5; Naming and numbering modal logic systems; Modal logic and science fiction; Dual axioms; Temporal and polymodal logics. I created the image above by asking DALL-E 2 for an image […] Modal logic posts first appeared on John D. Cook.  ( 4 min )
  • Open

    Robust Distributed Bayesian Learning with Stragglers via Consensus Monte Carlo. (arXiv:2112.09794v2 [cs.LG] UPDATED)
    This paper studies distributed Bayesian learning in a setting encompassing a central server and multiple workers by focusing on the problem of mitigating the impact of stragglers. The standard one-shot, or embarrassingly parallel, Bayesian learning protocol known as consensus Monte Carlo (CMC) is generalized by proposing two straggler-resilient solutions based on grouping and coding. Two main challenges in designing straggler-resilient algorithms for CMC are the need to estimate the statistics of the workers' outputs across multiple shots, and the joint non-linear post-processing of the outputs of the workers carried out at the server. This is in stark contrast to other distributed settings like gradient coding, which only require the per-shot sum of the workers' outputs. The proposed methods, referred to as Group-based CMC (G-CMC) and Coded CMC (C-CMC), leverage redundant computing at the workers in order to enable the estimation of global posterior samples at the server based on partial outputs from the workers. Simulation results show that C-CMC may outperform G-CMC for a small number of workers, while G-CMC is generally preferable for a larger number of workers.
    Lagrangian Method for Q-Function Learning (with Applications to Machine Translation). (arXiv:2207.11161v2 [cs.LG] UPDATED)
    This paper discusses a new approach to the fundamental problem of learning optimal Q-functions. In this approach, optimal Q-functions are formulated as saddle points of a nonlinear Lagrangian function derived from the classic Bellman optimality equation. The paper shows that the Lagrangian enjoys strong duality, in spite of its nonlinearity, which paves the way to a general Lagrangian method to Q-function learning. As a demonstration, the paper develops an imitation learning algorithm based on the duality theory, and applies the algorithm to a state-of-the-art machine translation benchmark. The paper then turns to demonstrate a symmetry breaking phenomenon regarding the optimality of the Lagrangian saddle points, which justifies a largely overlooked direction in developing the Lagrangian method.
    Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees. (arXiv:2206.02659v2 [cs.LG] UPDATED)
    We consider transfer learning approaches that fine-tune a pretrained deep neural network on a target task. We study generalization properties of fine-tuning to understand the problem of overfitting, which commonly occurs in practice. Previous works have shown that constraining the distance from the initialization of fine-tuning improves generalization. Using a PAC-Bayesian analysis, we observe that besides distance from initialization, Hessians affect generalization through the noise stability of deep neural networks against noise injections. Motivated by the observation, we develop Hessian distance-based generalization bounds for a wide range of fine-tuning methods. Additionally, we study the robustness of fine-tuning in the presence of noisy labels. Motivated by our theory, we design an algorithm that incorporates consistent losses and distance-based regularization for fine-tuning, along with a generalization error guarantee under class conditional independent noise in the training set labels. We perform a detailed empirical study of our algorithm on various noisy environments and architectures. On six image classification tasks whose training labels are generated with programmatic labeling, we find a 3.26% accuracy gain over prior fine-tuning methods. Meanwhile, the Hessian distance measure of the fine-tuned model decreases by six times more than existing approaches.
    Bayesian Graph Contrastive Learning. (arXiv:2112.07823v4 [cs.LG] UPDATED)
    Contrastive learning has become a key component of self-supervised learning approaches for graph-structured data. Despite their success, existing graph contrastive learning methods are incapable of uncertainty quantification for node representations or their downstream tasks, limiting their application in high-stakes domains. In this paper, we propose a novel Bayesian perspective of graph contrastive learning methods, showing that random augmentations lead to stochastic encoders. As a result, our proposed method represents each node by a distribution in the latent space, in contrast to existing techniques which embed each node to a deterministic vector. By learning distributional representations, we provide uncertainty estimates in downstream graph analytics tasks and increase the expressive power of the predictive model. In addition, we propose a Bayesian framework to infer the probability of perturbations in each view of the contrastive model, eliminating the need for a computationally expensive search for hyperparameter tuning. We empirically show a considerable improvement in performance compared to existing state-of-the-art methods on several benchmark datasets.
    PrivFairFL: Privacy-Preserving Group Fairness in Federated Learning. (arXiv:2205.11584v2 [cs.LG] UPDATED)
    Group fairness ensures that the outcome of machine learning (ML) based decision making systems are not biased towards a certain group of people defined by a sensitive attribute such as gender or ethnicity. Achieving group fairness in Federated Learning (FL) is challenging because mitigating bias inherently requires using the sensitive attribute values of all clients, while FL is aimed precisely at protecting privacy by not giving access to the clients' data. As we show in this paper, this conflict between fairness and privacy in FL can be resolved by combining FL with Secure Multiparty Computation (MPC) and Differential Privacy (DP). In doing so, we propose a method for training group-fair ML models in cross-device FL under complete and formal privacy guarantees, without requiring the clients to disclose their sensitive attribute values.
    Neural Network Verification with Proof Production. (arXiv:2206.00512v2 [cs.LO] UPDATED)
    Deep neural networks (DNNs) are increasingly being employed in safety-critical systems, and there is an urgent need to guarantee their correctness. Consequently, the verification community has devised multiple techniques and tools for verifying DNNs. When DNN verifiers discover an input that triggers an error, that is easy to confirm; but when they report that no error exists, there is no way to ensure that the verification tool itself is not flawed. As multiple errors have already been observed in DNN verification tools, this calls the applicability of DNN verification into question. In this work, we present a novel mechanism for enhancing Simplex-based DNN verifiers with proof production capabilities: the generation of an easy-to-check witness of unsatisfiability, which attests to the absence of errors. Our proof production is based on an efficient adaptation of the well-known Farkas' lemma, combined with mechanisms for handling piecewise-linear functions and numerical precision errors. As a proof of concept, we implemented our technique on top of the Marabou DNN verifier. Our evaluation on a safety-critical system for airborne collision avoidance shows that proof production succeeds in almost all cases and requires only minimal overhead.
    Optimal Estimation of Generic Dynamics by Path-Dependent Neural Jump ODEs. (arXiv:2206.14284v2 [stat.ML] UPDATED)
    This paper studies the problem of forecasting general stochastic processes using an extension of the Neural Jump ODE (NJ-ODE) framework. While NJ-ODE was the first framework to establish convergence guarantees for the prediction of irregularly observed time-series, these results were limited to data stemming from It\^o-diffusions with complete observations, in particular Markov processes where all coordinates are observed simultaneously. In this work, we generalise these results to generic, possibly non-Markovian or discontinuous, stochastic processes with incomplete observations, by utilising the reconstruction properties of the signature transform. These theoretical results are supported by empirical studies, where it is shown that the path-dependent NJ-ODE outperforms the original NJ-ODE framework in the case of non-Markovian data. Moreover, we show that PD-NJ-ODE can be applied successfully to limit order book (LOB) data.
    PL-Net: Progressive Learning Network for Medical Image Segmentation. (arXiv:2110.14484v2 [eess.IV] UPDATED)
    In recent years, segmentation methods based on deep convolutional neural networks (CNNs) have made state-of-the-art achievements for many medical analysis tasks. However, most of these approaches improve performance by optimizing the structure of, or adding new functional modules to, the U-Net, ignoring the complementarity and fusion of coarse-grained and fine-grained semantic information. To solve the above problems, we propose a medical image segmentation framework called progressive learning network (PL-Net), which includes internal progressive learning (IPL) and external progressive learning (EPL). PL-Net has the following advantages: (1) IPL divides feature extraction into two "steps", which can mix different-size receptive fields and capture semantic information from coarse to fine granularity without introducing additional parameters; (2) EPL divides the training process into two "stages" to optimize parameters, and realizes the fusion of coarse-grained information from the previous stage and fine-grained information in the latter stage. We evaluate our method on different medical image analysis tasks, and the results show that the segmentation performance of PL-Net is better than that of state-of-the-art methods such as U-Net and its variants.
    Compound virtual screening by learning-to-rank with gradient boosting decision tree and enrichment-based cumulative gain. (arXiv:2205.02169v2 [q-bio.BM] UPDATED)
    Learning-to-rank, a machine learning technique widely used in information retrieval, has recently been applied to the problem of ligand-based virtual screening, to accelerate the early stages of new drug development. Ranking prediction models learn based on ordinal relationships, making them suitable for integrating assay data from various environments. Existing studies of rank prediction in compound screening have generally used a learning-to-rank method called RankSVM. However, they have not been compared with or validated against the gradient boosting decision tree (GBDT)-based learning-to-rank methods that have gained popularity recently. Furthermore, although the ranking metric called Normalized Discounted Cumulative Gain (NDCG) is widely used in information retrieval, it only determines whether the predictions are better than those of other models. In other words, NDCG is incapable of recognizing when a prediction model produces worse than random results. Nevertheless, NDCG is still used in the performance evaluation of compound screening using learning-to-rank. This study used the GBDT model with ranking loss functions, called lambdarank and lambdaloss, for ligand-based virtual screening; results were compared with existing RankSVM methods and GBDT models using regression. We also proposed a new ranking metric, Normalized Enrichment Discounted Cumulative Gain (NEDCG), which aims to properly evaluate the goodness of ranking predictions. Results showed that the GBDT model with learning-to-rank outperformed existing regression methods using GBDT and RankSVM on diverse datasets. Moreover, NEDCG showed that predictions by regression were comparable to random predictions in multi-assay, multi-family datasets, demonstrating its usefulness for a more direct assessment of compound screening performance.
    Scenario generation for market risk models using generative neural networks. (arXiv:2109.10072v4 [cs.LG] UPDATED)
    In this research, we show how to expand existing approaches of using generative adversarial networks (GANs) as economic scenario generators (ESG) to a whole internal market risk model - with enough risk factors to model the full bandwidth of investments for an insurance company and for a one-year time horizon as required in Solvency 2. We demonstrate that the results of a GAN-based internal model are similar to regulatory approved internal models in Europe. Therefore, GAN-based models can be seen as a data-driven alternative way of market risk modeling.
    On stabilizing reinforcement learning without Lyapunov functions. (arXiv:2207.08730v6 [eess.SY] UPDATED)
    Reinforcement learning remains one of the major directions of the contemporary development of control engineering and machine learning. Nice intuition, flexible settings, and ease of application are among the many perks of this methodology. From the standpoint of machine learning, the main strength of a reinforcement learning agent is its ability to "capture" (learn) the optimal behavior in the given environment. Typically, the agent is built on neural networks and it is their approximation abilities that give rise to the above belief. From the standpoint of control engineering, however, reinforcement learning has serious deficiencies. The most significant one is the lack of stability guarantee of the agent-environment closed loop. A great deal of research has been and is being done on stabilizing reinforcement learning. Speaking of stability, the celebrated Lyapunov theory is the de facto tool. It is thus no wonder that so many techniques of stabilizing reinforcement learning rely on the Lyapunov theory in one way or another. In control theory, there is an intricate connection between a stabilizing controller and a Lyapunov function. Employing such a pair seems thus quite attractive for designing stabilizing reinforcement learning. However, computation of a Lyapunov function is generally a cumbersome process. In this note, we show how to construct a stabilizing reinforcement learning agent that does not employ such a function at all. We only assume that a Lyapunov function exists, which is a natural thing to do if the given system (read: environment) is stabilizable, but we do not need to compute one.
    Learning to Get Up. (arXiv:2205.00307v2 [cs.GR] UPDATED)
    Getting up from an arbitrary fallen state is a basic human skill. Existing methods for learning this skill often generate highly dynamic and erratic get-up motions, which do not resemble human get-up strategies, or are based on tracking recorded human get-up motions. In this paper, we present a staged approach using reinforcement learning, without recourse to motion capture data. The method first takes advantage of a strong character model, which facilitates the discovery of solution modes. A second stage then learns to adapt the control policy to work with progressively weaker versions of the character. Finally, a third stage learns control policies that can reproduce the weaker get-up motions at much slower speeds. We show that across multiple runs, the method can discover a diverse variety of get-up strategies, and execute them at a variety of speeds. The results usually produce policies that use a final stand-up strategy that is common to the recovery motions seen from all initial states. However, we also find policies for which different strategies are seen for prone and supine initial fallen states. The learned get-up control strategies often have significant static stability, i.e., they can be paused at a variety of points during the get-up motion. We further test our method on novel constrained scenarios, such as having a leg and an arm in a cast.
    Developing hierarchical anticipations via neural network-based event segmentation. (arXiv:2206.02042v2 [cs.LG] UPDATED)
    Humans can make predictions on various time scales and hierarchical levels. Thereby, the learning of event encodings seems to play a crucial role. In this work we model the development of hierarchical predictions via autonomously learned latent event codes. We present a hierarchical recurrent neural network architecture, whose inductive learning biases foster the development of sparsely changing latent states that compress sensorimotor sequences. A higher-level network learns to predict the situations in which the latent states tend to change. Using a simulated robotic manipulator, we demonstrate that the system (i) learns latent states that accurately reflect the event structure of the data, (ii) develops meaningful temporal abstract predictions on the higher level, and (iii) generates goal-anticipatory behavior similar to gaze behavior found in eye-tracking studies with infants. The architecture offers a step towards the autonomous learning of compressed hierarchical encodings of gathered experiences and the exploitation of these encodings to generate adaptive behavior.
    Improved and Interpretable Defense to Transferred Adversarial Examples by Jacobian Norm with Selective Input Gradient Regularization. (arXiv:2207.13036v3 [cs.LG] UPDATED)
    Deep neural networks (DNNs) are known to be vulnerable to adversarial examples that are crafted with imperceptible perturbations, i.e., a small change in an input image can induce a mis-classification, and thus threatens the reliability of deep learning based deployment systems. Adversarial training (AT) is often adopted to improve robustness through training a mixture of corrupted and clean data. However, most of AT based methods are ineffective in dealing with transferred adversarial examples which are generated to fool a wide spectrum of defense models, and thus cannot satisfy the generalization requirement raised in real-world scenarios. Moreover, adversarially training a defense model in general cannot produce interpretable predictions towards the inputs with perturbations, whilst a highly interpretable robust model is required by different domain experts to understand the behaviour of a DNN. In this work, we propose a novel approach based on Jacobian norm and Selective Input Gradient Regularization (J-SIGR), which suggests the linearized robustness through Jacobian normalization and also regularizes the perturbation-based saliency maps to imitate the model's interpretable predictions. As such, we achieve both the improved defense and high interpretability of DNNs. Finally, we evaluate our method across different architectures against powerful adversarial attacks. Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and we also show that the predictions from the neural network are easy to interpret.
    Node-Level Differentially Private Graph Neural Networks. (arXiv:2111.15521v3 [cs.LG] UPDATED)
    Graph Neural Networks (GNNs) are a popular technique for modelling graph-structured data and computing node-level representations via aggregation of information from the neighborhood of each node. However, this aggregation implies an increased risk of revealing sensitive information, as a node can participate in the inference for multiple nodes. This implies that standard privacy-preserving machine learning techniques, such as differentially private stochastic gradient descent (DP-SGD) - which are designed for situations where each data point participates in the inference for one point only - either do not apply, or lead to inaccurate models. In this work, we formally define the problem of learning GNN parameters with node-level privacy, and provide an algorithmic solution with a strong differential privacy guarantee. We employ a careful sensitivity analysis and provide a non-trivial extension of the privacy-by-amplification technique to the GNN setting. An empirical evaluation on standard benchmark datasets demonstrates that our method is indeed able to learn accurate privacy-preserving GNNs which outperform both private and non-private methods that completely ignore graph information.
    Low-degree learning and the metric entropy of polynomials. (arXiv:2203.09659v2 [cs.LG] UPDATED)
    Let $\mathscr{F}_{n,d}$ be the class of all functions $f:\{-1,1\}^n\to[-1,1]$ on the $n$-dimensional discrete hypercube of degree at most $d$. In the first part of this paper, we prove that any (deterministic or randomized) algorithm which learns $\mathscr{F}_{n,d}$ with $L_2$-accuracy $\varepsilon$ requires at least $\Omega((1-\sqrt{\varepsilon})2^d\log n)$ queries for large enough $n$, thus establishing the sharpness as $n\to\infty$ of a recent upper bound of Eskenazis and Ivanisvili (2021). To do this, we show that the $L_2$-packing numbers $\mathsf{M}(\mathscr{F}_{n,d},\|\cdot\|_{L_2},\varepsilon)$ of the concept class $\mathscr{F}_{n,d}$ satisfy the two-sided estimate $$c(1-\varepsilon)2^d\log n \leq \log \mathsf{M}(\mathscr{F}_{n,d},\|\cdot\|_{L_2},\varepsilon) \leq \frac{2^{Cd}\log n}{\varepsilon^4}$$ for large enough $n$, where $c, C>0$ are universal constants. In the second part of the paper, we present a logarithmic upper bound for the randomized query complexity of classes of bounded approximate polynomials whose Fourier spectra are concentrated on few subsets. As an application, we prove new estimates for the number of random queries required to learn approximate juntas of a given degree, functions with rapidly decaying Fourier tails and constant depth circuits of given size. Finally, we obtain bounds for the number of queries required to learn the polynomial class $\mathscr{F}_{n,d}$ without error in the query and random example models.
    Towards Interpretable Deep Reinforcement Learning Models via Inverse Reinforcement Learning. (arXiv:2203.16464v2 [cs.LG] UPDATED)
    Artificial intelligence, particularly through recent advancements in deep learning, has achieved exceptional performance on many tasks in fields such as natural language processing and computer vision. In addition to desirable evaluation metrics, a high level of interpretability is often required for these models to be reliably utilized. Therefore, explanations that offer insight into the process by which a model maps its inputs onto its outputs are much sought-after. Unfortunately, the black-box nature of current machine learning models remains an unresolved issue, and this very nature prevents researchers from learning about and providing explanatory descriptions of a model's behavior and final predictions. In this work, we propose a novel framework utilizing Adversarial Inverse Reinforcement Learning that can provide global explanations for decisions made by a Reinforcement Learning model, and capture intuitive tendencies that the model follows by summarizing its decision-making process.
    Equivariant Mesh Attention Networks. (arXiv:2205.10662v2 [cs.LG] UPDATED)
    Equivariance to symmetries has proven to be a powerful inductive bias in deep learning research. Recent works on mesh processing have concentrated on various kinds of natural symmetries, including translations, rotations, scaling, node permutations, and gauge transformations. To date, no existing architecture is equivariant to all of these transformations. In this paper, we present an attention-based architecture for mesh data that is provably equivariant to all transformations mentioned above. Our pipeline relies on the use of relative tangential features: a simple, effective, equivariance-friendly alternative to raw node positions as inputs. Experiments on the FAUST and TOSCA datasets confirm that our proposed architecture achieves improved performance on these benchmarks and is indeed equivariant, and therefore robust, to a wide variety of local/global transformations.
    Learning High-Dimensional McKean-Vlasov Forward-Backward Stochastic Differential Equations with General Distribution Dependence. (arXiv:2204.11924v2 [math.OC] UPDATED)
    One of the core problems in mean-field control and mean-field games is to solve the corresponding McKean-Vlasov forward-backward stochastic differential equations (MV-FBSDEs). Most existing methods are tailored to special cases in which the mean-field interaction only depends on expectation or other moments, and are thus inadequate for problems where the mean-field interaction has full distribution dependence. In this paper, we propose a novel deep learning method for computing MV-FBSDEs with a general form of mean-field interactions. Specifically, built on fictitious play, we recast the problem into repeatedly solving standard FBSDEs with explicit coefficient functions. These coefficient functions are used to approximate the MV-FBSDEs' model coefficients with full distribution dependence, and are updated by solving another supervised learning problem using training data simulated from the previous iteration's FBSDE solutions. We use deep neural networks to solve standard BSDEs and approximate coefficient functions in order to solve high-dimensional MV-FBSDEs. Under proper assumptions on the learned functions, we prove that the convergence of the proposed method is free of the curse of dimensionality (CoD) by using a class of integral probability metrics previously developed in [Han, Hu and Long, arXiv:2104.12036], demonstrating the advantage of the method in high dimensions. We present the numerical performance on high-dimensional MV-FBSDE problems, including a mean-field game example of the well-known Cucker-Smale model, whose cost depends on the full distribution of the forward process.
    Personalized PageRank Graph Attention Networks. (arXiv:2205.14259v2 [cs.LG] UPDATED)
    There has been rising interest in graph neural networks (GNNs) for representation learning over the past few years. GNNs provide a general and efficient framework to learn from graph-structured data. However, GNNs typically only use the information of a very limited neighborhood for each node to avoid over-smoothing, whereas a larger neighborhood would provide the model with more information. In this work, we incorporate the limit distribution of Personalized PageRank (PPR) into graph attention networks (GATs) to capture information from a larger neighborhood without introducing over-smoothing. Intuitively, message aggregation based on Personalized PageRank corresponds to infinitely many neighborhood aggregation layers. We show that our models outperform a variety of baseline models on four widely used benchmark datasets. Our implementation is publicly available online.
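    The PPR limit distribution has a well-known closed form (used, e.g., in PPNP-style models). A small NumPy sketch, assuming a dense adjacency matrix for illustration:

```python
import numpy as np

def ppr_limit(A, alpha=0.15):
    """Hedged sketch: closed-form limit distribution of Personalized PageRank,
    Pi = alpha * (I - (1 - alpha) * A_hat)^{-1}, where A_hat is the
    row-normalized adjacency with self-loops. Row i of Pi is the PPR vector
    of node i; attention can then aggregate over these dense weights."""
    n = A.shape[0]
    A_hat = A + np.eye(n)                       # add self-loops
    A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)
    return alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * A_hat)
```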
    Sum-of-Squares Relaxations for Information Theory and Variational Inference. (arXiv:2206.13285v2 [cs.IT] UPDATED)
    We consider extensions of the Shannon relative entropy, referred to as f-divergences. Three classical related computational problems are typically associated with these divergences: (a) estimation from moments, (b) computing normalizing integrals, and (c) variational inference in probabilistic models. These problems are related to one another through convex duality, and for all of them there are many applications throughout data science; we aim for computationally tractable approximation algorithms that preserve properties of the original problem such as potential convexity or monotonicity. In order to achieve this, we derive a sequence of convex relaxations for computing these divergences from non-centered covariance matrices associated with a given feature vector: starting from the typically non-tractable optimal lower-bound, we consider an additional relaxation based on "sums-of-squares", which is now computable in polynomial time as a semidefinite program, as well as further computationally more efficient relaxations based on spectral information divergences from quantum information theory. For all of the tasks above, beyond proposing new relaxations, we derive tractable algorithms based on augmented Lagrangians and first-order methods, and we present illustrations on multivariate trigonometric polynomials and functions on the Boolean hypercube.
    Approximation of quantum control correction scheme using deep neural networks. (arXiv:1803.05193v2 [quant-ph] CROSS LISTED)
    We study the functional relationship between quantum control pulses in the idealized case and the pulses in the presence of an unwanted drift. We show that a class of artificial neural networks, long short-term memory (LSTM) networks, is able to model this functional relationship with high efficiency, and hence the correction scheme required to counterbalance the effect of the drift. Our solution allows studying the mapping from quantum control pulses to system dynamics, and then analysing the robustness of the latter against local variations in the control profile.
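    A minimal PyTorch sketch of such a sequence-to-sequence regressor, with illustrative sizes rather than the paper's configuration:

```python
import torch
import torch.nn as nn

class PulseCorrector(nn.Module):
    """Hedged sketch: an LSTM mapping an ideal control-pulse sequence to a
    drift-corrected sequence. Input/hidden sizes are illustrative."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, pulses):          # pulses: (batch, time, 1)
        h, _ = self.lstm(pulses)
        return self.head(h)             # corrected pulse at every time step

# usage: train with MSE between predicted and drift-compensating pulses
model = PulseCorrector()
corrected = model(torch.randn(8, 100, 1))
```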
    Unsupervised Scale-Invariant Multispectral Shape Matching. (arXiv:2012.10685v2 [cs.CV] UPDATED)
    Alignment between non-rigid stretchable structures is one of the most challenging tasks in computer vision, as the invariant properties are hard to define and there is no labeled data for real datasets. We present an unsupervised neural network architecture based upon the spectral domain of scale-invariant geometry. We build on top of the functional maps architecture, but show that learning local features, as done until now, is not enough once the isometry assumption breaks. We demonstrate the use of multiple scale-invariant geometries for solving this problem. Our method is agnostic to local-scale deformations and shows superior performance for matching shapes from different domains when compared to existing spectral state-of-the-art solutions.
    Variance estimation in graphs with the fused lasso. (arXiv:2207.12638v2 [math.ST] UPDATED)
    We study the problem of variance estimation in general graph-structured problems. First, we develop a linear-time estimator for the homoscedastic case that can consistently estimate the variance in general graphs. We show that our estimator attains minimax rates for the chain and 2D grid graphs when the mean signal has total variation with canonical scaling. Furthermore, we provide general upper bounds on the mean squared error performance of the fused lasso estimator in general graphs under a moment condition and a bound on the tail behavior of the errors. These upper bounds allow us to generalize, to broader classes of distributions such as sub-exponential, many existing results on the fused lasso that were previously only known to hold under the assumption that the errors are sub-Gaussian random variables. Exploiting our upper bounds, we then study a simple total variation regularization estimator for estimating the signal of variances in the heteroscedastic case. Our results show that the variance estimator attains minimax rates for estimating signals of bounded variation in grid graphs and, under very mild assumptions, in $K$-nearest-neighbor graphs, and that it is consistent for estimating the variances in any connected graph. In addition, extensive numerical results show that our proposed estimators perform reasonably well in a variety of graph-structured models.
    Quantum Differential Privacy: An Information Theory Perspective. (arXiv:2202.10717v2 [quant-ph] UPDATED)
    Differential privacy has been an exceptionally successful concept when it comes to providing provable security guarantees for classical computations. More recently, the concept was generalized to quantum computations. While classical computations are essentially noiseless and differential privacy is often achieved by artificially adding noise, near-term quantum computers are inherently noisy, and it was observed that this leads to natural differential privacy as a feature. In this work we discuss quantum differential privacy in an information-theoretic framework by casting it as a quantum divergence. A main advantage of this approach is that differential privacy becomes a property based solely on the output states of the computation, without the need to check it for every measurement. This leads to simpler proofs and generalized statements of its properties, as well as several new bounds for both general and specific noise models, including common representations of quantum circuits and quantum machine learning concepts. Here, we focus on the difference between the amount of noise required to achieve certain levels of differential privacy and the amount that would make any computation useless. Finally, we also generalize the classical concepts of local differential privacy, R\'enyi differential privacy and the hypothesis testing interpretation to the quantum setting, providing several new properties and insights.
    A Dynamic Mode Decomposition Approach for Decentralized Spectral Clustering of Graphs. (arXiv:2203.00004v2 [cs.LG] UPDATED)
    We propose a novel robust decentralized graph clustering algorithm that is provably equivalent to the popular spectral clustering approach. Our proposed method uses the existing wave equation clustering algorithm that is based on propagating waves through the graph. However, instead of using a fast Fourier transform (FFT) computation at every node, our proposed approach exploits the Koopman operator framework. Specifically, we show that propagating waves in the graph followed by a local dynamic mode decomposition (DMD) computation at every node is capable of retrieving the eigenvalues and the local eigenvector components of the graph Laplacian, thereby providing local cluster assignments for all nodes. We demonstrate that the DMD computation is more robust than the existing FFT based approach and requires 20 times fewer steps of the wave equation to accurately recover the clustering information and reduces the relative error by orders of magnitude. We demonstrate the decentralized approach on a range of graph clustering problems.
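    A minimal sketch of the per-node computation, assuming a scalar wave signal recorded at one node: delay-embed the series, fit a one-step linear map (Hankel-DMD), and read modal frequencies off its eigenvalues. Rank truncation and the mapping from frequencies to cluster assignments follow the paper, not this toy.

```python
import numpy as np

def local_dmd_frequencies(x, m=20):
    """Hedged sketch: Hankel-DMD on one node's scalar time series x.
    The delay dimension m is an illustrative choice."""
    H = np.stack([x[i:i + m] for i in range(len(x) - m)])   # delay embedding
    X, Y = H[:-1].T, H[1:].T
    A = Y @ np.linalg.pinv(X)            # one-step linear (Koopman) operator
    eig = np.linalg.eigvals(A)
    return np.angle(eig)                 # per-mode angular frequencies
```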
    Comparing two samples through stochastic dominance: a graphical approach. (arXiv:2203.07889v3 [stat.ML] UPDATED)
    Non-deterministic measurements are common in real-world scenarios: the performance of a stochastic optimization algorithm or the total reward of a reinforcement learning agent in a chaotic environment are just two examples in which unpredictable outcomes are common. These measures can be modeled as random variables and compared among each other via their expected values or more sophisticated tools such as null hypothesis statistical tests. In this paper, we propose an alternative framework to visually compare two samples according to their estimated cumulative distribution functions. First, we introduce a dominance measure for two random variables that quantifies the proportion in which the cumulative distribution function of one of the random variables stochastically dominates the other one. Then, we present a graphical method that decomposes into quantiles i) the proposed dominance measure and ii) the probability that one of the random variables takes lower values than the other. For illustrative purposes, we re-evaluate the experiments of an already published work with the proposed methodology, and we show that additional conclusions (missed by the other methods) can be inferred. Additionally, the software package RVCompare was created as a convenient way of applying and experimenting with the proposed framework.
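    A minimal NumPy sketch of the dominance idea, assuming we simply compare empirical CDFs on a shared grid (the paper's estimator and its quantile decomposition are more refined):

```python
import numpy as np

def dominance_measure(a, b, grid_size=512):
    """Hedged sketch: the fraction of a shared grid on which the empirical
    CDF of sample `a` exceeds that of `b`, i.e., where `b` pointwise
    dominates `a`. grid_size is an illustrative choice."""
    grid = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), grid_size)
    ecdf = lambda s: np.searchsorted(np.sort(s), grid, side="right") / len(s)
    Fa, Fb = ecdf(a), ecdf(b)
    return np.mean(Fa > Fb)
```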
    Social Processes: Self-Supervised Meta-Learning over Conversational Groups for Forecasting Nonverbal Social Cues. (arXiv:2107.13576v3 [cs.LG] UPDATED)
    Free-standing social conversations constitute a yet underexplored setting for human behavior forecasting. While the task of predicting pedestrian trajectories has received much recent attention, an intrinsic difference between these settings is how groups form and disband. Evidence from social psychology suggests that group members in a conversation explicitly self-organize to sustain the interaction by adapting to one another's behaviors. Crucially, the same individual is unlikely to adapt similarly across different groups; contextual factors such as perceived relationships, attraction, rapport, etc., influence the entire spectrum of participants' behaviors. A question arises: how can we jointly forecast the mutually dependent futures of conversation partners by modeling the dynamics unique to every group? In this paper, we propose the Social Process (SP) models, taking a novel meta-learning and stochastic perspective of group dynamics. Training group-specific forecasting models hinders generalization to unseen groups and is challenging given limited conversation data. In contrast, our SP models treat interaction sequences from a single group as a meta-dataset: we condition forecasts for a sequence from a given group on other observed-future sequence pairs from the same group. In this way, an SP model learns to adapt its forecasts to the unique dynamics of the interacting partners, generalizing to unseen groups in a data-efficient manner. Additionally, we rethink the task formulation itself, motivating task requirements from the social science literature that prior formulations have overlooked. For our formulation of Social Cue Forecasting, we evaluate the empirical performance of our SP models against both non-meta-learning and meta-learning approaches with similar assumptions. The SP models yield improved performance on synthetic and real-world behavior datasets.
    Frame Averaging for Equivariant Shape Space Learning. (arXiv:2112.01741v2 [cs.CV] UPDATED)
    The task of shape space learning involves mapping a train set of shapes to and from a latent representation space with good generalization properties. Often, real-world collections of shapes have symmetries, which can be defined as transformations that do not change the essence of the shape. A natural way to incorporate symmetries in shape space learning is to ask that the mapping to the shape space (encoder) and the mapping from the shape space (decoder) are equivariant to the relevant symmetries. In this paper, we present a framework for incorporating equivariance in encoders and decoders by introducing two contributions: (i) adapting the recent Frame Averaging (FA) framework for building generic, efficient, and maximally expressive equivariant autoencoders; and (ii) constructing autoencoders equivariant to piecewise Euclidean motions applied to different parts of the shape. To the best of our knowledge, this is the first fully piecewise Euclidean equivariant autoencoder construction. Training our framework is simple: it uses standard reconstruction losses and does not require the introduction of new losses. Our architectures are built of standard (backbone) architectures with the appropriate frame averaging to make them equivariant. Testing our framework on both rigid-shape datasets using implicit neural representations and articulated-shape datasets using mesh-based neural networks shows state-of-the-art generalization to unseen test shapes, improving relevant baselines by a large margin. In particular, our method demonstrates significant improvement in generalizing to unseen articulated poses.
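    To convey the flavour of frame averaging, here is a deliberately simplified sketch that symmetrizes a function over the finite group of axis sign flips; the paper's FA instead builds data-dependent frames (e.g., from PCA) to obtain equivariance to continuous Euclidean motions at comparable cost:

```python
import itertools
import torch

def frame_average(f, X):
    """Hedged, simplified illustration of frame averaging: average f over
    the 2^d sign-flip frames diag(s), s in {-1,+1}^d, making the result
    invariant to axis sign flips. Not the paper's data-dependent frames."""
    d = X.shape[-1]
    outs = [f(X * torch.tensor(s, dtype=X.dtype))
            for s in itertools.product((-1.0, 1.0), repeat=d)]
    return torch.stack(outs).mean(dim=0)
```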
    A Graph-based U-Net Model for Predicting Traffic in unseen Cities. (arXiv:2202.06725v4 [cs.LG] UPDATED)
    Accurate traffic prediction is a key ingredient to enable traffic management like rerouting cars to reduce road congestion or regulating traffic via dynamic speed limits to maintain a steady flow. A way to represent traffic data is in the form of temporally changing heatmaps visualizing attributes of traffic, such as speed and volume. In recent works, U-Net models have shown SOTA performance on traffic forecasting from heatmaps. We propose to combine the U-Net architecture with graph layers which improves spatial generalization to unseen road networks compared to a Vanilla U-Net. In particular, we specialize existing graph operations to be sensitive to geographical topology and generalize pooling and upsampling operations to be applicable to graphs.
    Theoretical Analysis of Deep Neural Networks in Physical Layer Communication. (arXiv:2202.09954v2 [eess.SP] UPDATED)
    Recently, deep neural network (DNN)-based physical layer communication techniques have attracted considerable interest. Although their potential to enhance communication systems and their superb performance have been validated by simulation experiments, little attention has been paid to theoretical analysis. Specifically, most studies in the physical layer have focused on applying DNN models to wireless communication problems rather than on theoretically understanding how a DNN works in a communication system. In this paper, we aim to quantitatively analyze why DNNs can achieve performance comparable to traditional techniques in the physical layer, and also to derive their cost in terms of computational complexity. To achieve this goal, we first analyze the encoding performance of a DNN-based transmitter and compare it to a traditional one. We then theoretically analyze the performance of a DNN-based estimator and compare it with traditional estimators. Third, we investigate and validate how information flows in a DNN-based communication system using information-theoretic concepts. Our analysis provides a concise way to open the "black box" of DNNs in physical layer communication, which can support the design of DNN-based intelligent communication techniques and help provide explainable performance assessment.
    Mesa: A Memory-saving Training Framework for Transformers. (arXiv:2111.11124v3 [cs.CV] UPDATED)
    There has been an explosion of interest in designing high-performance Transformers. While Transformers have delivered significant performance improvements, training such networks is extremely memory intensive owing to storing all intermediate activations that are needed for gradient computation during backpropagation, especially for long sequences. To this end, we present Mesa, a memory-saving training framework for Transformers. Specifically, Mesa uses exact activations during the forward pass while storing a low-precision version of the activations to reduce memory consumption during training. The low-precision activations are then dequantized during back-propagation to compute gradients. Besides, to address the heterogeneous activation distributions in the multi-head self-attention layers, we propose a head-wise activation quantization strategy, which quantizes activations based on the statistics of each head to minimize the approximation error. To further boost training efficiency, we learn quantization parameters using running estimates. More importantly, by re-investing the saved memory in a larger batch size or a scaled-up model, we may further improve performance under constrained computational resources. Extensive experiments on ImageNet, CIFAR-100 and ADE20K demonstrate that Mesa can achieve flexible memory savings (up to 50%) during training while achieving comparable or even better performance. Code is available at https://github.com/ziplab/Mesa.
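    The core trick can be illustrated on a single op. The following PyTorch sketch is illustrative, not Mesa's implementation (which adds head-wise quantization and learned parameters): run an exact forward pass but store only an 8-bit copy of the activation, dequantizing it in backward.

```python
import torch

class MemorySavingReLU(torch.autograd.Function):
    """Hedged sketch: exact forward, 8-bit saved activation, dequantized
    (approximate) gradient computation in backward."""

    @staticmethod
    def forward(ctx, x):
        scale = x.abs().max().clamp(min=1e-8) / 127.0
        x_q = (x / scale).round().clamp(-127, 127).to(torch.int8)
        ctx.save_for_backward(x_q, scale)
        return torch.relu(x)            # forward uses exact activations

    @staticmethod
    def backward(ctx, grad_out):
        x_q, scale = ctx.saved_tensors
        x_hat = x_q.float() * scale     # dequantize for the gradient mask
        return grad_out * (x_hat > 0).float()

# usage: y = MemorySavingReLU.apply(x)
```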
    Neural Network Approximation of Lipschitz Functions in High Dimensions with Applications to Inverse Problems. (arXiv:2208.13305v1 [stat.ML])
    The remarkable successes of neural networks in a huge variety of inverse problems have fueled their adoption in disciplines ranging from medical imaging to seismic analysis over the past decade. However, the high dimensionality of such inverse problems has simultaneously left current theory, which predicts that networks should scale exponentially in the dimension of the problem, unable to explain why the seemingly small networks used in these settings work as well as they do in practice. To reduce this gap between theory and practice, a general method for bounding the complexity required for a neural network to approximate a Lipschitz function on a high-dimensional set with a low-complexity structure is provided herein. The approach is based on the observation that the existence of a linear Johnson-Lindenstrauss embedding $\mathbf{A} \in \mathbb{R}^{d \times D}$ of a given high-dimensional set $\mathcal{S} \subset \mathbb{R}^D$ into a low dimensional cube $[-M,M]^d$ implies that for any Lipschitz function $f : \mathcal{S}\to \mathbb{R}^p$, there exists a Lipschitz function $g : [-M,M]^d \to \mathbb{R}^p$ such that $g(\mathbf{A}\mathbf{x}) = f(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{S}$. Hence, if one has a neural network which approximates $g : [-M,M]^d \to \mathbb{R}^p$, then a layer can be added which implements the JL embedding $\mathbf{A}$ to obtain a neural network which approximates $f : \mathcal{S} \to \mathbb{R}^p$. By pairing JL embedding results along with results on approximation of Lipschitz functions by neural networks, one then obtains results which bound the complexity required for a neural network to approximate Lipschitz functions on high dimensional sets. The end result is a general theoretical framework which can then be used to better explain the observed empirical successes of smaller networks in a wider variety of inverse problems than current theory allows.
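    The construction is straightforward to instantiate. A hedged PyTorch sketch with illustrative sizes: a fixed random Gaussian layer plays the role of the JL embedding $\mathbf{A}$, prepended to a trainable network $g$ on the low-dimensional cube.

```python
import torch.nn as nn

def jl_extended_network(g, D, d):
    """Hedged sketch of the paper's construction: prepend a fixed, untrained
    Johnson-Lindenstrauss projection A in R^{d x D} to a network g defined on
    the low-dimensional cube, so x -> g(Ax) approximates f on the set S."""
    jl = nn.Linear(D, d, bias=False)
    nn.init.normal_(jl.weight, std=d ** -0.5)   # random Gaussian JL map
    jl.weight.requires_grad_(False)             # the embedding layer is fixed
    return nn.Sequential(jl, g)

# usage (sizes are illustrative): g approximates the low-dimensional function
g = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 3))
f_hat = jl_extended_network(g, D=1024, d=8)
```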
    Predicting Wind-Driven Spatial Deposition through Simulated Color Images using Deep Autoencoders. (arXiv:2202.01762v2 [cs.LG] UPDATED)
    For centuries, scientists have observed nature to understand the laws that govern the physical world. The traditional process of turning observations into physical understanding is slow. Imperfect models are constructed and tested to explain relationships in data. Powerful new algorithms can enable computers to learn physics by observing images and videos. Inspired by this idea, instead of training machine learning models using physical quantities, we used images, that is, pixel information. For this work, and as a proof of concept, the physics of interest are wind-driven spatial patterns. These phenomena include features in Aeolian dunes and volcanic ash deposition, wildfire smoke, and air pollution plumes. We use computer model simulations of spatial deposition patterns to approximate images from a hypothetical imaging device whose outputs are red, green, and blue (RGB) color images with channel values ranging from 0 to 255. In this paper, we explore deep convolutional neural network-based autoencoders to exploit relationships in wind-driven spatial patterns, which commonly occur in geosciences, and reduce their dimensionality. Reducing the data dimension size with an encoder enables training deep, fully connected neural network models linking geographic and meteorological scalar input quantities to the encoded space. Once this is achieved, full spatial patterns are reconstructed using the decoder. We demonstrate this approach on images of spatial deposition from a pollution source, where the encoder compresses the dimensionality to 0.02% of the original size, and the full predictive model performance on test data achieves an accuracy of 92%.
    On the Universal Transformation of Data-Driven Models to Control Systems. (arXiv:2102.04722v2 [math.OC] UPDATED)
    The advances in data science and machine learning have resulted in significant improvements regarding the modeling and simulation of nonlinear dynamical systems. It is nowadays possible to make accurate predictions of complex systems such as the weather, disease models or the stock market. Predictive methods are often advertised to be useful for control, but the specifics are frequently left unanswered due to the higher system complexity, the requirement of larger data sets and an increased modeling effort. In other words, surrogate modeling for autonomous systems is much easier than for control systems. In this paper we present the framework QuaSiModO (Quantization-Simulation-Modeling-Optimization) to transform arbitrary predictive models into control systems and thus render the tremendous advances in data-driven surrogate modeling accessible for control. Our main contribution is that we trade control efficiency by autonomizing the dynamics - which yields mixed-integer control problems - to gain access to arbitrary, ready-to-use autonomous surrogate modeling techniques. We then recover the complexity of the original problem by leveraging recent results from mixed-integer optimization. The advantages of QuaSiModO are a linear increase in data requirements with respect to the control dimension, performance guarantees that rely exclusively on the accuracy of the predictive model in use, and little prior knowledge requirements in control theory to solve complex control problems.
    Maximum-Likelihood Quantum State Tomography by Soft-Bayes. (arXiv:2012.15498v3 [cs.LG] UPDATED)
    Quantum state tomography (QST), the task of estimating an unknown quantum state given measurement outcomes, is essential to building reliable quantum computing devices. Whereas computing the maximum-likelihood (ML) estimate corresponds to solving a finite-sum convex optimization problem, the objective function is neither smooth nor Lipschitz, so most existing convex optimization methods lack sample complexity guarantees; moreover, both the sample size and dimension grow exponentially with the number of qubits in a QST experiment, so a desired algorithm should be highly scalable with respect to the dimension and sample size, just like stochastic gradient descent. In this paper, we propose a stochastic first-order algorithm that computes an $\varepsilon$-approximate ML estimate in $O( ( D \log D ) / \varepsilon ^ 2 )$ iterations with $O( D^3 )$ per-iteration time complexity, where $D$ denotes the dimension of the unknown quantum state and $\varepsilon$ denotes the optimization error. Our algorithm is an extension of Soft-Bayes to the quantum setup.
    Learning Heterogeneous Interaction Strengths by Trajectory Prediction with Graph Neural Network. (arXiv:2208.13179v1 [cs.LG])
    Dynamical systems with interacting agents are universal in nature, commonly modeled by a graph of relationships between their constituents. Recently, various works have been presented to tackle the problem of inferring those relationships from the system trajectories via deep neural networks, but most of the studies assume binary or discrete types of interactions for simplicity. In the real world, the interaction kernels often involve continuous interaction strengths, which cannot be accurately approximated by discrete relations. In this work, we propose the relational attentive inference network (RAIN) to infer continuously weighted interaction graphs without any ground-truth interaction strengths. Our model employs a novel pairwise attention (PA) mechanism to refine the trajectory representations and a graph transformer to extract heterogeneous interaction weights for each pair of agents. We show that our RAIN model with the PA mechanism accurately infers continuous interaction strengths for simulated physical systems in an unsupervised manner. Further, RAIN with PA successfully predicts trajectories from motion capture data with an interpretable interaction graph, demonstrating the virtue of modeling unknown dynamics with continuous weights.
    Empirical Gateaux Derivatives for Causal Inference. (arXiv:2208.13701v1 [stat.ME])
    We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite-differencing, with a focus on causal inference functionals. We consider the case where probability distributions are not known a priori but also need to be estimated from data. These estimated distributions lead to empirical Gateaux derivatives, and we study the relationships between empirical, numerical, and analytical Gateaux derivatives. Starting with a case study of counterfactual mean estimation, we instantiate the exact relationship between finite-differences and the analytical Gateaux derivative. We then derive requirements on the rates of numerical approximation in perturbation and smoothing that preserve the statistical benefits of one-step adjustments, such as rate-double-robustness. We then study more complicated functionals such as dynamic treatment regimes and the linear-programming formulation for policy optimization in infinite-horizon Markov decision processes. The newfound ability to approximate bias adjustments in the presence of arbitrary constraints illustrates the usefulness of constructive approaches for Gateaux derivatives. We also find that the statistical structure of the functional (rate-double robustness) can permit less conservative rates of finite-difference approximation. This property, however, can be specific to particular functionals, e.g. it occurs for the counterfactual mean but not the infinite-horizon MDP policy value.
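    The counterfactual-mean case study boils down to a one-line finite difference. A NumPy sketch for the simplest functional, the mean, where the empirical Gateaux derivative should recover the influence function $x - \bar{X}$ (the functional and epsilon below are illustrative choices):

```python
import numpy as np

def empirical_gateaux(T, sample, x, eps=1e-4):
    """Hedged sketch: finite-difference Gateaux derivative of a statistical
    functional T at the empirical distribution, perturbed toward a point
    mass at x. T takes (points, weights)."""
    n = len(sample)
    pts = np.append(sample, x)
    base = np.append(np.full(n, 1.0 / n), 0.0)   # empirical distribution
    spike = np.zeros(n + 1); spike[-1] = 1.0     # point mass at x
    return (T(pts, (1 - eps) * base + eps * spike) - T(pts, base)) / eps

T_mean = lambda pts, w: np.sum(w * pts)          # weighted mean functional
```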
    Dealing with Expert Bias in Collective Decision-Making. (arXiv:2106.13539v2 [cs.AI] UPDATED)
    Many real-world problems can be formulated as decision-making problems wherein one must repeatedly make an appropriate choice from a set of alternatives. Multiple expert judgements, whether human or artificial, can help in making correct decisions, especially when exploration of alternative solutions is costly. As expert opinions might deviate, the problem of finding the right alternative can be approached as a collective decision-making problem (CDM) via aggregation of independent judgements. Current state-of-the-art approaches focus on efficiently finding the optimal expert, and thus perform poorly if all experts are unqualified or overly biased, potentially derailing the decision-making process. In this paper, we propose a new algorithmic approach based on contextual multi-armed bandits (CMAB) to identify and counteract such biased expertise. We explore homogeneous, heterogeneous and polarised expert groups and show that this approach is able to effectively exploit the collective expertise, outperforming state-of-the-art methods, especially when the quality of the provided expertise degrades. Our novel CMAB-inspired approach achieves a higher final performance and does so while converging more rapidly than previous adaptive algorithms.
    The PWLR Graph Representation: A Persistent Weisfeiler-Lehman scheme with Random Walks for Graph Classification. (arXiv:2208.13427v1 [cs.LG])
    This paper presents the Persistent Weisfeiler-Lehman Random walk scheme (abbreviated as PWLR) for graph representations, a novel mathematical framework which produces a collection of explainable low-dimensional representations of graphs with discrete and continuous node features. The proposed scheme effectively incorporates the normalized Weisfeiler-Lehman procedure, random walks on graphs, and persistent homology. We thereby integrate three distinct properties of graphs, namely local topological features, node degrees, and global topological invariants, while preserving stability under graph perturbations. This generalizes many variants of Weisfeiler-Lehman procedures, which are primarily used to embed graphs with discrete node labels. Empirical results suggest that these representations can be efficiently utilized to produce results comparable to state-of-the-art techniques in classifying graphs with discrete node labels, and enhanced performance in classifying those with continuous node features.
    Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior. (arXiv:2006.06580v3 [cs.GT] UPDATED)
    As an important psychological and social experiment, the Iterated Prisoner's Dilemma (IPD) treats the choice to cooperate or defect as an atomic action. We propose to study the behaviors of online learning algorithms in the Iterated Prisoner's Dilemma (IPD) game, where we investigate the full spectrum of reinforcement learning agents: multi-armed bandits, contextual bandits and reinforcement learning. We evaluate them based on a tournament of iterated prisoner's dilemma where multiple agents can compete in a sequential fashion. This allows us to analyze the dynamics of policies learned by multiple self-interested independent reward-driven agents, and also to study the capacity of these algorithms to fit human behaviors. Results suggest that basing decisions only on the current situation is the worst strategy in this kind of social dilemma game. Multiple discoveries about online learning behaviors and clinical validations are presented, as an effort to connect artificial intelligence algorithms with human behaviors and their abnormal states in neuropsychiatric conditions.
    GSA-Forecaster: Forecasting Graph-Based Time-Dependent Data with Graph Sequence Attention. (arXiv:2104.05914v3 [cs.LG] UPDATED)
    Forecasting graph-based time-dependent data has many practical applications. This task is challenging, as models need not only to capture spatial dependency and temporal dependency within the data, but also to leverage useful auxiliary information for accurate predictions. In this paper, we analyze limitations of state-of-the-art models in dealing with temporal dependency. To address this limitation, we propose GSA-Forecaster, a new deep learning model for forecasting graph-based time-dependent data. GSA-Forecaster leverages graph sequence attention (GSA), a new attention mechanism proposed in this paper, for effectively capturing temporal dependency. GSA-Forecaster embeds the graph structure of the data into its architecture to address spatial dependency. GSA-Forecaster also accounts for auxiliary information to further improve predictions. We evaluate GSA-Forecaster with large-scale real-world graph-based time-dependent data and demonstrate its effectiveness over state-of-the-art models, reducing RMSE by 6.7% and MAPE by 5.8%.
    TrojViT: Trojan Insertion in Vision Transformers. (arXiv:2208.13049v1 [cs.LG])
    Vision Transformers (ViTs) have demonstrated state-of-the-art performance in various vision-related tasks. The success of ViTs motivates adversaries to perform backdoor attacks on ViTs. Although the vulnerability of traditional CNNs to backdoor attacks is well-known, backdoor attacks on ViTs are seldom studied. Compared to CNNs capturing pixel-wise local features by convolutions, ViTs extract global context information through patches and attentions. Na\"ively transplanting CNN-specific backdoor attacks to ViTs yields only a low clean data accuracy and a low attack success rate. In this paper, we propose a stealthy and practical ViT-specific backdoor attack $TrojViT$. Rather than an area-wise trigger used by CNN-specific backdoor attacks, TrojViT generates a patch-wise trigger designed to build a Trojan composed of some vulnerable bits on the parameters of a ViT stored in DRAM memory through patch salience ranking and attention-target loss. TrojViT further uses minimum-tuned parameter update to reduce the bit number of the Trojan. Once the attacker inserts the Trojan into the ViT model by flipping the vulnerable bits, the ViT model still produces normal inference accuracy with benign inputs. But when the attacker embeds a trigger into an input, the ViT model is forced to classify the input to a predefined target class. We show that flipping only a few vulnerable bits identified by TrojViT on a ViT model using the well-known RowHammer can transform the model into a backdoored one. We perform extensive experiments on multiple datasets and various ViT models. TrojViT can classify $99.64\%$ of test images to a target class by flipping $345$ bits on a ViT for ImageNet.
    RUAD: unsupervised anomaly detection in HPC systems. (arXiv:2208.13169v1 [cs.LG])
    The increasing complexity of modern high-performance computing (HPC) systems necessitates the introduction of automated and data-driven methodologies to support system administrators' effort toward increasing the system's availability. Anomaly detection is an integral part of improving the availability, as it eases the system administrator's burden and reduces the time between an anomaly and its resolution. However, current state-of-the-art (SoA) approaches to anomaly detection are supervised and semi-supervised, so they require a human-labelled dataset with anomalies - this is often impractical to collect in production HPC systems. Unsupervised anomaly detection approaches based on clustering, aimed at alleviating the need for accurate anomaly data, have so far shown poor performance. In this work, we overcome these limitations by proposing RUAD, a novel Recurrent Unsupervised Anomaly Detection model. RUAD achieves better results than the current semi-supervised and unsupervised SoA approaches. This is achieved by considering temporal dependencies in the data and including long short-term memory cells in the model architecture. The proposed approach is assessed on a complete ten-month history of a Tier-0 system (Marconi100 from CINECA, with 980 nodes). RUAD achieves an area under the curve (AUC) of 0.763 in semi-supervised training and an AUC of 0.767 in unsupervised training, improving upon the SoA approach, which achieves an AUC of 0.747 in semi-supervised training and an AUC of 0.734 in unsupervised training. It also vastly outperforms the current SoA unsupervised anomaly detection approach based on clustering, which achieves an AUC of 0.548.
    HGKT: Introducing Hierarchical Exercise Graph for Knowledge Tracing. (arXiv:2006.16915v6 [cs.CY] UPDATED)
    Knowledge tracing (KT), which aims at predicting a learner's knowledge mastery, plays an important role in computer-aided educational systems. In recent years, many deep learning models have been applied to tackle the KT task, showing promising results. However, limitations still exist. Most existing methods simplify the exercising records as knowledge sequences, which fail to explore the rich information that exists in the exercises themselves. Besides, the existing diagnosis results of knowledge tracing are not convincing enough since they neglect prior relations between exercises. To solve the above problems, we propose a hierarchical graph knowledge tracing model called HGKT to explore the latent hierarchical relations between exercises. Specifically, we introduce the concept of a problem schema to construct a hierarchical exercise graph that can model the exercise learning dependencies. Moreover, we employ two attention mechanisms to highlight the important historical states of learners. In the testing stage, we present a K&S diagnosis matrix that can trace the transition of mastery of knowledge and problem schema, which can be more easily applied to different applications. Extensive experiments show the effectiveness and interpretability of our proposed models.
    Online Bidding Algorithms for Return-on-Spend Constrained Advertisers. (arXiv:2208.13713v1 [cs.LG])
    Online advertising has recently grown into a highly competitive and complex multi-billion-dollar industry, with advertisers bidding for ad slots at large scales and high frequencies. This has resulted in a growing need for efficient "auto-bidding" algorithms that determine the bids for incoming queries to maximize advertisers' targets subject to their specified constraints. This work explores efficient online algorithms for a single value-maximizing advertiser under an increasingly popular constraint: Return-on-Spend (RoS). We quantify efficiency in terms of regret relative to the optimal algorithm, which knows all queries a priori. We contribute a simple online algorithm that achieves near-optimal regret in expectation while always respecting the specified RoS constraint when the input queries are i.i.d. samples from some distribution. We also integrate our results with the previous work of Balseiro, Lu, and Mirrokni [BLM20] to achieve near-optimal regret while respecting both RoS and fixed budget constraints. Our algorithm follows the primal-dual framework and uses online mirror descent (OMD) for the dual updates. However, we need to use a non-canonical setup of OMD, and therefore the classic low-regret guarantee of OMD, which is for the adversarial setting in online learning, no longer holds. Nonetheless, in our case and more generally where low-regret dynamics are applied in algorithm design, the gradients encountered by OMD can be far from adversarial but influenced by our algorithmic choices. We exploit this key insight to show our OMD setup achieves low regret in the realm of our algorithm.
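    As a strongly simplified toy version of the primal-dual idea (not the paper's algorithm or its non-canonical OMD setup), one can bid according to the per-query Lagrangian of the RoS constraint and update the dual variable multiplicatively, an instance of OMD with an entropic regularizer:

```python
import numpy as np

def ros_auto_bid(values, prices, eta=0.05, mu0=1.0):
    """Hedged toy sketch of primal-dual RoS bidding in a second-price
    auction with constraint total value >= total spend. All constants
    and the clamping range are illustrative choices."""
    mu, won_value, spend = mu0, 0.0, 0.0
    for v, p in zip(values, prices):
        bid = v * (1.0 + mu) / mu          # maximizer of the per-query Lagrangian
        if bid >= p:                       # win and pay the second price p
            won_value += v
            spend += p
            mu *= np.exp(eta * (p - v))    # multiplicative dual update:
                                           # overspending raises mu -> lower bids
            mu = min(max(mu, 1e-3), 1e3)   # keep the dual in a bounded range
    return won_value, spend
```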
    Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding. (arXiv:2005.08081v7 [cs.CL] UPDATED)
    In sequence-to-sequence learning, e.g., natural language generation, the decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information from only the last encoder layer, recent work has proposed to use representations from different encoder layers for diversified levels of information. Nonetheless, the decoder still obtains only a single view of the source sequences, which might lead to insufficient training of the encoder layer stack due to the hierarchy bypassing problem. In this work, we propose layer-wise multi-view decoding, where for each decoder layer, the representations from the last encoder layer, which serve as a global view, are supplemented with those from other encoder layers for a stereoscopic view of the source sequences. Systematic experiments and analyses show that we successfully address the hierarchy bypassing problem, require an almost negligible increase in parameters, and substantially improve the performance of sequence-to-sequence learning with deep representations on six diverse tasks, i.e., machine translation, abstractive summarization, image captioning, video captioning, medical report generation, and paraphrase generation. In particular, our approach achieves new state-of-the-art results on ten benchmark datasets, including a low-resource machine translation dataset and two low-resource medical report generation datasets.
    Optimising Placement of Pollution Sensors in Windy Environments. (arXiv:2012.10770v2 [cs.LG] UPDATED)
    Air pollution is one of the most important causes of mortality in the world. Monitoring air pollution is useful to learn more about the link between health and pollutants, and to identify areas for intervention. Such monitoring is expensive, so it is important to place sensors as efficiently as possible. Bayesian optimisation has proven useful in choosing sensor locations, but typically relies on kernel functions that neglect the statistical structure of air pollution, such as the tendency of pollution to propagate in the prevailing wind direction. We describe two new wind-informed kernels and investigate their advantage for the task of actively learning locations of maximum pollution using Bayesian optimisation.
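    One plausible way to encode such structure, shown purely as an assumption-laden sketch rather than the paper's actual kernels, is an anisotropic RBF whose lengthscale is stretched along the prevailing wind direction:

```python
import numpy as np

def wind_informed_rbf(X1, X2, wind, ell_along=2.0, ell_cross=0.5):
    """Hedged sketch of a wind-informed GP kernel for 2D sensor locations:
    correlations extend further along the wind vector `wind` than across it.
    Lengthscales are illustrative; this is not the paper's construction."""
    w = np.asarray(wind, float)
    w = w / np.linalg.norm(w)
    R = np.stack([w, np.array([-w[1], w[0]])])      # rotate into the wind frame
    L = np.diag([1.0 / ell_along, 1.0 / ell_cross]) # per-axis lengthscales
    Z1, Z2 = X1 @ R.T @ L, X2 @ R.T @ L
    d2 = ((Z1[:, None, :] - Z2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2)
```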
    Generalization In Multi-Objective Machine Learning. (arXiv:2208.13499v1 [cs.LG])
    Modern machine learning tasks often require considering not just one but multiple objectives. For example, besides the prediction quality, this could be the efficiency, robustness or fairness of the learned models, or any of their combinations. Multi-objective learning offers a natural framework for handling such problems without having to commit to early trade-offs. Surprisingly, statistical learning theory so far offers almost no insight into the generalization properties of multi-objective learning. In this work, we make first steps to fill this gap: we establish foundational generalization bounds for the multi-objective setting as well as generalization and excess bounds for learning with scalarizations. We also provide the first theoretical analysis of the relation between the Pareto-optimal sets of the true objectives and the Pareto-optimal sets of their empirical approximations from training data. In particular, we show a surprising asymmetry: all Pareto-optimal solutions can be approximated by empirically Pareto-optimal ones, but not vice versa.
    Understanding the Limits of Poisoning Attacks in Episodic Reinforcement Learning. (arXiv:2208.13663v1 [cs.LG])
    To understand the security threats to reinforcement learning (RL) algorithms, this paper studies poisoning attacks to manipulate \emph{any} order-optimal learning algorithm towards a targeted policy in episodic RL and examines the potential damage of two natural types of poisoning attacks, i.e., the manipulation of \emph{reward} and \emph{action}. We discover that the effect of attacks crucially depends on whether the rewards are bounded or unbounded. In bounded reward settings, we show that only reward manipulation or only action manipulation cannot guarantee a successful attack. However, by combining reward and action manipulation, the adversary can manipulate any order-optimal learning algorithm to follow any targeted policy with $\tilde{\Theta}(\sqrt{T})$ total attack cost, which is order-optimal, without any knowledge of the underlying MDP. In contrast, in unbounded reward settings, we show that reward manipulation attacks are sufficient for an adversary to successfully manipulate any order-optimal learning algorithm to follow any targeted policy using $\tilde{O}(\sqrt{T})$ amount of contamination. Our results reveal useful insights about what can or cannot be achieved by poisoning attacks, and are set to spur more work on the design of robust RL algorithms.
    Information FOMO: The unhealthy fear of missing out on information. A method for removing misleading data for healthier models. (arXiv:2208.13080v1 [cs.LG])
    Not all data are equal. Misleading or unnecessary data can critically hinder the accuracy of Machine Learning (ML) models. When data is plentiful, misleading effects can be overcome, but in many real-world applications data is sparse and expensive to acquire. We present a method that substantially reduces the data size necessary to accurately train ML models, potentially opening the door for many new, limited-data applications in ML. Our method extracts the most informative data, while ignoring and omitting data that misleads the ML model to inferior generalization properties. Specifically, the method eliminates the phenomenon of "double descent", where more data leads to worse performance. This approach brings several key features to the ML community. Notably, the method naturally converges and removes the traditional need to divide the dataset into training, testing, and validation data. Instead, the selection metric inherently assesses testing error. This ensures that key information is never wasted in testing or validation.
    Comprehensive study of good model training for prostate segmentation in volumetric MRI. (arXiv:2208.13671v1 [eess.IV])
    Prostate cancer was the third most common cancer worldwide in 2020, after breast cancer and lung cancer, and has shown an increasing trend in recent years. According to clinical experience, if this problem is detected and treated early, the patient has a high chance of survival. One task that helps diagnose prostate cancer is prostate segmentation from magnetic resonance imaging. Manual segmentation performed by clinical experts has drawbacks, such as the high time and concentration required from observers, and inter- and intra-observer variability. This is why, in recent years, automatic approaches to prostate segmentation based on convolutional neural networks have emerged, many of them with novel proposed architectures. In this paper I make an exhaustive study of several deep learning models by adjusting them to the task of prostate segmentation. I do not use novel architectures, but focus my work more on how to train the networks. My approach is based on a ResNext101 3D encoder and a Unet3D decoder. I provide a study of the importance of resolution in resampling data, an aspect that has not been studied before.
    A Variance-Reduced Stochastic Gradient Tracking Algorithm for Decentralized Optimization with Orthogonality Constraints. (arXiv:2208.13643v1 [math.OC])
    Decentralized optimization with orthogonality constraints is found widely in scientific computing and data science. Since the orthogonality constraints are nonconvex, it is quite challenging to design efficient algorithms. Existing approaches leverage the geometric tools from Riemannian optimization to solve this problem at the cost of high sample and communication complexities. To relieve this difficulty, based on two novel techniques that can waive the orthogonality constraints, we propose a variance-reduced stochastic gradient tracking (VRSGT) algorithm with the convergence rate of $O(1 / k)$ to a stationary point. To the best of our knowledge, VRSGT is the first algorithm for decentralized optimization with orthogonality constraints that reduces both sampling and communication complexities simultaneously. In the numerical experiments, VRSGT has a promising performance in a real-world autonomous driving application.
    Predicting IMDb Rating of TV Series with Deep Learning: The Case of Arrow. (arXiv:2208.13302v1 [cs.LG])
    Context: The number of TV series offered nowadays is very high, and many series are canceled because a lack of originality leads to a low audience. Problem: A decision support system that can show why some shows are a huge success or not would facilitate the choice of renewing or starting a show. Solution: We studied the case of the series Arrow, broadcast by CW Network, and used descriptive and predictive modeling techniques to predict the IMDb rating. We assumed that the theme of an episode would affect its evaluation by users, so the dataset is composed only of the director of the episode, the number of reviews the episode received, the percentage of each theme extracted by a Latent Dirichlet Allocation (LDA) model of the episode, the number of viewers from Wikipedia, and the rating from IMDb. The LDA model is a generative probabilistic model of a collection of documents made up of words. Method: In this prescriptive research, the case study method was used, and its results were analyzed using a quantitative approach. Summary of Results: With the features of each episode, the model that performed best at predicting the rating was CatBoost, which had a mean squared error similar to the KNN model but a better standard deviation during the test phase. It was possible to predict IMDb ratings with an acceptable root mean squared error of 0.55.
    Adversarial Robustness for Tabular Data through Cost and Utility Awareness. (arXiv:2208.13058v1 [cs.LG])
    Many machine learning problems use data in tabular domains, and adversarial examples can be especially damaging for these applications. Yet, existing works on adversarial robustness mainly focus on machine-learning models in the image and text domains. We argue that, due to the differences between tabular data and images or text, existing threat models are inappropriate for tabular domains: they do not capture that cost can be more important than imperceptibility, nor that the adversary could ascribe different value to the utility obtained from deploying different adversarial examples. We show that due to these differences the attack and defence methods used for images and text cannot be directly applied to the tabular setup. We address these issues by proposing new cost- and utility-aware threat models tailored to the adversarial capabilities and constraints of attackers targeting tabular domains. We introduce a framework that enables us to design attack and defence mechanisms which result in models protected against cost- or utility-aware adversaries, e.g., adversaries constrained by a certain dollar budget. We show that our approach is effective on three tabular datasets corresponding to applications for which adversarial examples can have economic and social implications.
    On GANs perpetuating biases for face verification. (arXiv:2208.13061v1 [cs.CV])
    Deep learning systems need large amounts of data for training. Datasets for training face verification systems are difficult to obtain and prone to privacy issues. Synthetic data generated by generative models such as GANs can be a good alternative. However, we show that data generated from GANs are prone to bias and fairness issues. Specifically, GANs trained on the FFHQ dataset show a bias towards generating white faces in the 20-29 age group. We also demonstrate that synthetic faces cause disparate impact, specifically for the race attribute, when used for fine-tuning face verification systems. This is measured using the $DoB_{fv}$ metric, which is defined as the standard deviation of GAR@FAR for face verification.
    PECAN: A Product-Quantized Content Addressable Memory Network. (arXiv:2208.13571v1 [cs.LG])
    A novel deep neural network (DNN) architecture is proposed wherein the filtering and linear transform are realized solely with product quantization (PQ). This results in a natural implementation via content addressable memory (CAM), which transcends regular DNN layer operations and requires only simple table lookup. Two schemes are developed for the end-to-end PQ prototype training, namely, through angle- and distance-based similarities, which differ in their multiplicative and additive natures with different complexity-accuracy tradeoffs. Even more, the distance-based scheme constitutes a truly multiplier-free DNN solution. Experiments confirm the feasibility of such Product-Quantized Content Addressable Memory Network (PECAN), which has strong implication on hardware-efficient deployments especially for in-memory computing.
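    A minimal NumPy sketch of the underlying product-quantization idea (the CAM hardware mapping and PECAN's end-to-end prototype training are beyond this toy): replace a matrix-vector product with nearest-prototype lookups into precomputed partial-product tables.

```python
import numpy as np

def pq_linear(W, codebooks, x):
    """Hedged sketch of a product-quantized linear layer: split x into S
    subvectors, match each to its nearest codebook prototype (the
    content-addressable step), and sum precomputed partial products instead
    of performing a full matrix multiply. Shapes: W is (out, S*d),
    codebooks is (S, K, d), x has length S*d."""
    S, K, d = codebooks.shape
    # table[s] holds W[:, s*d:(s+1)*d] @ c for every prototype c of subspace s
    tables = np.stack([W[:, s*d:(s+1)*d] @ codebooks[s].T for s in range(S)])
    y = np.zeros(W.shape[0])
    for s in range(S):
        sub = x[s*d:(s+1)*d]
        k = np.argmin(((codebooks[s] - sub) ** 2).sum(1))  # nearest prototype
        y += tables[s][:, k]                               # table lookup only
    return y
```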
    Generative Modelling of the Ageing Heart with Cross-Sectional Imaging and Clinical Data. (arXiv:2208.13146v1 [eess.IV])
    Cardiovascular disease, the leading cause of death globally, is an age-related disease. Understanding the morphological and functional changes of the heart during ageing is a key scientific question, the answer to which will help us define important risk factors of cardiovascular disease and monitor disease progression. In this work, we propose a novel conditional generative model to describe the changes of 3D anatomy of the heart during ageing. The proposed model is flexible and allows integration of multiple clinical factors (e.g. age, gender) into the generating process. We train the model on a large-scale cross-sectional dataset of cardiac anatomies and evaluate on both cross-sectional and longitudinal datasets. The model demonstrates excellent performance in predicting the longitudinal evolution of the ageing heart and modelling its data distribution.
    Smooth Monotone Stochastic Variational Inequalities and Saddle Point Problems -- Survey. (arXiv:2208.13592v1 [math.OC])
    This paper is a survey of methods for solving smooth (strongly) monotone stochastic variational inequalities. To begin with, we give the deterministic foundation from which the stochastic methods eventually evolved. Then we review methods for the general stochastic formulation, and look at the finite sum setup. The last parts of the paper are devoted to various recent (not necessarily stochastic) advances in algorithms for variational inequalities.
    Open-Set Semi-Supervised Object Detection. (arXiv:2208.13722v1 [cs.CV])
    Recent developments in Semi-Supervised Object Detection (SSOD) have shown the promise of leveraging unlabeled data to improve an object detector. However, thus far these methods have assumed that the unlabeled data does not contain out-of-distribution (OOD) classes, which is unrealistic for larger-scale unlabeled datasets. In this paper, we consider a more practical yet challenging problem, Open-Set Semi-Supervised Object Detection (OSSOD). We first find that the existing SSOD method obtains a lower performance gain in open-set conditions, caused by semantic expansion, where distracting OOD objects are mispredicted as in-distribution pseudo-labels for the semi-supervised training. To address this problem, we consider online and offline OOD detection modules, which are integrated with SSOD methods. Through extensive studies, we found that leveraging an offline OOD detector based on a self-supervised vision transformer performs favorably against online OOD detectors due to its robustness to the interference of pseudo-labeling. In experiments, our proposed framework effectively addresses the semantic expansion issue and shows consistent improvements on many OSSOD benchmarks, including large-scale COCO-OpenImages. We also verify the effectiveness of our framework under different OSSOD conditions, including varying numbers of in-distribution classes, different degrees of supervision, and different combinations of unlabeled sets.
    Shaken, and Stirred: Long-Range Dependencies Enable Robust Outlier Detection with PixelCNN++. (arXiv:2208.13579v1 [cs.LG])
    Reliable outlier detection is critical for real-world applications of deep learning models. Likelihoods produced by deep generative models, although extensively studied, have been largely dismissed as being impractical for outlier detection. For one, deep generative model likelihoods are readily biased by low-level input statistics. Second, many recent solutions for correcting these biases are computationally expensive or do not generalize well to complex, natural datasets. Here, we explore outlier detection with a state-of-the-art deep autoregressive model: PixelCNN++. We show that biases in PixelCNN++ likelihoods arise primarily from predictions based on local dependencies. We propose two families of bijective transformations that we term "shaking" and "stirring", which ameliorate low-level biases and isolate the contribution of long-range dependencies to the PixelCNN++ likelihood. These transformations are computationally inexpensive and readily applied at evaluation time. We evaluate our approaches extensively with five grayscale and six natural image datasets and show that they achieve or exceed state-of-the-art outlier detection performance. In sum, lightweight remedies suffice to achieve robust outlier detection on images with deep generative models.
    Fluorescence molecular optomic signatures improve identification of tumors in head and neck specimens. (arXiv:2208.13314v1 [cs.LG])
    In this study, a radiomics approach was extended to optical fluorescence molecular imaging data for tissue classification, termed 'optomics'. Fluorescence molecular imaging is emerging as a tool for precise surgical guidance during head and neck squamous cell carcinoma (HNSCC) resection. However, the tumor-to-normal tissue contrast is confounded by intrinsic physiological limitations of heterogeneous expression of the target molecule, epidermal growth factor receptor (EGFR). Optomics seeks to improve tumor identification by probing textural pattern differences in EGFR expression conveyed by fluorescence. A total of 1,472 standardized optomic features were extracted from fluorescence image samples. A supervised machine learning pipeline involving a support vector machine classifier was trained with the 25 top-ranked features selected by the minimum redundancy maximum relevance criterion. Model predictive performance was compared to a fluorescence intensity thresholding method by classifying test-set image patches of resected tissue with histologically confirmed malignancy status. The optomics approach provided consistent improvement in prediction accuracy on all test set samples, irrespective of dose, compared to the fluorescence intensity thresholding method (mean accuracies of 89% vs. 81%; P = 0.0072). The improved performance demonstrates that extending the radiomics approach to fluorescence molecular imaging data offers a promising image analysis technique for cancer detection in fluorescence-guided surgery.
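    A hedged sketch of such a pipeline using scikit-learn; since sklearn ships no mRMR implementation, mutual-information-based top-k selection stands in for the minimum redundancy maximum relevance step, and the data below are random placeholders:

        import numpy as np
        from sklearn.feature_selection import SelectKBest, mutual_info_classif
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        # Placeholder data: rows are image patches, columns are optomic features.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 1472))      # 1,472 standardized optomic features
        y = rng.integers(0, 2, size=200)      # histologically confirmed malignancy status

        # Stand-in for mRMR: mutual-information-based top-25 selection.
        clf = make_pipeline(
            StandardScaler(),
            SelectKBest(mutual_info_classif, k=25),
            SVC(kernel="rbf"),
        )
        clf.fit(X, y)
        print(f"training accuracy: {clf.score(X, y):.2f}")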
    Billion-user Customer Lifetime Value Prediction: An Industrial-scale Solution from Kuaishou. (arXiv:2208.13358v1 [cs.LG])
    Customer Lifetime Value (LTV) is the expected total revenue that a single user can bring to a business. It is widely used in a variety of business scenarios to make operational decisions when acquiring new customers. Modeling LTV is a challenging problem due to its complex and mutable data distribution. Existing approaches either directly learn from posterior feature distributions or leverage statistical models that make strong assumptions on prior distributions, both of which fail to capture those mutable distributions. In this paper, we propose a complete set of industrial-level LTV modeling solutions. Specifically, we introduce an Order Dependency Monotonic Network (ODMN) that models the ordered dependencies between LTVs of different time spans, which greatly improves model performance. We further introduce a Multi Distribution Multi Experts (MDME) module based on the Divide-and-Conquer idea, which transforms the severely imbalanced distribution modeling problem into a series of relatively balanced sub-distribution modeling problems and hence greatly reduces the modeling complexity. In addition, a novel evaluation metric, Mutual Gini, is introduced to better measure the distribution difference between the estimated value and the ground-truth label based on the Lorenz curve. The ODMN framework has been successfully deployed in many business scenarios of Kuaishou and has achieved great performance. Extensive experiments on real-world industrial data demonstrate the superiority of the proposed methods compared to state-of-the-art baselines including ZILN and Two-Stage XGBoost models.
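    The exact definition of Mutual Gini is not given in the abstract; one plausible reading, sketched below under that assumption, is the gap between the Lorenz curves of the predicted and ground-truth LTV distributions (smaller means the distributions are closer):

        import numpy as np

        def lorenz_curve(values):
            # Cumulative share of total LTV, with users sorted ascending by value.
            v = np.sort(np.asarray(values, dtype=float))
            cum = np.cumsum(v)
            return np.insert(cum / cum[-1], 0, 0.0)  # runs from 0 to 1

        def mutual_gini(y_true, y_pred):
            # Assumed form: mean absolute gap between the two Lorenz curves.
            lt, lp = lorenz_curve(y_true), lorenz_curve(y_pred)
            return float(np.mean(np.abs(lt - lp)))

        rng = np.random.default_rng(0)
        y_true = rng.lognormal(mean=1.0, sigma=2.0, size=10_000)   # heavy-tailed LTVs
        y_pred = y_true * rng.lognormal(0.0, 0.3, size=10_000)     # noisy estimates
        print(f"Mutual Gini (assumed form): {mutual_gini(y_true, y_pred):.4f}")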
    Neural Enhancement of Factor Graph-based Symbol Detection. (arXiv:2203.03333v2 [cs.IT] UPDATED)
    We study the application of the factor graph framework for symbol detection on linear inter-symbol interference channels. Cyclic factor graphs have the potential to yield low-complexity symbol detectors, but are suboptimal if the ubiquitous sum-product algorithm is applied. In this paper, we present and evaluate strategies to improve the performance of cyclic factor graph-based symbol detection algorithms by means of neural enhancement. In particular, we apply neural belief propagation as an effective way to counteract the effect of cycles within the factor graph. We further propose the application and optimization of a linear preprocessor of the channel output. By modifying the observation model, the preprocessing can effectively change the underlying factor graph, thereby significantly improving the detection performance as well as reducing the complexity.
    Cross-domain Cross-architecture Black-box Attacks on Fine-tuned Models with Transferred Evolutionary Strategies. (arXiv:2208.13182v1 [cs.LG])
    Fine-tuning can be vulnerable to adversarial attacks. Existing works about black-box attacks on fine-tuned models (BAFT) are limited by strong assumptions. To fill the gap, we propose two novel BAFT settings, cross-domain and cross-domain cross-architecture BAFT, which only assume that (1) the target model for attacking is a fine-tuned model, and (2) the source domain data is known and accessible. To successfully attack fine-tuned models under both settings, we propose to first train an adversarial generator against the source model, which adopts an encoder-decoder architecture and maps a clean input to an adversarial example. Then we search in the low-dimensional latent space produced by the encoder of the adversarial generator. The search is conducted under the guidance of the surrogate gradient obtained from the source model. Experimental results on different domains and different network architectures demonstrate that the proposed attack method can effectively and efficiently attack the fine-tuned models.
    Network-Level Adversaries in Federated Learning. (arXiv:2208.12911v1 [cs.CR])
    Federated learning is a popular strategy for training models on distributed, sensitive data, while preserving data privacy. Prior work identified a range of security threats on federated learning protocols that poison the data or the model. However, federated learning is a networked system where the communication between clients and server plays a critical role for the learning task performance. We highlight how communication introduces another vulnerability surface in federated learning and study the impact of network-level adversaries on training federated learning models. We show that attackers dropping the network traffic from carefully selected clients can significantly decrease model accuracy on a target population. Moreover, we show that a coordinated poisoning campaign from a few clients can amplify the dropping attacks. Finally, we develop a server-side defense which mitigates the impact of our attacks by identifying and up-sampling clients likely to positively contribute towards target accuracy. We comprehensively evaluate our attacks and defenses on three datasets, assuming encrypted communication channels and attackers with partial visibility of the network.
    Latent Heterogeneous Graph Network for Incomplete Multi-View Learning. (arXiv:2208.13669v1 [cs.LG])
    Multi-view learning has progressed rapidly in recent years. Although many previous studies assume that each instance appears in all views, it is common in real-world applications for instances to be missing from some views, resulting in incomplete multi-view data. To tackle this problem, we propose a novel Latent Heterogeneous Graph Network (LHGN) for incomplete multi-view learning, which aims to use multiple incomplete views as fully as possible in a flexible manner. By learning a unified latent representation, a trade-off between consistency and complementarity among different views is implicitly realized. To explore the complex relationship between samples and latent representations, a neighborhood constraint and a view-existence constraint are proposed, for the first time, to construct a heterogeneous graph. Finally, to avoid any inconsistencies between training and test phase, a transductive learning technique is applied based on graph learning for classification tasks. Extensive experimental results on real-world datasets demonstrate the effectiveness of our model over existing state-of-the-art approaches.
    Efficient liver segmentation with 3D CNN using computed tomography scans. (arXiv:2208.13271v1 [eess.IV])
    The liver is one of the most critical metabolic organs in vertebrates due to its vital functions in the human body, such as detoxification of the blood from waste products and medications. Liver diseases due to liver tumors are one of the most common causes of mortality around the globe. Hence, detecting liver tumors in the early stages of tumor development is critical for medical treatment. Many imaging modalities can be used as aiding tools to detect liver tumors. Computed tomography (CT) is the most used imaging modality for soft tissue organs such as the liver, because it is a non-invasive modality that can be captured relatively quickly. This paper proposes an efficient automatic liver segmentation framework to detect and segment the liver out of CT abdomen scans using the 3D CNN DeepMedic network model. Accurately segmenting the liver region and then using the segmented region as input to a tumor segmentation method is adopted by many studies, as it reduces the false rates resulting from segmenting abdominal organs as tumors. The proposed 3D CNN DeepMedic model has two input pathways rather than one, as in the original 3D CNN model. In this paper, the network was supplied with multiple abdomen CT versions, which helped improve the segmentation quality. The proposed model achieved 94.36%, 94.57%, 91.86%, and 93.14% for accuracy, sensitivity, specificity, and Dice similarity score, respectively. The experimental results indicate the applicability of the proposed method.
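    The four reported metrics are standard voxel-wise quantities; a small self-contained sketch of how they are computed from binary masks (the masks below are synthetic placeholders):

        import numpy as np

        def segmentation_metrics(pred, target):
            # Voxel-wise metrics for a binary liver mask (pred, target in {0, 1}).
            pred, target = pred.astype(bool), target.astype(bool)
            tp = np.sum(pred & target)
            tn = np.sum(~pred & ~target)
            fp = np.sum(pred & ~target)
            fn = np.sum(~pred & target)
            return {
                "accuracy":    (tp + tn) / (tp + tn + fp + fn),
                "sensitivity": tp / (tp + fn),
                "specificity": tn / (tn + fp),
                "dice":        2 * tp / (2 * tp + fp + fn),
            }

        rng = np.random.default_rng(0)
        target = rng.random((64, 64, 64)) > 0.7
        pred = target ^ (rng.random(target.shape) > 0.95)   # corrupt ~5% of voxels
        print(segmentation_metrics(pred, target))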
    Pipeline-Invariant Representation Learning for Neuroimaging. (arXiv:2208.12909v1 [cs.LG])
    Deep learning has been widely applied in neuroimaging, including to predicting brain-phenotype relationships from magnetic resonance imaging (MRI) volumes. MRI data usually requires extensive preprocessing before it is ready for modeling, even via deep learning, in part due to its high dimensionality and heterogeneity. A growing array of MRI preprocessing pipelines has been developed, each with its own strengths and limitations. Recent studies have shown that pipeline-related variation may lead to different scientific findings, even when using identical data. Meanwhile, the machine learning community has emphasized the importance of shifting from model-centric to data-centric approaches, given that data quality plays an essential role in deep learning applications. Motivated by this idea, we first evaluate how preprocessing pipeline selection can impact the downstream performance of a supervised learning model. We next propose two pipeline-invariant representation learning methodologies, MPSL and PXL, to improve consistency in classification performance and to capture similar neural network representations between pipeline pairs. Using 2000 human subjects from the UK Biobank dataset, we demonstrate that both models present unique advantages, in particular that MPSL can be used to improve out-of-sample generalization to new pipelines, while PXL can be used to improve predictive performance consistency and representational similarity within a closed pipeline set. These results suggest that our proposed models can be applied to overcome pipeline-related biases and to improve reproducibility in neuroimaging prediction tasks.
    Consistency between ordering and clustering methods for graphs. (arXiv:2208.12933v1 [cs.LG])
    A relational dataset is often analyzed by optimally assigning a label to each element through clustering or ordering. While similar characterizations of a dataset would be achieved by both clustering and ordering methods, the former has been studied much more actively than the latter, particularly for data represented as graphs. This study fills this gap by investigating methodological relationships between several clustering and ordering methods, focusing on spectral techniques. Furthermore, we evaluate the resulting performance of the clustering and ordering methods. To this end, we propose a measure called the label continuity error, which generically quantifies the degree of consistency between a sequence and a partition for a set of elements. Based on synthetic and real-world datasets, we evaluate the extent to which an ordering method identifies a module structure and a clustering method identifies a banded structure.
    Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment. (arXiv:2208.13628v1 [cs.CV])
    Vision and Language Pretraining has become the prevalent approach for tackling multimodal downstream tasks. The current trend is to move towards ever larger models and pretraining datasets. This computational headlong rush does not seem a reasonable long-term path toward sustainable solutions, and de facto excludes academic laboratories with limited resources. In this work, we propose a new framework, dubbed ViCHA, that efficiently exploits the input data to boost learning by: (a) a new hierarchical cross-modal alignment loss, (b) a new self-supervised scheme based on masked image modeling, and (c) leveraging image-level annotations, called Visual Concepts, obtained with existing foundation models such as CLIP, to boost the performance of the image encoder. Although pretrained on four times less data, our ViCHA strategy outperforms other approaches on several downstream tasks such as Image-Text Retrieval, VQA, Visual Reasoning, Visual Entailment and Visual Grounding. The code will be made publicly available here: https://github.com/mshukor/ViCHA
    Semantic Clustering of a Sequence of Satellite Images. (arXiv:2208.13504v1 [cs.CV])
    Satellite images constitute a highly valuable and abundant resource for many real-world applications. However, the labeled data needed to train most machine learning models are scarce and difficult to obtain. In this context, the current work investigates a fully unsupervised methodology that, given a temporal sequence of satellite images, creates a partition of the ground according to its semantic properties and their evolution over time. The sequences of images are translated into a grid of multivariate time series of embedded tiles. The embedding and the partitional clustering of these sequences of tiles are constructed in two iterative steps: In the first step, the embedding extracts the information of the sequences of tiles based on a geographical neighborhood, and the tiles are grouped into clusters. In the second step, the embedding is refined by using the neighborhood defined by the clusters, and the final clustering of the sequences of tiles is obtained. We illustrate the methodology by conducting the semantic clustering of a sequence of 20 satellite images of the region of Navarra (Spain). The results show that the clustering of multivariate time series is robust and contains trustworthy spatio-temporal semantic information about the region under study. We unveil the close connection that exists between the geographic and embedded spaces, and find that the semantic properties attributed to these kinds of embeddings are fully exploited and even enhanced by the proposed clustering of time series.
    Overparameterized (robust) models from computational constraints. (arXiv:2208.12926v1 [cs.LG])
    Overparameterized models with millions of parameters have been hugely successful. In this work, we ask: can the need for large models be, at least in part, due to the \emph{computational} limitations of the learner? Additionally, we ask, is this situation exacerbated for \emph{robust} learning? We show that this indeed could be the case. We show learning tasks for which computationally bounded learners need \emph{significantly more} model parameters than what information-theoretic learners need. Furthermore, we show that even more model parameters could be necessary for robust learning. In particular, for computationally bounded learners, we extend the recent result of Bubeck and Sellke [NeurIPS'2021] which shows that robust models might need more parameters, to the computational regime and show that bounded learners could provably need an even larger number of parameters. Then, we address the following related question: can we hope to remedy the situation for robust computationally bounded learning by restricting \emph{adversaries} to also be computationally bounded for sake of obtaining models with fewer parameters? Here again, we show that this could be possible. Specifically, building on the work of Garg, Jha, Mahloujifar, and Mahmoody [ALT'2020], we demonstrate a learning task that can be learned efficiently and robustly against a computationally bounded attacker, while to be robust against an information-theoretic attacker requires the learner to utilize significantly more parameters.
    Virtual Control Group: Measuring Hidden Performance Metrics. (arXiv:2208.12941v1 [cs.LG])
    Measuring performance metrics in Financial Integrity systems is crucial for maintaining an efficient and cost-effective operation. An important performance metric is the False Positive Rate. This metric cannot be directly monitored since we don't know for sure whether a user is bad once blocked. We present a statistical method based on survey theory and causal inference methods to estimate the false positive rate of the system or of a single blocking policy. We also suggest a new approach, outcome matching, which in some cases, including on empirical data, outperformed other commonly used methods. The approaches described in this paper can be applied in other Integrity domains such as Cyber Security.
    Towards In-distribution Compatibility in Out-of-distribution Detection. (arXiv:2208.13433v1 [cs.CV])
    Deep neural network, despite its remarkable capability of discriminating targeted in-distribution samples, shows poor performance on detecting anomalous out-of-distribution data. To address this defect, state-of-the-art solutions choose to train deep networks on an auxiliary dataset of outliers. Various training criteria for these auxiliary outliers are proposed based on heuristic intuitions. However, we find that these intuitively designed outlier training criteria can hurt in-distribution learning and eventually lead to inferior performance. To this end, we identify three causes of the in-distribution incompatibility: contradictory gradient, false likelihood, and distribution shift. Based on our new understandings, we propose a new out-of-distribution detection method by adapting both the top-design of deep models and the loss function. Our method achieves in-distribution compatibility by pursuing less interference with the probabilistic characteristic of in-distribution features. On several benchmarks, our method not only achieves the state-of-the-art out-of-distribution detection performance but also improves the in-distribution accuracy.
    An Access Control Method with Secret Key for Semantic Segmentation Models. (arXiv:2208.13135v1 [cs.CV])
    In this paper, a novel method for access control with a secret key is proposed to protect models from unauthorized access. We focus on semantic segmentation models with the vision transformer (ViT), called the segmentation transformer (SETR). Most existing access control methods focus on image classification tasks, or they are limited to CNNs. By using the patch embedding structure that ViT has, trained models and test images can be efficiently encrypted with a secret key, and then semantic segmentation tasks are carried out in the encrypted domain. In an experiment, the method is confirmed to provide the same accuracy as that of using plain images without any encryption to authorized users with a correct key, and to provide an extremely degraded accuracy to unauthorized users.
    Asynchronous Training Schemes in Distributed Learning with Time Delay. (arXiv:2208.13154v1 [cs.LG])
    In the context of distributed deep learning, the issue of stale weights or gradients could result in poor algorithmic performance. This issue is usually tackled by delay tolerant algorithms with some mild assumptions on the objective functions and step sizes. In this paper, we propose a different approach to develop a new algorithm, called $\textbf{P}$redicting $\textbf{C}$lipping $\textbf{A}$synchronous $\textbf{S}$tochastic $\textbf{G}$radient $\textbf{D}$escent (aka, PC-ASGD). Specifically, PC-ASGD has two steps - the $\textit{predicting step}$ leverages the gradient prediction using Taylor expansion to reduce the staleness of the outdated weights while the $\textit{clipping step}$ selectively drops the outdated weights to alleviate their negative effects. A tradeoff parameter is introduced to balance the effects between these two steps. Theoretically, we present the convergence rate considering the effects of delay of the proposed algorithm with constant step size when the smooth objective functions are weakly strongly-convex and nonconvex. One practical variant of PC-ASGD is also proposed by adopting a condition to help with the determination of the tradeoff parameter. For empirical validation, we demonstrate the performance of the algorithm with two deep neural network architectures on two benchmark datasets.
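    To make the two steps concrete, a loose toy sketch (not the authors' implementation) on a quadratic objective: the predicting step corrects the stale gradient with a first-order Taylor term approximated by a finite-difference Hessian-vector product, and the clipping step drops the stale contribution when it disagrees with the fresh gradient; the tradeoff parameter and sync schedule below are illustrative:

        import numpy as np

        rng = np.random.default_rng(0)

        def grad(w):                 # toy objective f(w) = ||w||^2 / 2, so grad = w
            return w

        w = rng.normal(size=10)
        lr, delay, tradeoff = 0.1, 3, 0.5   # tradeoff balances the two steps
        stale_w = w.copy()                  # weights a delayed worker last saw

        for step in range(100):
            g_fresh = grad(w)
            g_stale = grad(stale_w)         # gradient computed on outdated weights
            # Predicting step (sketch): first-order Taylor correction of the stale
            # gradient, Hessian-vector product approximated by finite differences.
            eps = 1e-4
            hvp = (grad(stale_w + eps * (w - stale_w)) - g_stale) / eps
            g_pred = g_stale + hvp
            # Clipping step (sketch): drop the stale term when it points away
            # from the fresh gradient.
            use_stale = np.dot(g_pred, g_fresh) > 0
            g = g_fresh + tradeoff * g_pred if use_stale else g_fresh
            w -= lr * g
            if step % delay == 0:           # the delayed worker syncs occasionally
                stale_w = w.copy()

        print(f"final ||w|| = {np.linalg.norm(w):.2e}")   # should shrink toward 0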
    Normalized Activation Function: Toward Better Convergence. (arXiv:2208.13315v1 [cs.LG])
    Activation functions are essential for neural networks to introduce non-linearity. A great number of empirical experiments have validated various activation functions, yet theoretical research on activation functions remains insufficient. In this work, we study the impact of activation functions on the variance of gradients and propose an approach to normalize activation functions so as to keep the variance of the gradient the same for all layers, allowing the neural network to achieve better convergence. First, we complement previous work on the analysis of the variance of gradients, where the impact of activation functions was considered only in an idealized initial state that can hardly be preserved during training, and derive a property that good activation functions should satisfy as closely as possible. Second, we offer an approach to normalize activation functions and verify its effectiveness empirically on prevalent activation functions. Our experiments also show that the speed of convergence is roughly related to the property derived in the first part. We run experiments with our normalized activation functions against common activation functions, and the results show that our approach consistently outperforms the unnormalized counterparts. For example, normalized Swish outperforms vanilla Swish by 1.2% on ResNet50 with CIFAR-100 in terms of top-1 accuracy. Our method improves performance by simply replacing activation functions with their normalized ones in both fully-connected networks and residual networks.
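    One simple way to realize this idea, sketched under the assumption that "same gradient variance" is enforced by rescaling the activation so its gradient has unit second moment under a standard normal input (the paper's exact normalization may differ):

        import numpy as np

        def swish(x):
            return x / (1.0 + np.exp(-x))

        def swish_grad(x):
            s = 1.0 / (1.0 + np.exp(-x))
            return s + x * s * (1.0 - s)       # d/dx [x * sigmoid(x)]

        # Estimate E[f'(x)^2] under x ~ N(0, 1) by Monte Carlo, then rescale the
        # activation so the gradient's second moment is 1 at initialization.
        rng = np.random.default_rng(0)
        x = rng.normal(size=1_000_000)
        scale = 1.0 / np.sqrt(np.mean(swish_grad(x) ** 2))

        def normalized_swish(x):
            return scale * swish(x)

        print(f"scale = {scale:.4f}, "
              f"normalized grad 2nd moment = {np.mean((scale * swish_grad(x)) ** 2):.4f}")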
    Geometrical versus time-series representation of data in quantum control learning. (arXiv:1803.05169v2 [quant-ph] CROSS LISTED)
    Recently, machine learning techniques have become popular for analysing physical systems and solving problems occurring in quantum computing. In this paper we focus on using such techniques for finding the sequence of physical operations implementing a given quantum logical operation. In this context we analyse the flexibility of the data representation and compare the applicability of two machine learning approaches based on different representations of data. We demonstrate that utilizing the geometrical structure of control pulses is sufficient for achieving high fidelity of the implemented evolution. We also demonstrate that artificial neural networks, unlike geometrical methods, possess generalization abilities enabling them to generate control pulses for systems with variable strength of the disturbance. The presented results suggest that in some quantum control scenarios, geometrical data representation and processing is competitive with more complex methods.
    Improving debris flow evacuation alerts in Taiwan using machine learning. (arXiv:2208.13027v1 [cs.LG])
    Taiwan has the highest susceptibility to and fatalities from debris flows worldwide. The existing debris flow warning system in Taiwan, which uses a time-weighted measure of rainfall, leads to alerts when the measure exceeds a predefined threshold. However, this system generates many false alarms and misses a substantial fraction of the actual debris flows. Towards improving this system, we implemented five machine learning models that input historical rainfall data and predict whether a debris flow will occur within a selected time. We found that a random forest model performed the best among the five models and outperformed the existing system in Taiwan. Furthermore, we identified the rainfall trajectories strongly related to debris flow occurrences and explored trade-offs between the risks of missing debris flows versus frequent false alerts. These results suggest the potential for machine learning models trained on hourly rainfall data alone to save lives while reducing false alerts.
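    A minimal sketch of this modeling setup on synthetic stand-in data: hourly rainfall windows as features, a random forest classifier, and a movable probability threshold to trade missed events against false alerts (the feature construction and threshold value are illustrative):

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import classification_report
        from sklearn.model_selection import train_test_split

        # Placeholder data: each row holds the previous 24 hours of rainfall (mm/h);
        # the label says whether a debris flow occurred within the prediction window.
        rng = np.random.default_rng(0)
        X = rng.gamma(shape=0.5, scale=8.0, size=(5000, 24))
        y = (X.sum(axis=1) + rng.normal(0, 20, 5000) > 150).astype(int)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
        model = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                                       random_state=0).fit(X_tr, y_tr)

        # The miss/false-alert trade-off is explored by moving the decision
        # threshold on the predicted probability, not by retraining.
        proba = model.predict_proba(X_te)[:, 1]
        print(classification_report(y_te, proba > 0.3, digits=3))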
    Computing with Hypervectors for Efficient Speaker Identification. (arXiv:2208.13285v1 [cs.SD])
    We introduce a method to identify speakers by computing with high-dimensional random vectors. Its strengths are simplicity and speed. With only 1.02k active parameters and a 128-minute pass through the training data we achieve Top-1 and Top-5 scores of 31% and 52% on the VoxCeleb1 dataset of 1,251 speakers. This is in contrast to CNN models requiring several million parameters and orders of magnitude higher computational complexity for only a 2$\times$ gain in discriminative power as measured in mutual information. An additional 92 seconds of training with Generalized Learning Vector Quantization (GLVQ) raises the scores to 48% and 67%. A trained classifier classifies 1 second of speech in 5.7 ms. All processing was done on standard CPU-based machines.
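    The essence of the hyperdimensional approach fits in a short sketch: project frame features into bipolar hypervectors with a fixed random matrix, bundle each speaker's frames into a profile vector, and identify by cosine similarity (the dimensions and data below are illustrative placeholders, not the paper's configuration):

        import numpy as np

        rng = np.random.default_rng(0)
        dim, n_feats, n_speakers = 2048, 40, 5   # hypervector width, frame features

        projection = rng.choice([-1.0, 1.0], size=(n_feats, dim))  # fixed random map

        def encode(frames):
            # Bundle a speaker's frames into one bipolar profile hypervector.
            hv = np.sign(frames @ projection)    # one hypervector per frame
            return np.sign(hv.sum(axis=0))       # majority vote = bundling

        # Placeholder data: each speaker has a slightly shifted feature distribution.
        train = {s: rng.normal(loc=0.1 * s, size=(500, n_feats)) for s in range(n_speakers)}
        profiles = np.stack([encode(f) for f in train.values()])

        def identify(frames):
            query = encode(frames)
            sims = profiles @ query / (np.linalg.norm(profiles, axis=1)
                                       * np.linalg.norm(query))
            return int(np.argmax(sims))          # nearest profile by cosine similarity

        test = rng.normal(loc=0.1 * 3, size=(100, n_feats))
        print(f"predicted speaker: {identify(test)}")   # expected: 3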
    Affective Manifolds: Modeling Machine's Mind to Like, Dislike, Enjoy, Suffer, Worry, Fear, and Feel Like A Human. (arXiv:2208.13386v1 [cs.LG])
    After the development of many machine learning and manifold learning algorithms, it may be a good time to put them together to make a powerful mind for a machine. In this work, we propose affective manifolds as components of a machine's mind. Every affective manifold models a characteristic group of the mind and contains multiple states. We define the machine's mind as a set of affective manifolds. We use a learning model for mapping input signals to the embedding space of an affective manifold. Using this mapping, a machine or a robot takes an input signal and can react emotionally to it. We use deep metric learning, with a Siamese network, and propose a loss function for affective manifold learning. We define margins between states based on psychological and philosophical studies. Using triplets of instances, we train the network to minimize the variance of every state and achieve the desired distances between states. We show that affective manifolds can have various applications for machine-machine and human-machine interactions. Some simulations are also provided for verification of the proposed method. It is possible to have as many affective manifolds as required in a machine's mind, and more affective manifolds can make it more realistic and effective. This paper opens the door; we invite researchers from various fields of science to propose more affective manifolds to be inserted into the machine's mind.
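    A small sketch of a triplet loss with state-dependent margins, as the abstract describes; the state names, margin values, and random embeddings below are hypothetical (in the paper the margins come from psychological studies and the embeddings from a shared Siamese encoder):

        import torch
        import torch.nn.functional as F

        # Hypothetical per-state margins for one affective manifold: psychologically
        # closer states get smaller margins (values made up for illustration).
        MARGINS = {("enjoy", "like"): 0.5, ("dislike", "like"): 2.0,
                   ("enjoy", "suffer"): 2.5}

        def affective_triplet_loss(anchor, positive, negative, state_a, state_n):
            # Triplet loss whose margin depends on which pair of states is contrasted.
            margin = MARGINS[tuple(sorted((state_a, state_n)))]
            d_pos = F.pairwise_distance(anchor, positive)   # anchor vs. same state
            d_neg = F.pairwise_distance(anchor, negative)   # anchor vs. other state
            return F.relu(d_pos - d_neg + margin).mean()

        # Random tensors stand in for encoder outputs here.
        a, p, n = (torch.randn(32, 16) for _ in range(3))
        print(affective_triplet_loss(a, p, n, "like", "dislike").item())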
    An Offset-Free Nonlinear MPC scheme for systems learned by Neural NARX models. (arXiv:2203.16290v2 [eess.SY] UPDATED)
    This paper deals with the design of nonlinear MPC controllers that provide offset-free setpoint tracking for models described by Neural Nonlinear AutoRegressive eXogenous (NNARX) networks. The NNARX model is identified from input-output data collected from the plant, and can be given a state-space representation with known measurable states made of past input and output variables, so that a state observer is not required. In the training phase, the Incremental Input-to-State Stability ({\delta}ISS) property can be enforced when consistent with the behavior of the plant. The {\delta}ISS property is then leveraged to augment the model with an explicit integral action on the output tracking error, which endows the designed control scheme with offset-free tracking capabilities. The proposed control architecture is numerically tested on a water heating system, and the achieved results are compared to those scored by another popular offset-free MPC method, showing that the proposed scheme attains remarkable performance even in the presence of disturbances acting on the plant.
    Visualizing high-dimensional loss landscapes with Hessian directions. (arXiv:2208.13219v1 [cs.LG])
    Analyzing geometric properties of high-dimensional loss functions, such as local curvature and the existence of other optima around a certain point in loss space, can help provide a better understanding of the interplay between neural network structure, implementation attributes, and learning performance. In this work, we combine concepts from high-dimensional probability and differential geometry to study how curvature properties in lower-dimensional loss representations depend on those in the original loss space. We show that saddle points in the original space are rarely correctly identified as such in lower-dimensional representations if random projections are used. In such projections, the expected curvature in a lower-dimensional representation is proportional to the mean curvature in the original loss space. Hence, the mean curvature in the original loss space determines if saddle points appear, on average, as either minima, maxima, or almost flat regions. We use the connection between expected curvature and mean curvature (i.e., the normalized Hessian trace) to estimate the trace of Hessians without calculating the Hessian or Hessian-vector products as in Hutchinson's method. Because random projections are not able to correctly identify saddle information, we propose to study projections along Hessian directions that are associated with the largest and smallest principal curvatures. We connect our findings to the ongoing debate on loss landscape flatness and generalizability. Finally, we illustrate our method in numerical experiments on different image classifiers with up to about $7\times 10^6$ parameters.
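    Hutchinson's method, which the paper uses as a point of comparison, estimates the Hessian trace from Hessian-vector products alone; a compact sketch on a toy quadratic loss where the exact trace is known:

        import torch

        # Toy loss over a parameter vector; any differentiable loss works the same way.
        torch.manual_seed(0)
        w = torch.randn(50, requires_grad=True)
        A = torch.randn(50, 50)
        loss = 0.5 * w @ ((A @ A.T) @ w)      # Hessian is A A^T, trace known exactly

        grad = torch.autograd.grad(loss, w, create_graph=True)[0]

        def hvp(v):
            # Hessian-vector product via double backprop; no explicit Hessian needed.
            return torch.autograd.grad(grad, w, grad_outputs=v, retain_graph=True)[0]

        # Hutchinson's estimator: tr(H) = E[v^T H v] for Rademacher vectors v.
        estimates = []
        for _ in range(200):
            v = torch.randint(0, 2, (50,), dtype=torch.float32) * 2 - 1
            estimates.append(torch.dot(v, hvp(v)).item())

        print(f"Hutchinson estimate: {sum(estimates) / len(estimates):.2f}")
        print(f"exact trace:         {torch.trace(A @ A.T).item():.2f}")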
    Time-aware Self-Attention Meets Logic Reasoning in Recommender Systems. (arXiv:2208.13330v1 [cs.IR])
    In the age of big data, recommender systems have shown remarkable success as a key means of information filtering in our daily life. Recent years have witnessed the technical development of recommender systems, from perception learning to cognitive reasoning, which intuitively frames the task of recommendation as a procedure of logical reasoning and has achieved significant improvement. However, the logical statements in reasoning implicitly admit irrelevance of ordering and do not consider time information, which plays an important role in many recommendation tasks. Furthermore, a recommendation model incorporating temporal context should be self-attentive, i.e., automatically focus more (less) on relevant (irrelevant) information. To address these issues, in this paper we propose a Time-aware Self-Attention with Neural Collaborative Reasoning (TiSANCR) based recommendation model, which integrates temporal patterns and a self-attention mechanism into reasoning-based recommendation. Specifically, temporal patterns, represented by relative time, provide context and auxiliary information to characterize the user's preference in recommendation, while self-attention is leveraged to distill informative patterns and suppress irrelevant ones. Therefore, the fusion of self-attentive temporal information provides a deeper representation of the user's preference. Extensive experiments on benchmark datasets demonstrate that the proposed TiSANCR achieves significant improvement and consistently outperforms state-of-the-art recommendation methods.
    Building the Intent Landscape of Real-World Conversational Corpora with Extractive Question-Answering Transformers. (arXiv:2208.12886v1 [cs.CL])
    For companies with customer service, mapping intents inside their conversational data is crucial in building applications based on natural language understanding (NLU). Nevertheless, there is no established automated technique to gather the intents from noisy online chats or voice transcripts. Simple clustering approaches are not suited to intent-sparse dialogues. To solve this intent-landscape task, we propose an unsupervised pipeline that extracts the intents and the taxonomy of intents from real-world dialogues. Our pipeline mines intent-span candidates with an extractive Question-Answering Electra model and leverages sentence embeddings to apply a low-level density clustering followed by a top-level hierarchical clustering. Our results demonstrate the generalization ability of an ELECTRA large model fine-tuned on the SQuAD2 dataset to understand dialogues. With the right prompting question, this model achieves a rate of linguistic validation on intent spans beyond 85%. We furthermore reconstructed the intent schemes of five domains from the MultiDoGo dataset with an average recall of 94.3%.
    Interpreting Black-box Machine Learning Models for High Dimensional Datasets. (arXiv:2208.13405v1 [cs.LG])
    Deep neural networks (DNNs) have been shown to outperform traditional machine learning algorithms in a broad variety of application domains due to their effectiveness in modeling intricate problems and handling high-dimensional datasets. Many real-life datasets, however, are of increasingly high dimensionality, where a large number of features may be irrelevant to the task at hand. The inclusion of such features would not only introduce unwanted noise but also increase computational complexity. Furthermore, due to high non-linearity and dependency among a large number of features, DNN models tend to be unavoidably opaque and are perceived as black-box methods because of their poorly understood internal functioning. A well-interpretable model can identify statistically significant features and explain the way they affect the model's outcome. In this paper, we propose an efficient method to improve the interpretability of black-box models for classification tasks in the case of high-dimensional datasets. To this end, we first train a black-box model on a high-dimensional dataset to learn the embeddings on which the classification is performed. To decompose the inner working principles of the black-box model and to identify the top-k important features, we employ different probing and perturbing techniques. We then approximate the behavior of the black-box model by means of an interpretable surrogate model on the top-k feature space. Finally, we derive decision rules and local explanations from the surrogate model to explain individual decisions. Our approach outperforms or competes with state-of-the-art methods such as TabNet, XGBoost, and SHAP-based interpretability techniques when tested on different datasets with dimensionality varying between 50 and 20,000.
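    A hedged end-to-end sketch of this recipe with scikit-learn stand-ins: an MLP as the black box, permutation importance as one of the perturbing techniques, and a shallow decision tree as the interpretable surrogate from which rules are read (the dataset and hyperparameters are illustrative):

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.inspection import permutation_importance
        from sklearn.neural_network import MLPClassifier
        from sklearn.tree import DecisionTreeClassifier, export_text

        # 1) Train a black-box model on a (moderately) high-dimensional dataset.
        X, y = make_classification(n_samples=2000, n_features=200,
                                   n_informative=10, random_state=0)
        black_box = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300,
                                  random_state=0).fit(X, y)

        # 2) Perturb inputs to rank features and keep the top-k.
        k = 10
        imp = permutation_importance(black_box, X, y, n_repeats=5, random_state=0)
        top_k = np.argsort(imp.importances_mean)[-k:]

        # 3) Fit an interpretable surrogate on the top-k feature space to mimic
        #    the black box, then read decision rules off the tree.
        surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
        surrogate.fit(X[:, top_k], black_box.predict(X))
        fidelity = surrogate.score(X[:, top_k], black_box.predict(X))
        print(f"surrogate fidelity to black box: {fidelity:.2f}")
        print(export_text(surrogate, feature_names=[f"f{i}" for i in top_k]))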
    Domain Adaptation Principal Component Analysis: base linear method for learning with out-of-distribution data. (arXiv:2208.13290v1 [cs.LG])
    Domain adaptation is a popular paradigm in modern machine learning which aims to tackle the problem of divergence between a training or validation dataset possessing labels for learning and testing a classifier (source domain) and a potentially large unlabeled dataset where the model is exploited (target domain). The task is to find a common representation of both source and target datasets in which the source dataset is informative for training and the divergence between source and target is minimized. The most popular solutions for domain adaptation are currently based on training neural networks that combine classification and adversarial learning modules, which are data hungry and usually difficult to train. We present a method called Domain Adaptation Principal Component Analysis (DAPCA) which finds a linear reduced data representation useful for solving the domain adaptation task. DAPCA is based on introducing positive and negative weights between pairs of data points, and generalizes the supervised extension of principal component analysis. DAPCA is an iterative algorithm that solves a simple quadratic optimization problem at each iteration. The convergence of the algorithm is guaranteed, and the number of iterations is small in practice. We validate the suggested algorithm on previously proposed benchmarks for the domain adaptation task, and also show the benefit of using DAPCA in the analysis of single-cell omics datasets in biomedical applications. Overall, DAPCA can serve as a useful preprocessing step in many machine learning applications, leading to reduced dataset representations that take into account possible divergence between source and target domains.
    Learning with Proper Partial Labels. (arXiv:2112.12303v2 [cs.LG] UPDATED)
    Partial-label learning is a kind of weakly-supervised learning with inexact labels, where for each training example, we are given a set of candidate labels instead of only one true label. Recently, various approaches on partial-label learning have been proposed under different generation models of candidate label sets. However, these methods require relatively strong distributional assumptions on the generation models. When the assumptions do not hold, the performance of the methods is not guaranteed theoretically. In this paper, we propose the notion of properness on partial labels. We show that this proper partial-label learning framework requires a weaker distributional assumption and includes many previous partial-label learning settings as special cases. We then derive a unified unbiased estimator of the classification risk. We prove that our estimator is risk-consistent, and we also establish an estimation error bound. Finally, we validate the effectiveness of our algorithm through experiments.
    Contrastive Feature Learning for Fault Detection and Diagnostics in Railway Applications. (arXiv:2208.13288v1 [cs.LG])
    A railway is a complex system comprising multiple infrastructure and rolling stock assets. To operate the system safely, reliably, and efficiently, the condition of many components needs to be monitored. To automate this process, data-driven fault detection and diagnostics models can be employed. In practice, however, the performance of data-driven models can be compromised if the training dataset is not representative of all possible future conditions. We propose to approach this problem by learning a feature representation that is, on the one hand, invariant to operating or environmental factors but, on the other hand, sensitive to changes in the asset's health condition. We evaluate how contrastive learning can be employed on supervised and unsupervised fault detection and diagnostics tasks given real condition monitoring datasets within a railway system - one image dataset from infrastructure assets and one time-series dataset from rolling stock assets. First, we evaluate the performance of supervised contrastive feature learning on a railway sleeper defect classification task given a labeled image dataset. Second, we evaluate the performance of unsupervised contrastive feature learning without access to faulty samples on an anomaly detection task given a railway wheel dataset. Here, we test the hypothesis of whether a feature encoder's sensitivity to degradation is also sensitive to novel fault patterns in the data. Our results demonstrate that contrastive feature learning improves the performance on the supervised classification task regarding sleepers compared to a state-of-the-art method. Moreover, on the anomaly detection task concerning the railway wheels, the detection of shelling defects is improved compared to state-of-the-art methods.
    Geometrical Homogeneous Clustering for Image Data Reduction. (arXiv:2208.13079v1 [cs.LG])
    In this paper, we present novel variations of an earlier approach called the homogeneous clustering algorithm for reducing dataset size. The intuition behind the approaches proposed in this paper is to partition the dataset into homogeneous clusters and select some images which contribute significantly to the accuracy. Selected images are a proper subset of the training data and thus are human-readable. We propose four variations upon the baseline algorithm, RHC. The intuition behind the first approach, RHCKON, is that boundary points contribute significantly towards the representation of clusters. It involves selecting the k farthest and one nearest neighbour of the centroid of each cluster. In the following two approaches (KONCW and CWKC), we introduce the concept of cluster weights, based on the fact that larger clusters contribute more than smaller-sized clusters. The final variation, GHCIDR, selects points based on the geometrical aspects of the data distribution. We performed experiments on two deep learning models: Fully Connected Networks (FCN) and VGG1. We experimented with the four variants on three datasets: MNIST, CIFAR10, and Fashion-MNIST. We found that GHCIDR gave the best accuracy of 99.35%, 81.10%, and 91.66% with a training data reduction of 87.27%, 32.34%, and 76.80% on MNIST, CIFAR10, and Fashion-MNIST respectively.
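    A sketch of the RHCKON-style selection rule on synthetic data: cluster, then per cluster keep the k farthest points from the centroid plus its single nearest neighbour (the real algorithm recurses until clusters are label-homogeneous; that part is omitted here):

        import numpy as np
        from sklearn.cluster import KMeans

        def rhckon_select(X, n_clusters=10, k_farthest=3):
            # Per cluster, keep the k farthest points from the centroid plus
            # its single nearest neighbour.
            km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
            keep = []
            for c in range(n_clusters):
                idx = np.flatnonzero(km.labels_ == c)
                dists = np.linalg.norm(X[idx] - km.cluster_centers_[c], axis=1)
                order = np.argsort(dists)
                chosen = np.concatenate([order[:1], order[-k_farthest:]])
                keep.extend(idx[np.unique(chosen)])
            return np.array(sorted(keep))

        rng = np.random.default_rng(0)
        X = rng.normal(size=(5000, 32))
        subset = rhckon_select(X)
        print(f"kept {len(subset)} of {len(X)} points "
              f"({100 * len(subset) / len(X):.1f}% of the data)")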
    AutoQML: Automatic Generation and Training of Robust Quantum-Inspired Classifiers by Using Genetic Algorithms on Grayscale Images. (arXiv:2208.13246v1 [quant-ph])
    We propose a new hybrid system for automatically generating and training quantum-inspired classifiers on grayscale images by using multiobjective genetic algorithms. We define a dynamic fitness function to obtain the smallest possible circuit and highest accuracy on unseen data, ensuring that the proposed technique is generalizable and robust. We minimize the complexity of the generated circuits in terms of the number of entanglement gates by penalizing their appearance. We reduce the size of the images with two dimensionality reduction approaches: principal component analysis (PCA), which is encoded in the individual for optimization purpose, and a small convolutional autoencoder (CAE). These two methods are compared with one another and with a classical nonlinear approach to understand their behaviors and to ensure that the classification ability is due to the quantum circuit and not the preprocessing technique used for dimensionality reduction.
    Unsupervised diffeomorphic cardiac image registration using parameterization of the deformation field. (arXiv:2208.13275v1 [eess.IV])
    This study proposes an end-to-end unsupervised diffeomorphic deformable registration framework based on moving mesh parameterization. Using this parameterization, a deformation field can be modeled with its transformation Jacobian determinant and the curl of the end velocity field. The new model of the deformation field has three important advantages. First, it relaxes the need for an explicit regularization term and the corresponding weight in the cost function; the smoothness is implicitly embedded in the solution, which results in a physically plausible deformation field. Second, it guarantees diffeomorphism through explicit constraints applied to the transformation Jacobian determinant to keep it positive. Finally, it is suitable for cardiac data processing, since the nature of this parameterization is to define the deformation field in terms of the radial and rotational components. The effectiveness of the algorithm is investigated by evaluating the proposed method on three different data sets including 2D and 3D cardiac MRI scans. The results demonstrate that the proposed framework outperforms existing learning-based and non-learning-based methods while generating diffeomorphic transformations.
    MANDO: Multi-Level Heterogeneous Graph Embeddings for Fine-Grained Detection of Smart Contract Vulnerabilities. (arXiv:2208.13252v1 [cs.SE])
    Learning heterogeneous graphs consisting of different types of nodes and edges enhances the results of homogeneous graph techniques. An interesting example of such graphs is control-flow graphs representing possible software code execution flows. As such graphs represent more semantic information of code, developing techniques and tools for such graphs can be highly beneficial for detecting vulnerabilities in software for its reliability. However, existing heterogeneous graph techniques are still insufficient in handling complex graphs where the number of different types of nodes and edges is large and variable. This paper concentrates on the Ethereum smart contracts as a sample of software codes represented by heterogeneous contract graphs built upon both control-flow graphs and call graphs containing different types of nodes and links. We propose MANDO, a new heterogeneous graph representation to learn such heterogeneous contract graphs' structures. MANDO extracts customized metapaths, which compose relational connections between different types of nodes and their neighbors. Moreover, it develops a multi-metapath heterogeneous graph attention network to learn multi-level embeddings of different types of nodes and their metapaths in the heterogeneous contract graphs, which can capture the code semantics of smart contracts more accurately and facilitate both fine-grained line-level and coarse-grained contract-level vulnerability detection. Our extensive evaluation of large smart contract datasets shows that MANDO improves the vulnerability detection results of other techniques at the coarse-grained contract level. More importantly, it is the first learning-based approach capable of identifying vulnerabilities at the fine-grained line-level, and significantly improves the traditional code analysis-based vulnerability detection approaches by 11.35% to 70.81% in terms of F1-score.
    Neural Observer with Lyapunov Stability Guarantee for Uncertain Nonlinear Systems. (arXiv:2208.13006v1 [math.OC])
    In this paper, we propose a novel nonlinear observer, called the neural observer, for observation tasks of linear time-invariant (LTI) systems and uncertain nonlinear systems, by introducing neural networks (NNs) into the design of observers. By exploring the method of NN representation for the NN mapping vector, we derive stability analyses (e.g., exponential convergence rates) for LTI and uncertain nonlinear systems that pave the way to solving observation problems using linear matrix inequalities (LMIs) only. Remarkably, the neural observer designed for uncertain systems is based on the philosophy of active disturbance rejection control (ADRC), which can measure uncertainty in real time. The LMI results are also significant since we reveal that the observability and controllability of system matrices are required for the existence of solutions of the LMIs. Finally, we verify the effectiveness of neural observers on three simulation cases, including the X-29A aircraft model, the nonlinear pendulum, and the four-wheel steering vehicle.
    A Federated Learning-enabled Smart Street Light Monitoring Application: Benefits and Future Challenges. (arXiv:2208.12996v1 [cs.CV])
    Data-enabled cities have recently been accelerated and enhanced by automated learning for improved Smart City applications. In the context of an Internet of Things (IoT) ecosystem, data communication is frequently costly, inefficient, not scalable, and lacking in security. Federated Learning (FL) plays a pivotal role in providing privacy-preserving and communication-efficient Machine Learning (ML) frameworks. In this paper we evaluate the feasibility of FL in the context of a Smart City street light monitoring application. FL is evaluated against benchmarks of centralised and (fully) personalised machine learning techniques on the task of classifying lamppost operation. Incorporating FL in such a scenario shows minimal performance reduction on the classification task, but huge improvements in communication cost and privacy preservation. These outcomes strengthen FL's viability and potential for IoT applications.
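    For reference, the FL baseline in such a setup typically follows the FedAvg pattern; a self-contained toy sketch with logistic-regression clients (the data, model, and hyperparameters are illustrative, not from the paper):

        import numpy as np

        rng = np.random.default_rng(0)

        def local_sgd(w, X, y, lr=0.1, epochs=5):
            # A few local epochs of logistic-regression gradient descent on one client.
            for _ in range(epochs):
                p = 1.0 / (1.0 + np.exp(-(X @ w)))
                w = w - lr * X.T @ (p - y) / len(y)
            return w

        # Each street-light gateway holds its own readings for the lamppost task.
        true_w = rng.normal(size=8)
        clients = []
        for _ in range(10):
            X = rng.normal(size=(200, 8))
            y = (X @ true_w + rng.normal(0, 0.5, 200) > 0).astype(float)
            clients.append((X, y))
        sizes = np.array([len(y) for _, y in clients])

        w_global = np.zeros(8)
        for _ in range(20):
            # Clients train locally; only weights (never raw data) are communicated.
            local = [local_sgd(w_global.copy(), X, y) for X, y in clients]
            w_global = np.average(local, axis=0, weights=sizes)  # FedAvg aggregation

        acc = np.mean([((X @ w_global > 0).astype(float) == y).mean()
                       for X, y in clients])
        print(f"average client accuracy: {acc:.2f}")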
    An Empirical Study on the Usage of Automated Machine Learning Tools. (arXiv:2208.13116v1 [cs.SE])
    The popularity of automated machine learning (AutoML) tools in different domains has increased over the past few years. Machine learning (ML) practitioners use AutoML tools to automate and optimize processes such as feature engineering, model training, and hyperparameter optimization. Recent work performed qualitative studies on practitioners' experiences of using AutoML tools and compared different AutoML tools based on their performance and provided features, but none of the existing work studied the practices of using AutoML tools in real-world projects at a large scale. Therefore, we conducted an empirical study to understand how ML practitioners use AutoML tools in their projects. To this end, we examined the top 10 most used AutoML tools and their respective usages in a large number of open-source project repositories hosted on GitHub. The results of our study show 1) which AutoML tools are mostly used by ML practitioners and 2) the characteristics of the repositories that use these AutoML tools. We also identified the purposes of using AutoML tools (e.g. model parameter sampling, search space management, model evaluation/error-analysis, data/feature transformation, and data labeling) and the stages of the ML pipeline (e.g. feature engineering) where AutoML tools are used. Finally, we report how often AutoML tools are used together in the same source code files. We hope our results can help ML practitioners learn about different AutoML tools and their usages, so that they can pick the right tool for their purposes. Besides, AutoML tool developers can benefit from our findings to gain insight into the usages of their tools and improve their tools to better fit users' usages and needs.
    Survey: Exploiting Data Redundancy for Optimization of Deep Learning. (arXiv:2208.13363v1 [cs.LG])
    Data redundancy is ubiquitous in the inputs and intermediate results of Deep Neural Networks (DNN). It offers many significant opportunities for improving DNN performance and efficiency and has been explored in a large body of work. These studies are scattered across many venues and several years. The targets they focus on range from images to videos and texts, and the techniques they use to detect and exploit data redundancy also vary in many aspects. There is not yet a systematic examination and summary of the many efforts, making it difficult for researchers to get a comprehensive view of the prior work, the state of the art, differences and shared principles, and the areas and directions yet to explore. This article tries to fill the void. It surveys hundreds of recent papers on the topic, introduces a novel taxonomy to put the various techniques into a single categorization framework, offers a comprehensive description of the main methods used for exploiting data redundancy in improving multiple kinds of DNNs on data, and points out a set of research opportunities for future exploration.
    GATE: Graph CCA for Temporal SElf-supervised Learning for Label-efficient fMRI Analysis. (arXiv:2203.09034v3 [cs.LG] UPDATED)
    In this work, we focus on a challenging task, neuro-disease classification, using functional magnetic resonance imaging (fMRI). In population graph-based disease analysis, graph convolutional neural networks (GCNs) have achieved remarkable success. However, these achievements are inseparable from abundant labeled data and are sensitive to spurious signals. To improve fMRI representation learning and classification under a label-efficient setting, we propose a novel and theory-driven self-supervised learning (SSL) framework on GCNs, namely Graph CCA for Temporal self-supervised learning on fMRI analysis (GATE). Concretely, it is demanding to design a suitable and effective SSL strategy to extract informative and robust features for fMRI. To this end, we investigate several new graph augmentation strategies from fMRI dynamic functional connectivity (FC) for SSL training. Further, we leverage canonical correlation analysis (CCA) on different temporal embeddings and present the theoretical implications. Consequently, this yields a novel two-step GCN learning procedure comprised of (i) SSL on an unlabeled fMRI population graph and (ii) fine-tuning on a small labeled fMRI dataset for a classification task. Our method is tested on two independent fMRI datasets, demonstrating superior performance on autism and dementia diagnosis.
    Deep Kernel Learning of Dynamical Models from High-Dimensional Noisy Data. (arXiv:2208.12975v1 [cs.LG])
    This work proposes a Stochastic Variational Deep Kernel Learning method for the data-driven discovery of low-dimensional dynamical models from high-dimensional noisy data. The framework is composed of an encoder that compresses high-dimensional measurements into low-dimensional state variables, and a latent dynamical model for the state variables that predicts the system evolution over time. The training of the proposed model is carried out in an unsupervised manner, i.e., not relying on labeled data. Our learning method is evaluated on the motion of a pendulum -- a well studied baseline for nonlinear model identification and control with continuous states and control inputs -- measured via high-dimensional noisy RGB images. Results show that the method can effectively denoise measurements, learn compact state representations and latent dynamical models, as well as identify and quantify modeling uncertainties.
    Improving Electricity Market Economy via Closed-Loop Predict-and-Optimize. (arXiv:2208.13065v1 [math.OC])
    The electricity market clearing is usually implemented via an open-loop predict-then-optimize (O-PO) process: it first predicts the available power of renewable energy sources (RES) and the system reserve requirements; then, given the predictions, the markets are cleared via optimization models, i.e., unit commitment (UC) and economic dispatch (ED), to pursue the optimal electricity market economy. However, the market economy could suffer from the open-loop process because its predictions may be overly myopic to the optimizations, i.e., the predictions seek to improve the immediate statistical forecasting errors instead of the ultimate market economy. To this end, this paper proposes a closed-loop predict-and-optimize (C-PO) framework based on the tri-level mixed-integer programming, which trains economy-oriented predictors tailored for the market-clearing optimization to improve the ultimate market economy. Specifically, the upper level trains the economy-oriented RES and reserve predictors according to their induced market economy; the middle and lower levels, with given predictions, mimic the market-clearing process and feed the induced market economy results back to the upper level. The trained economy-oriented predictors are then embedded into the UC model, forming a prescriptive UC model that can simultaneously provide RES-reserve predictions and UC decisions with enhanced market economy. Numerical case studies on an IEEE 118-bus system illustrate potential economic and practical advantages of C-PO over O-PO, robust UC, and stochastic UC.
    Label-Efficient Self-Training for Attribute Extraction from Semi-Structured Web Documents. (arXiv:2208.13086v1 [cs.IR])
    Extracting structured information from HTML documents is a long-studied problem with a broad range of applications, including knowledge base construction, faceted search, and personalized recommendation. Prior works rely on a few human-labeled web pages from each target website or thousands of human-labeled web pages from some seed websites to train a transferable extraction model that generalizes on unseen target websites. Noisy content, low site-level consistency, and lack of inter-annotator agreement make labeling web pages a time-consuming and expensive ordeal. We develop LEAST -- a Label-Efficient Self-Training method for Semi-Structured Web Documents to overcome these limitations. LEAST utilizes a few human-labeled pages to pseudo-annotate a large number of unlabeled web pages from the target vertical. It trains a transferable web-extraction model on both human-labeled and pseudo-labeled samples using self-training. To mitigate error propagation due to noisy training samples, LEAST re-weights each training sample based on its estimated label accuracy and incorporates it in training. To the best of our knowledge, this is the first work to propose end-to-end training for transferable web extraction models utilizing only a few human-labeled pages. Experiments on a large-scale public dataset show that using less than ten human-labeled pages from each seed website for training, a LEAST-trained model outperforms previous state-of-the-art by more than 26 average F1 points on unseen websites, reducing the number of human-labeled pages to achieve similar performance by more than 10x.
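    The re-weighting step lends itself to a compact illustration. The sketch below (function and variable names are ours, and the exact weighting scheme is an assumption, not LEAST's published formulation) weights each pseudo-labeled sample's cross-entropy by its estimated label accuracy:

    ```python
    import torch
    import torch.nn.functional as F

    # Hedged sketch: cross-entropy over pseudo-labeled samples, re-weighted
    # by the estimated accuracy of each pseudo-label.
    def weighted_self_training_loss(logits, pseudo_labels, label_confidence):
        per_sample = F.cross_entropy(logits, pseudo_labels, reduction="none")
        weights = label_confidence / (label_confidence.sum() + 1e-8)
        return (weights * per_sample).sum()

    # Toy usage: 4 samples, 3 classes, confidence estimates in [0, 1].
    logits = torch.randn(4, 3)
    pseudo = torch.tensor([0, 2, 1, 2])
    conf = torch.tensor([0.9, 0.4, 0.7, 0.2])
    loss = weighted_self_training_loss(logits, pseudo, conf)
    ```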
    ATTRITION: Attacking Static Hardware Trojan Detection Techniques Using Reinforcement Learning. (arXiv:2208.12897v1 [cs.CR])
    Stealthy hardware Trojans (HTs) inserted during the fabrication of integrated circuits can bypass the security of critical infrastructures. Although researchers have proposed many techniques to detect HTs, several limitations exist, including: (i) a low success rate, (ii) high algorithmic complexity, and (iii) a large number of test patterns. Furthermore, the most pertinent drawback of prior detection techniques stems from an incorrect evaluation methodology, i.e., they assume that an adversary inserts HTs randomly. Such inappropriate adversarial assumptions enable detection techniques to claim high HT detection accuracy, leading to a "false sense of security." Unfortunately, to the best of our knowledge, despite more than a decade of research on detecting HTs inserted during fabrication, there have been no concerted efforts to perform a systematic evaluation of HT detection techniques. In this paper, we play the role of a realistic adversary and question the efficacy of HT detection techniques by developing an automated, scalable, and practical attack framework, ATTRITION, using reinforcement learning (RL). ATTRITION evades eight detection techniques across two HT detection categories, showcasing its agnostic behavior. ATTRITION achieves average attack success rates of $47\times$ and $211\times$ compared to randomly inserted HTs against state-of-the-art HT detection techniques. We demonstrate ATTRITION's ability to evade detection techniques by evaluating designs ranging from the widely-used academic suites to larger designs such as the open-source MIPS and mor1kx processors to AES and a GPS module. Additionally, we showcase the impact of ATTRITION-generated HTs through two case studies (privilege escalation and kill switch) on the mor1kx processor. We envision that our work, along with our released HT benchmarks and models, fosters the development of better HT detection techniques.
    RL-DistPrivacy: Privacy-Aware Distributed Deep Inference for low latency IoT systems. (arXiv:2208.13032v1 [cs.LG])
    Although Deep Neural Networks (DNN) have become the backbone technology of several ubiquitous applications, their deployment in resource-constrained machines, e.g., Internet of Things (IoT) devices, is still challenging. To satisfy the resource requirements of such a paradigm, collaborative deep inference with IoT synergy was introduced. However, the distribution of DNN networks suffers from severe data leakage. Various threats have been presented, including black-box attacks, where malicious participants can recover arbitrary inputs fed into their devices. Although many countermeasures have been designed to achieve privacy-preserving DNNs, most of them result in additional computation and lower accuracy. In this paper, we present an approach that targets the security of collaborative deep inference via re-thinking the distribution strategy, without sacrificing model performance. Particularly, we examine different DNN partitions that make the model susceptible to black-box threats, and we derive the amount of data that should be allocated per device to hide properties of the original input. We formulate this methodology as an optimization problem in which we establish a trade-off between the latency of co-inference and the privacy level of the data. Next, to relax the optimal solution, we shape our approach as a Reinforcement Learning (RL) design that supports heterogeneous devices as well as multiple DNNs/datasets.
    Improving the Efficiency of Gradient Descent Algorithms Applied to Optimization Problems with Dynamical Constraints. (arXiv:2208.12834v1 [cs.LG])
    We introduce two block coordinate descent algorithms for solving optimization problems with ordinary differential equations (ODEs) as dynamical constraints. The algorithms do not need to implement direct or adjoint sensitivity analysis methods to evaluate loss function gradients. They result from a reformulation of the original problem as an equivalent optimization problem with equality constraints. The algorithms naturally follow from steps aimed at recovering the gradient-descent algorithm based on ODE solvers that explicitly account for the sensitivity of the ODE solution. In our first proposed algorithm we avoid explicitly solving the ODE by integrating the ODE solver as a sequence of implicit constraints. In our second algorithm, we use an ODE solver to reset the ODE solution, but no direct or adjoint sensitivity analysis methods are used. Both algorithms accept mini-batch implementations and show significant efficiency benefits from GPU-based parallelization. We demonstrate the performance of the algorithms when applied to learning the parameters of the Cucker-Smale model. The algorithms are compared with gradient descent algorithms based on ODE solvers endowed with sensitivity analysis capabilities, for various state sizes, using PyTorch and Jax implementations. The experimental results demonstrate that the proposed algorithms are at least 4x faster than the PyTorch implementations, and at least 16x faster than the Jax implementations. For large versions of the Cucker-Smale model, the Jax implementation is thousands of times faster than the sensitivity analysis-based implementation. In addition, our algorithms generate more accurate results on both training and test data. Such gains in computational efficiency are paramount for algorithms that implement real-time parameter estimation, such as diagnosis algorithms.
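    To make the reformulation concrete, here is a minimal sketch (our own, with an explicit Euler discretization and a quadratic penalty standing in for the paper's equality-constrained formulation; all names are illustrative) of alternating block updates over the state trajectory and the parameters:

    ```python
    import torch

    # Treat Euler steps x[k+1] = x[k] + h * f(x[k], theta) as equality
    # constraints with a quadratic penalty, then alternate block updates
    # over the trajectory X and the parameters theta.
    def penalty_objective(X, theta, f, h, data, rho):
        residual = X[1:] - (X[:-1] + h * f(X[:-1], theta))  # constraint violation
        fit = ((X - data) ** 2).mean()                      # data-fitting loss
        return fit + rho * (residual ** 2).mean()

    def block_coordinate_step(X, theta, f, h, data, rho, lr=1e-2):
        # Block 1: update the trajectory with theta frozen.
        gX, = torch.autograd.grad(penalty_objective(X, theta, f, h, data, rho), X)
        X = (X - lr * gX).detach().requires_grad_(True)
        # Block 2: update the parameters with the trajectory frozen.
        gT, = torch.autograd.grad(penalty_objective(X, theta, f, h, data, rho), theta)
        theta = (theta - lr * gT).detach().requires_grad_(True)
        return X, theta
    ```

    Note that no ODE solve with sensitivity analysis appears: gradients flow only through the one-step residuals, which is what makes the block updates cheap and parallelizable.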
    Assessing Phenotype Definitions for Algorithmic Fairness. (arXiv:2203.05174v2 [q-bio.OT] UPDATED)
    Disease identification is a core, routine activity in observational health research. Cohorts impact downstream analyses, such as how a condition is characterized, how patient risk is defined, and what treatments are studied. It is thus critical to ensure that selected cohorts are representative of all patients, independently of their demographics or social determinants of health. While there are multiple potential sources of bias when constructing phenotype definitions which may affect their fairness, it is not standard in the field of phenotyping to consider the impact of different definitions across subgroups of patients. In this paper, we propose a set of best practices to assess the fairness of phenotype definitions. We leverage established fairness metrics commonly used in predictive models and relate them to commonly used epidemiological cohort description metrics. We describe an empirical study for Crohn's disease and type 2 diabetes, each with multiple phenotype definitions taken from the literature, across two sets of patient subgroups (gender and race). We show that the different phenotype definitions exhibit widely varying and disparate performance according to the different fairness metrics and subgroups. We hope that the proposed best practices can help in constructing fair and inclusive phenotype definitions.
    A Comprehensive Review of Digital Twin -- Part 2: Roles of Uncertainty Quantification and Optimization, a Battery Digital Twin, and Perspectives. (arXiv:2208.12904v1 [cs.LG])
    As an emerging technology in the era of Industry 4.0, digital twin is gaining unprecedented attention because of its promise to further optimize process design, quality control, health monitoring, decision and policy making, and more, by comprehensively modeling the physical world as a group of interconnected digital models. In a two-part series of papers, we examine the fundamental role of different modeling techniques, twinning enabling technologies, and uncertainty quantification and optimization methods commonly used in digital twins. This second paper presents a literature review of key enabling technologies of digital twins, with an emphasis on uncertainty quantification, optimization methods, open source datasets and tools, major findings, challenges, and future directions. Discussions focus on current methods of uncertainty quantification and optimization and how they are applied in different dimensions of a digital twin. Additionally, this paper presents a case study where a battery digital twin is constructed and tested to illustrate some of the modeling and twinning methods reviewed in this two-part review. Code and preprocessed data for generating all the results and figures presented in the case study are available on GitHub.
    Transfer Learning of High-Fidelity Opacity Spectra in Autoencoders and Surrogate Models. (arXiv:2203.00853v2 [physics.plasm-ph] UPDATED)
    Simulations of high energy density physics are expensive, in large part due to the need to produce non-local thermodynamic equilibrium opacities. High-fidelity spectra may reveal new physics in the simulations not seen with low-fidelity spectra, but the cost of these simulations also scales with the level of fidelity of the opacities being used. Neural networks are capable of reproducing these spectra, but they need data to train them, which limits the level of fidelity of the training data. This paper demonstrates that it is possible to reproduce high-fidelity spectra with median errors in the realm of 3\% to 4\% using as few as 50 samples of high-fidelity Krypton data, by performing transfer learning on a neural network trained on many times more low-fidelity data.
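    The transfer-learning recipe here is the standard one: pre-train on plentiful low-fidelity spectra, then re-fit only the final layers on the ~50 high-fidelity samples. A minimal sketch (the architecture, layer sizes, and learning rate are illustrative assumptions, not the paper's):

    ```python
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(64, 256), nn.ReLU(),   # pre-trained on low-fidelity data
        nn.Linear(256, 256), nn.ReLU(),  # pre-trained on low-fidelity data
        nn.Linear(256, 1024),            # re-fit on high-fidelity spectra
    )
    for p in model[:4].parameters():     # freeze the pre-trained blocks
        p.requires_grad = False
    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4
    )
    ```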
    A Hitchhiker's Guide to Statistical Comparisons of Reinforcement Learning Algorithms. (arXiv:1904.06979v2 [stat.ME] UPDATED)
    Consistently checking the statistical significance of experimental results is the first mandatory step towards reproducible science. This paper presents a hitchhiker's guide to rigorous comparisons of reinforcement learning algorithms. After introducing the concepts of statistical testing, we review the relevant statistical tests and compare them empirically in terms of false positive rate and statistical power as a function of the sample size (number of seeds) and effect size. We further investigate the robustness of these tests to violations of the most common hypotheses (normal distributions, same distributions, equal variances). Besides simulations, we compare empirical distributions obtained by running Soft Actor-Critic and Twin-Delayed Deep Deterministic Policy Gradient on Half-Cheetah. We conclude by providing guidelines and code to perform rigorous comparisons of RL algorithm performances.
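    One of the tests reviewed, Welch's t-test, is easy to demonstrate on per-seed returns (the numbers below are made up for illustration):

    ```python
    import numpy as np
    from scipy import stats

    # Welch's t-test (unequal variances) on final returns from N random
    # seeds per algorithm; reject H0 "same mean performance" if p < alpha.
    returns_a = np.array([312.0, 295.4, 330.1, 301.7, 288.9])  # algo A seeds
    returns_b = np.array([276.3, 289.0, 268.5, 291.2, 270.8])  # algo B seeds
    t_stat, p_value = stats.ttest_ind(returns_a, returns_b, equal_var=False)
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
    ```

    As the paper stresses, the number of seeds drives statistical power, so such a test should be paired with a power analysis rather than run on a handful of seeds.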
    IM-META: Influence Maximization Using Node Metadata in Networks With Unknown Topology. (arXiv:2106.02926v2 [cs.SI] UPDATED)
    In real-world applications of influence maximization (IM), the network structure is often unknown. Thus, we may identify the most influential seed nodes by exploring only a part of the underlying network, given a small budget for node queries. Motivated by the fact that collecting node metadata is more cost-effective than investigating the relationship between nodes via queried nodes, we propose IM-META, an end-to-end solution to IM in networks with unknown topology that retrieves information from queries and node metadata. However, using such metadata to aid the IM process is not without risk, due to the noisy nature of metadata and uncertainties in connectivity inference. To tackle these challenges, we formulate a new IM problem that aims to find both seed nodes and queried nodes. In IM-META, we develop an effective method that iteratively performs three steps: 1) we learn the relationship between collected metadata and edges via a Siamese neural network model, 2) we select a number of confidently inferred edges to construct a reinforced graph, and 3) we identify the next node to query by maximizing the inferred influence spread using our topology-aware ranking strategy. By querying only 5% of nodes, IM-META reaches 93% of the upper bound performance.
    Complexity-Driven CNN Compression for Resource-constrained Edge AI. (arXiv:2208.12816v1 [cs.LG])
    Recent advances in Artificial Intelligence (AI) on the Internet of Things (IoT)-enabled network edge have realized edge intelligence in several applications, such as smart agriculture, smart hospitals, and smart factories, by enabling low latency and computational efficiency. However, deploying state-of-the-art Convolutional Neural Networks (CNNs) such as VGG-16 and ResNets on resource-constrained edge devices is practically infeasible due to their large number of parameters and floating-point operations (FLOPs). Thus, the concept of network pruning as a type of model compression is gaining attention for accelerating CNNs on low-power devices. State-of-the-art pruning approaches, whether structured or unstructured, do not consider the different underlying complexities exhibited by individual convolutional layers, and follow a training-pruning-retraining pipeline, which results in additional computational overhead. In this work, we propose a novel and computationally efficient pruning pipeline that exploits the inherent layer-level complexities of CNNs. Unlike typical methods, our proposed complexity-driven algorithm selects a particular layer for filter pruning based on its contribution to overall network complexity. We follow a procedure that directly trains the pruned model and avoids the computationally complex ranking and fine-tuning steps. Moreover, we define three modes of pruning, namely parameter-aware (PA), FLOPs-aware (FA), and memory-aware (MA), to introduce versatile compression of CNNs. Our results show the competitive performance of our approach in terms of accuracy and acceleration. Lastly, we present a trade-off between different resources and accuracy that can be helpful for developers in making the right decisions in resource-constrained IoT environments.
    SupervisorBot: NLP-Annotated Real-Time Recommendations of Psychotherapy Treatment Strategies with Deep Reinforcement Learning. (arXiv:2208.13077v1 [cs.CL])
    We propose a recommendation system that suggests treatment strategies to a therapist during the psychotherapy session in real-time. Our system uses a turn-level rating mechanism that predicts the therapeutic outcome by computing a similarity score between the deep embedding of a scoring inventory, and the current sentence that the patient is speaking. The system automatically transcribes a continuous audio stream and separates it into turns of the patient and of the therapist using an online registration-free diarization method. The dialogue pairs along with their computed ratings are then fed into a deep reinforcement learning recommender where the sessions are treated as users and the topics are treated as items. Other than evaluating the empirical advantages of the core components on existing datasets, we demonstrate the effectiveness of this system in a web app.
    Interpretable Saliency Maps And Self-Supervised Learning For Generalized Zero Shot Medical Image Classification. (arXiv:2204.01728v2 [eess.IV] UPDATED)
    In many real world medical image classification settings we do not have access to samples of all possible disease classes, while a robust system is expected to give high performance in recognizing novel test data. We propose a generalized zero shot learning (GZSL) method that uses self supervised learning (SSL) for: 1) selecting anchor vectors of different disease classes; and 2) training a feature generator. Our approach does not require class attribute vectors which are available for natural images but not for medical images. SSL ensures that the anchor vectors are representative of each class. SSL is also used to generate synthetic features of unseen classes. Using a simpler architecture, our method matches a state of the art SSL based GZSL method for natural images and outperforms all methods for medical images. Our method is adaptable enough to accommodate class attribute vectors when they are available for natural images.
    Survey of Machine Learning Based Intrusion Detection Methods for Internet of Medical Things. (arXiv:2202.09657v2 [cs.CR] UPDATED)
    The internet of medical things (IoMT) allows the collection of physiological data using sensors, then their transmission to remote servers, which allows physicians and health professionals to analyze these data continuously and permanently and detect disease at an early stage. However, the use of wireless communication to transfer data exposes it to cyberattacks, and the sensitive and private nature of this data may represent a prime interest for attackers. Using traditional security methods on devices with limited storage and computing capacity is ineffective. On the other hand, using machine learning for intrusion detection can provide an adapted security response to the requirements of IoMT systems. In this context, a comprehensive survey on how machine learning (ML)-based intrusion detection systems address security and privacy issues in IoMT systems is performed. For this purpose, the generic three-layer architecture of IoMT and the security requirement of IoMT systems are provided. Then the various threats that can affect IoMT security are presented, and the advantages, disadvantages, methods, and datasets used in each solution based on ML are identified. Finally, some challenges and limitations of applying ML on each layer of IoMT are discussed, which can serve as a future research direction.
    Tricking the Hashing Trick: A Tight Lower Bound on the Robustness of CountSketch to Adaptive Inputs. (arXiv:2207.00956v2 [cs.DS] UPDATED)
    CountSketch and Feature Hashing (the "hashing trick") are popular randomized dimensionality reduction methods that support recovery of $\ell_2$-heavy hitters (keys $i$ where $v_i^2 > \epsilon \|\boldsymbol{v}\|_2^2$) and approximate inner products. When the inputs are {\em not adaptive} (do not depend on prior outputs), classic estimators applied to a sketch of size $O(\ell/\epsilon)$ are accurate for a number of queries that is exponential in $\ell$. When inputs are adaptive, however, an adversarial input can be constructed after $O(\ell)$ queries with the classic estimator, and the best known robust estimator only supports $\tilde{O}(\ell^2)$ queries. In this work we show that this quadratic dependence is in a sense inherent: we design an attack that after $O(\ell^2)$ queries produces an adversarial input vector whose sketch is highly biased. Our attack uses "natural" non-adaptive inputs (only the final adversarial input is chosen adaptively) and applies universally with any correct estimator, including one that is unknown to the attacker. In doing so, we expose an inherent vulnerability of this fundamental method.
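    For readers unfamiliar with the sketch itself, a minimal (non-adversarial) CountSketch with the classic median estimator looks like this; the parameters are illustrative:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d, r, b = 10_000, 5, 256                     # dimension, rows, buckets
    h = rng.integers(0, b, size=(r, d))          # bucket hash per row
    s = rng.choice([-1.0, 1.0], size=(r, d))     # sign hash per row

    def sketch(v):
        C = np.zeros((r, b))
        for j in range(r):
            np.add.at(C[j], h[j], s[j] * v)      # C[j, h[j,i]] += s[j,i] * v[i]
        return C

    def estimate(C, i):                          # median over the r rows
        return np.median([s[j, i] * C[j, h[j, i]] for j in range(r)])

    v = np.zeros(d); v[42] = 5.0                 # a single heavy hitter
    print(estimate(sketch(v), 42))               # close to 5.0
    ```

    The attack described above adaptively queries such estimates until it finds an input whose sketch is biased, regardless of which correct estimator is used.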
    Lateral Movement Detection Using User Behavioral Analysis. (arXiv:2208.13524v1 [cs.CR])
    Lateral Movement refers to methods by which threat actors gain initial access to a network and then progressively move through said network collecting key data about assets until they reach the ultimate target of their attack. Lateral Movement intrusions have become more intricate with the increasing complexity and interconnected nature of enterprise networks, and require equally sophisticated detection mechanisms to proactively detect such threats in near real-time at enterprise scale. In this paper, the authors propose a novel, lightweight method for Lateral Movement detection using user behavioral analysis and machine learning. Specifically, this paper introduces a novel methodology for cyber domain-specific feature engineering that identifies Lateral Movement behavior on a per-user basis. Furthermore, the engineered features have also been used to develop two supervised machine learning models for Lateral Movement identification that have demonstrably outperformed models previously seen in literature while maintaining robust performance on datasets with high class imbalance. The models and methodology introduced in this paper have also been designed in collaboration with security operators to be relevant and interpretable in order to maximize impact and minimize time to value as a cyber threat detection toolkit. The underlying goal of the paper is to provide a computationally efficient, domain-specific approach to near real-time Lateral Movement detection that is interpretable and robust to enterprise-scale data volumes and class imbalance.
    Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization. (arXiv:2009.04373v3 [math.OC] UPDATED)
    We study stochastic decentralized optimization for the problem of training machine learning models with large-scale distributed data. We extend the widely used EXTRA and DIGing methods with variance reduction (VR), and propose two methods: VR-EXTRA and VR-DIGing. The proposed VR-EXTRA requires the time of $O((\kappa_s+n)\log\frac{1}{\epsilon})$ stochastic gradient evaluations and $O((\kappa_b+\kappa_c)\log\frac{1}{\epsilon})$ communication rounds to reach precision $\epsilon$, which are the best complexities among the non-accelerated gradient-type methods, where $\kappa_s$ and $\kappa_b$ are the stochastic condition number and batch condition number for strongly convex and smooth problems, respectively, $\kappa_c$ is the condition number of the communication network, and $n$ is the sample size on each distributed node. The proposed VR-DIGing has a little higher communication cost of $O((\kappa_b+\kappa_c^2)\log\frac{1}{\epsilon})$. Our stochastic gradient computation complexities are the same as the ones of single-machine VR methods, such as SAG, SAGA, and SVRG, and our communication complexities keep the same as those of EXTRA and DIGing, respectively. To further speed up the convergence, we also propose the accelerated VR-EXTRA and VR-DIGing with both the optimal $O((\sqrt{n\kappa_s}+n)\log\frac{1}{\epsilon})$ stochastic gradient computation complexity and $O(\sqrt{\kappa_b\kappa_c}\log\frac{1}{\epsilon})$ communication complexity. Our stochastic gradient computation complexity is also the same as the ones of single-machine accelerated VR methods, such as Katyusha, and our communication complexity keeps the same as those of accelerated full batch decentralized methods, such as MSDA.
    High-Dimensional Distribution Generation Through Deep Neural Networks. (arXiv:2107.12466v3 [cs.LG] UPDATED)
    We show that every $d$-dimensional probability distribution of bounded support can be generated through deep ReLU networks out of a $1$-dimensional uniform input distribution. What is more, this is possible without incurring a cost - in terms of approximation error measured in Wasserstein-distance - relative to generating the $d$-dimensional target distribution from $d$ independent random variables. This is enabled by a vast generalization of the space-filling approach discovered in (Bailey & Telgarsky, 2018). The construction we propose elicits the importance of network depth in driving the Wasserstein distance between the target distribution and its neural network approximation to zero. Finally, we find that, for histogram target distributions, the number of bits needed to encode the corresponding generative network equals the fundamental limit for encoding probability distributions as dictated by quantization theory.
    Speech Emotion Recognition using Supervised Deep Recurrent System for Mental Health Monitoring. (arXiv:2208.12812v1 [eess.AS])
    Understanding human behavior and monitoring mental health are essential to maintaining the safety of the community and society. As there has been an increase in mental health problems during the COVID-19 pandemic, early detection of mental issues is crucial. Nowadays, the usage of Intelligent Virtual Personal Assistants (IVA) has increased worldwide. Individuals use their voices to control these devices to fulfill requests and acquire different services. This paper proposes a novel deep learning model based on a gated recurrent neural network and a convolutional neural network to understand human emotion from speech, in order to improve IVA services and monitor users' mental health.
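    A minimal sketch of such a CNN + gated-recurrent architecture (layer sizes, mel-feature input, and the number of emotion classes are our illustrative assumptions, not the paper's configuration):

    ```python
    import torch
    import torch.nn as nn

    class EmotionNet(nn.Module):
        def __init__(self, n_mels=40, n_classes=4):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv1d(n_mels, 64, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.MaxPool1d(2),
            )
            self.gru = nn.GRU(64, 128, batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * 128, n_classes)

        def forward(self, x):           # x: (batch, n_mels, time)
            z = self.conv(x)            # (batch, 64, time // 2)
            z, _ = self.gru(z.transpose(1, 2))
            return self.head(z[:, -1])  # classify from the last time step

    logits = EmotionNet()(torch.randn(8, 40, 200))  # -> (8, 4)
    ```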
    Reducing Computational Complexity of Neural Networks in Optical Channel Equalization: From Concepts to Implementation. (arXiv:2208.12866v1 [eess.SP])
    In this paper, a new methodology is proposed that allows for the low-complexity development of neural network (NN) based equalizers for the mitigation of impairments in high-speed coherent optical transmission systems. In this work, we provide a comprehensive description and comparison of various deep model compression approaches that have been applied to feed-forward and recurrent NN designs. Additionally, we evaluate the influence these strategies have on the performance of each NN equalizer. Quantization, weight clustering, pruning, and other cutting-edge strategies for model compression are taken into consideration. In this work, we propose and evaluate a Bayesian optimization-assisted compression, in which the hyperparameters of the compression are chosen to simultaneously reduce complexity and improve performance. In conclusion, the trade-off between the complexity of each compression approach and its performance is evaluated by utilizing both simulated and experimental data in order to complete the analysis. By utilizing optimal compression approaches, we show that it is possible to design an NN-based equalizer that is simpler to implement and has better performance than the conventional digital back-propagation (DBP) equalizer with only one step per span. This is accomplished by reducing the number of multipliers used in the NN equalizer after applying the weighted clustering and pruning algorithms. Furthermore, we demonstrate that an equalizer based on NN can also achieve superior performance while still maintaining the same degree of complexity as the full electronic chromatic dispersion compensation block. We conclude our analysis by highlighting open questions and existing challenges, as well as possible future research directions.
    What Does the Gradient Tell When Attacking the Graph Structure. (arXiv:2208.12815v1 [cs.LG])
    Recent studies have proven that graph neural networks are vulnerable to adversarial attacks. Attackers can rely solely on the training labels to disrupt the performance of the agnostic victim model by edge perturbations. Researchers observe that the saliency-based attackers tend to add edges rather than delete them, which is previously explained by the fact that adding edges pollutes the nodes' features by aggregation while removing edges only leads to some loss of information. In this paper, we further prove that the attackers perturb graphs by adding inter-class edges, which also manifests as a reduction in the homophily of the perturbed graph. From this point of view, saliency-based attackers still have room for improvement in capability and imperceptibility. The message passing of the GNN-based surrogate model leads to the oversmoothing of nodes connected by inter-class edges, preventing attackers from obtaining the distinctiveness of node features. To solve this issue, we introduce a multi-hop aggregated message passing to preserve attribute differences between nodes. In addition, we propose a regularization term to restrict the homophily variance to enhance the attack imperceptibility. Experiments verify that our proposed surrogate model improves the attacker's versatility and the regularization term helps to limit the homophily of the perturbed graph.
    Quantifying French Document Complexity. (arXiv:2208.12924v1 [cs.CL])
    Measuring a document's complexity level is an open challenge, particularly when one is working on a diverse corpus of documents rather than comparing several documents on a similar topic or working on a language other than English. In this paper, we define a methodology to measure the complexity of French documents, using a new general and diversified corpus of texts, the "French Canadian complexity level corpus", and a wide range of metrics. We compare different learning algorithms to this task and contrast their performances and their observations on which characteristics of the texts are more significant to their complexity. Our results show that our methodology gives a general-purpose measurement of text complexity in French.
    Speedup deep learning models on GPU by taking advantage of efficient unstructured pruning and bit-width reduction. (arXiv:2112.15445v2 [cs.LG] UPDATED)
    This work is focused on the pruning of some convolutional neural networks (CNNs) and improving their efficiency on graphics processing units (GPUs) by using a direct sparse algorithm. The Nvidia deep neural network (cuDNN) library provides some of the most effective implementations of deep learning (DL) algorithms for GPUs, which are the most commonly used accelerators for deep learning computations. One of the most common techniques for improving the efficiency of CNN models is weight pruning and quantization. There are two main types of pruning: structural and non-structural. The first enables much easier acceleration on many types of accelerators, but with this type it is difficult to achieve a sparsity level and accuracy as high as those obtained with the second type. Non-structural pruning with retraining can generate weight tensors with up to 90% or more sparsity in some deep CNN models. In this article, a pruning algorithm is presented that makes it possible to achieve high sparsity levels without an accuracy drop. In the next stage, linear and non-linear quantization is applied for further time and footprint reduction. This paper is an extended version of a previously published paper on effective pruning techniques, and presents real models pruned to high sparsity and reduced precision that can achieve better performance than the cuDNN library.
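    The two stages, non-structural (magnitude) pruning and linear quantization, can be sketched in a few lines (a generic illustration of the techniques, not the paper's exact algorithm):

    ```python
    import torch

    def magnitude_prune(weight, sparsity=0.9):
        k = int(weight.numel() * sparsity)
        threshold = weight.abs().flatten().kthvalue(k).values
        mask = (weight.abs() > threshold).float()
        return weight * mask, mask        # retrain with the mask applied

    def linear_quantize(weight, bits=8):
        scale = weight.abs().max() / (2 ** (bits - 1) - 1)
        return torch.round(weight / scale) * scale, scale

    w = torch.randn(256, 256)
    w_sparse, mask = magnitude_prune(w, sparsity=0.9)
    w_q, scale = linear_quantize(w_sparse, bits=8)
    ```

    A direct sparse algorithm then exploits the resulting zero pattern at inference time instead of executing dense kernels.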
    Mixtures of Gaussian Process Experts with SMC$^2$. (arXiv:2208.12830v1 [stat.ML])
    Gaussian processes are a key component of many flexible statistical and machine learning models. However, they exhibit cubic computational complexity and high memory constraints due to the need of inverting and storing a full covariance matrix. To circumvent this, mixtures of Gaussian process experts have been considered where data points are assigned to independent experts, reducing the complexity by allowing inference based on smaller, local covariance matrices. Moreover, mixtures of Gaussian process experts substantially enrich the model's flexibility, allowing for behaviors such as non-stationarity, heteroscedasticity, and discontinuities. In this work, we construct a novel inference approach based on nested sequential Monte Carlo samplers to simultaneously infer both the gating network and Gaussian process expert parameters. This greatly improves inference compared to importance sampling, particularly in settings when a stationary Gaussian process is inappropriate, while still being thoroughly parallelizable.
    Abnormal Local Clustering in Federated Learning. (arXiv:2208.12813v1 [cs.LG])
    Federated learning preserves privacy by transferring models, rather than personal and private data, from local client devices. In the global model, it is crucial to recognize whether each local contribution is normal. This paper suggests a method to separate normal and abnormal locals by Euclidean-similarity clustering of the vectors obtained by feeding dummy data into the local models. In a federated classification model, this method divides the locals into normal and abnormal groups.
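    The core idea can be sketched as follows (the decision rule that the majority cluster is "normal" is our illustrative assumption, and local models are assumed to return their output vectors as NumPy arrays):

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    # Feed the same dummy input to every local model, collect the output
    # vectors, and split the locals into two groups by Euclidean-distance
    # clustering.
    def split_locals(local_models, dummy_input):
        vectors = np.stack([m(dummy_input) for m in local_models])
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(vectors)
        normal_label = np.bincount(labels).argmax()
        return [i for i, l in enumerate(labels) if l == normal_label]
    ```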
    DETERRENT: Detecting Trojans using Reinforcement Learning. (arXiv:2208.12878v1 [cs.LG])
    Insertion of hardware Trojans (HTs) in integrated circuits is a pernicious threat. Since HTs are activated under rare trigger conditions, detecting them using random logic simulations is infeasible. In this work, we design a reinforcement learning (RL) agent that circumvents the exponential search space and returns a minimal set of patterns that is most likely to detect HTs. Experimental results on a variety of benchmarks demonstrate the efficacy and scalability of our RL agent, which obtains a significant reduction ($169\times$) in the number of test patterns required while maintaining or improving coverage ($95.75\%$) compared to the state-of-the-art techniques.
    Riesz-Quincunx-UNet Variational Auto-Encoder for Satellite Image Denoising. (arXiv:2208.12810v1 [eess.IV])
    Multiresolution deep learning approaches, such as the U-Net architecture, have achieved high performance in classifying and segmenting images. However, these approaches do not provide a latent image representation and cannot be used to decompose, denoise, and reconstruct image data. The U-Net and other convolutional neural network (CNN) architectures commonly use pooling to enlarge the receptive field, which usually results in irreversible information loss. This study proposes to include a Riesz-Quincunx (RQ) wavelet transform, which combines 1) higher-order Riesz wavelet transforms and 2) orthogonal Quincunx wavelets (both of which have been used to reduce blur in medical images), inside the U-Net architecture, to reduce noise in satellite images and their time series. In the transformed feature space, we propose a variational approach to understand how random perturbations of the features affect the image, to further reduce noise. Combining both approaches, we introduce a hybrid RQUNet-VAE scheme for image and time-series decomposition used to reduce noise in satellite imagery. We present qualitative and quantitative experimental results demonstrating that our proposed RQUNet-VAE is more effective at reducing noise in satellite imagery compared to other state-of-the-art methods. We also apply our scheme to several applications for multi-band satellite images, including image denoising, image and time-series decomposition by diffusion, and image segmentation.
    Neural Tangent Kernel: A Survey. (arXiv:2208.13614v1 [cs.LG])
    A seminal work [Jacot et al., 2018] demonstrated that training a neural network under specific parameterization is equivalent to performing a particular kernel method as width goes to infinity. This equivalence opened a promising direction for applying the results of the rich literature on kernel methods to neural nets which were much harder to tackle. The present survey covers key results on kernel convergence as width goes to infinity, finite-width corrections, applications, and a discussion of the limitations of the corresponding method.
    PRIME: Uncovering Circadian Oscillation Patterns and Associations with AD in Untimed Genome-wide Gene Expression across Multiple Brain Regions. (arXiv:2208.12811v1 [q-bio.GN])
    The disruption of circadian rhythm is a cardinal symptom for Alzheimer's disease (AD) patients. The full circadian orchestration of gene expression in the human brain and its inherent associations with AD remain largely unknown. We present a novel comprehensive approach, PRIME, to detect and analyze rhythmic oscillation patterns in untimed high-dimensional gene expression data across multiple datasets. To demonstrate the utility of PRIME, we first validate it on a time-course expression dataset from mouse liver as a cross-species and cross-organ validation. We then apply it to study oscillation patterns in untimed genome-wide gene expression from 19 human brain regions of controls and AD patients. Our findings reveal clear, synchronized oscillation patterns in 15 pairs of brain regions of controls, while these oscillation patterns either disappear or become dim in AD. It is worth noting that PRIME discovers circadian rhythmic patterns without requiring the samples' timestamps. The code for PRIME, along with code to reproduce the figures in this paper, is available at https://github.com/xinxingwu-uk/PRIME.
    Bayesian Continual Learning via Spiking Neural Networks. (arXiv:2208.13723v1 [cs.NE])
    Among the main features of biological intelligence are energy efficiency, capacity for continual adaptation, and risk management via uncertainty quantification. Neuromorphic engineering has been thus far mostly driven by the goal of implementing energy-efficient machines that take inspiration from the time-based computing paradigm of biological brains. In this paper, we take steps towards the design of neuromorphic systems that are capable of adaptation to changing learning tasks, while producing well-calibrated uncertainty quantification estimates. To this end, we derive online learning rules for spiking neural networks (SNNs) within a Bayesian continual learning framework. In it, each synaptic weight is represented by parameters that quantify the current epistemic uncertainty resulting from prior knowledge and observed data. The proposed online rules update the distribution parameters in a streaming fashion as data are observed. We instantiate the proposed approach for both real-valued and binary synaptic weights. Experimental results using Intel's Lava platform show the merits of Bayesian over frequentist learning in terms of capacity for adaptation and uncertainty quantification.
    Invertible Gaussian Reparameterization: Revisiting the Gumbel-Softmax. (arXiv:1912.09588v5 [stat.ML] UPDATED)
    The Gumbel-Softmax is a continuous distribution over the simplex that is often used as a relaxation of discrete distributions. Because it can be readily interpreted and easily reparameterized, it enjoys widespread use. We propose a modular and more flexible family of reparameterizable distributions in which Gaussian noise is transformed into a one-hot approximation through an invertible function. This invertible function is composed of a modified softmax and can incorporate diverse transformations that serve different specific purposes. For example, the stick-breaking procedure allows us to extend the reparameterization trick to distributions with countably infinite support, thus enabling the use of our distribution alongside nonparametric models, while normalizing flows let us increase the flexibility of the distribution. Our construction enjoys theoretical advantages over the Gumbel-Softmax, such as a closed-form KL divergence, and significantly outperforms it in a variety of experiments. Our code is available at https://github.com/cunningham-lab/igr.
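    A rough sketch of the core reparameterization (an invertible softmax-style map; the temperature `tau` and appended constant `delta` are illustrative, and this is not the paper's exact parameterization):

    ```python
    import torch

    # Push Gaussian noise through an invertible softmax-based map onto the
    # simplex, yielding a relaxed one-hot sample with pathwise gradients.
    def invertible_gaussian_sample(mu, log_sigma, tau=0.5, delta=1.0):
        eps = torch.randn_like(mu)
        z = mu + log_sigma.exp() * eps                 # (batch, K-1) Gaussian
        pad = torch.full(z.shape[:-1] + (1,), delta)   # appended fixed coordinate
        return torch.softmax(torch.cat([z, pad], dim=-1) / tau, dim=-1)

    mu = torch.zeros(8, 4, requires_grad=True)
    y = invertible_gaussian_sample(mu, torch.zeros(8, 4))  # (8, 5), rows sum to 1
    ```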
    Interpretable (not just posthoc-explainable) medical claims modeling for discharge placement to prevent avoidable all-cause readmissions or death. (arXiv:2208.12814v1 [cs.CY])
    This manuscript addresses the simultaneous problems of predicting all-cause inpatient readmission or death after discharge, and quantifying the impact of discharge placement in preventing these adverse events. To this end, we developed an inherently interpretable multilevel Bayesian modeling framework inspired by the piecewise linearity of ReLU-activated deep neural networks. In a survival model, we explicitly adjust for confounding in quantifying local average treatment effects for discharge placement interventions. We trained the model on a 5% sample of Medicare beneficiaries from 2008 and 2011, and then tested the model on 2012 claims. Evaluated on classification accuracy for 30-day all-cause unplanned readmissions (defined using official CMS methodology) or death, the model performed similarly to XGBoost, logistic regression (after feature engineering), and a Bayesian deep neural network trained on the same data. Tested on the 30-day classification task of predicting readmissions or death using left-out future data, the model achieved an AUROC of approximately 0.76 and an AUPRC of approximately 0.50 (relative to an overall positive rate in the testing data of 18%), demonstrating that one need not sacrifice interpretability for accuracy. Additionally, the model had a testing AUROC of 0.78 on the classification of 90-day all-cause unplanned readmission or death. We easily peer into our inherently interpretable model and summarize its main findings. Additionally, we demonstrate how the black-box posthoc explainer tool SHAP generates explanations that are not supported by the fitted model -- and, if taken at face value, do not offer enough context to make a model actionable.
    Goal-Conditioned Q-Learning as Knowledge Distillation. (arXiv:2208.13298v1 [cs.LG])
    Many applications of reinforcement learning can be formalized as goal-conditioned environments, where, in each episode, there is a "goal" that affects the rewards obtained during that episode but does not affect the dynamics. Various techniques have been proposed to improve performance in goal-conditioned environments, such as automatic curriculum generation and goal relabeling. In this work, we explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation. In particular: the current Q-value function and the target Q-value estimate are both functions of the goal, and we would like to train the Q-value function to match its target for all goals. We therefore apply Gradient-Based Attention Transfer (Zagoruyko and Komodakis 2017), a knowledge distillation technique, to the Q-function update. We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional. We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals, where the agent can attain a reward by achieving any one of a large set of objectives, all specified at test time. Finally, to provide theoretical support, we give examples of classes of environments where (under some assumptions) standard off-policy algorithms require at least O(d^2) observed transitions to learn an optimal policy, while our proposed technique requires only O(d) transitions, where d is the dimensionality of the goal and state space.
    Approach of variable clustering and compression for learning large Bayesian networks. (arXiv:2208.13605v1 [stat.ML])
    This paper describes a new approach for learning the structures of large Bayesian networks based on blocks resulting from feature-space clustering. This clustering is obtained using normalized mutual information. The subsequent aggregation of blocks is done using classical structure learning methods, except that they are given compressed information about the combinations of feature values for each block. Validation of this approach is done for Hill-Climbing as a graph enumeration algorithm with two score functions: BIC and MI. In this way, potentially parallelizable block learning can be implemented even for score functions that are considered unsuitable for parallelizable learning. The advantage of the approach is evaluated in terms of speed as well as the accuracy of the found structures.
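    The feature-space clustering step can be illustrated directly (a sketch under the assumption of discrete features; the block-aggregation and structure learning steps are not shown):

    ```python
    import numpy as np
    from sklearn.metrics import normalized_mutual_info_score
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    # Cluster discrete features into blocks using pairwise normalized
    # mutual information as a similarity measure.
    def feature_blocks(X, n_blocks):              # X: (n_samples, n_features)
        n = X.shape[1]
        D = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                nmi = normalized_mutual_info_score(X[:, i], X[:, j])
                D[i, j] = D[j, i] = 1.0 - nmi     # dissimilarity
        Z = linkage(squareform(D), method="average")
        return fcluster(Z, t=n_blocks, criterion="maxclust")
    ```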
    Normality-Guided Distributional Reinforcement Learning for Continuous Control. (arXiv:2208.13125v1 [cs.LG])
    Learning a predictive model of the mean return, or value function, plays a critical role in many reinforcement learning algorithms. Distributional reinforcement learning (DRL) methods instead model the value distribution, which has been shown to improve performance in many settings. In this paper, we model the value distribution as approximately normal using the Markov Chain central limit theorem. We analytically compute quantile bars to provide a new DRL target that is informed by the decrease in standard deviation that occurs over the course of an episode. In addition, we suggest an exploration strategy based on how closely the learned value distribution resembles the target normal distribution to make the value function more accurate for better policy improvement. The approach we outline is compatible with many DRL structures. We use proximal policy optimization as a testbed and show that both the normality-guided target and exploration bonus produce performance improvements. We demonstrate our method outperforms DRL baselines on a number of continuous control tasks.
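    Computing quantile bars from an approximately normal value distribution is straightforward (a sketch of the computation; the number of quantiles is illustrative):

    ```python
    import numpy as np
    from scipy.stats import norm

    # Quantile bars of N(mean, std^2) at quantile midpoints, usable as a
    # distributional RL target.
    def normal_quantile_bars(mean, std, n_quantiles=32):
        taus = (np.arange(n_quantiles) + 0.5) / n_quantiles
        return norm.ppf(taus, loc=mean, scale=std)

    print(normal_quantile_bars(mean=10.0, std=2.0, n_quantiles=5))
    ```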
    A Missing Value Filling Model Based on Feature Fusion Enhanced Autoencoder. (arXiv:2208.13495v1 [cs.LG])
    With the advent of the big data era, the data quality problem is becoming more and more crucial. Among many factors, data with missing values is one primary issue, and thus developing effective imputation models is a key topic in the research community. Recently, a major research direction has been to employ neural network models such as self-organizing maps or autoencoders for filling in missing values. However, these classical methods can hardly discover correlated features and common features simultaneously among data attributes. In particular, it is a very typical problem for classical autoencoders that they often learn invalid constant mappings, dramatically hurting the filling performance. To solve the above problems, we propose and develop a missing-value-filling model based on a feature-fusion-enhanced autoencoder. We first design and incorporate into an autoencoder a hidden layer that consists of de-tracking neurons and radial basis function neurons, which can enhance the ability to learn correlated features and common features. Besides, we develop a missing value filling strategy based on dynamic clustering (MVDC) that is incorporated into an iterative optimization process. This design enhances the multi-dimensional feature fusion ability and thus improves the dynamic collaborative missing-value-filling performance. The effectiveness of our model is validated by experimental comparisons to many missing-value-filling methods that are tested on seven datasets with different missing rates.
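    The basic mechanism, reconstructing only over observed entries and filling the missing ones from the reconstruction, can be sketched with a plain autoencoder (the paper's de-tracking and RBF neurons and the MVDC strategy are not reproduced here):

    ```python
    import torch
    import torch.nn as nn

    def impute(X, mask, hidden=32, epochs=200):   # mask: 1 = observed, 0 = missing
        d = X.shape[1]
        ae = nn.Sequential(nn.Linear(d, hidden), nn.Tanh(), nn.Linear(hidden, d))
        opt = torch.optim.Adam(ae.parameters(), lr=1e-2)
        X0 = torch.where(mask.bool(), X, torch.zeros_like(X))  # zero-fill init
        for _ in range(epochs):
            recon = ae(X0)
            loss = (((recon - X0) * mask) ** 2).sum() / mask.sum()
            opt.zero_grad(); loss.backward(); opt.step()
        return torch.where(mask.bool(), X0, ae(X0).detach())
    ```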
    Decentralized Coordination in Partially Observable Queueing Networks. (arXiv:2208.13621v1 [cs.LG])
    We consider communication in a fully cooperative multi-agent system, where the agents have partial observation of the environment and must act jointly to maximize the overall reward. We study a discrete-time queueing network where agents route packets to queues based only on partial information about the current queue lengths. The queues have limited buffer capacity, so packets are dropped when they are sent to a full queue. In this work, we implement a communication channel for the agents to share their information in order to reduce the packet drop rate. For efficient information sharing we use an attention-based communication model, called ATVC, to select informative messages from other agents. The agents then infer the state of the queues using a combination of a variational auto-encoder (VAE) and a product-of-experts (PoE) model. Ultimately, the agents learn what they need to communicate and with whom, instead of communicating all the time with everyone. We also show empirically that ATVC is able to infer the true state of the queues and leads to a policy that outperforms existing baselines.
    Federated Sparse Training: Lottery Aware Model Compression for Resource Constrained Edge. (arXiv:2208.13092v1 [cs.LG])
    Limited computation and communication capabilities of clients pose significant challenges in federated learning (FL) over resource-limited edge nodes. A potential solution to this problem is to deploy off-the-shelf sparse learning algorithms that train a binary sparse mask on each client with the expectation of training a consistent sparse server mask. However, as we investigate in this paper, such naive deployments result in a significant accuracy drop compared to FL with dense models, especially under low client resource budgets. In particular, our investigations reveal a serious lack of consensus among the trained masks on clients, which prevents convergence on the server mask and potentially leads to a substantial drop in model performance. Based on these key observations, we propose federated lottery aware sparsity hunting (FLASH), a unified sparse learning framework to make the server win a lottery in terms of a sparse sub-model, which can greatly improve performance under highly resource-limited client settings. Moreover, to address the issue of device heterogeneity, we leverage our findings to propose hetero-FLASH, where clients can have different target sparsity budgets based on their device resource limits. Extensive experimental evaluations with multiple models on various datasets (both IID and non-IID) show the superiority of our models in yielding up to $\mathord{\sim}10.1\%$ improved accuracy with $\mathord{\sim}10.26\times$ fewer communication costs, compared to existing alternatives, at similar hyperparameter settings.
    Leachable Component Clustering. (arXiv:2208.13217v1 [cs.LG])
    Clustering attempts to partition data instances into several distinctive groups, while the similarities among data belonging to a common partition are largely preserved. Furthermore, incomplete data frequently occurs in many real-world applications and adversely affects pattern analysis. As a consequence, specific solutions for data imputation and handling have been developed to deal with missing values, with an independent stage of knowledge exploitation used for information understanding. In this work, a novel approach to clustering of incomplete data, termed leachable component clustering, is proposed. Unlike existing methods, the proposed method handles data imputation with Bayes alignment and recovers the lost patterns in theory. Owing to its simple numerical computations, the proposed method can learn optimized partitions while maintaining calculation efficiency. Experiments on several artificial incomplete data sets demonstrate that the proposed method is able to present superior performance compared with other state-of-the-art algorithms.
    Learning Clinical Concepts for Predicting Risk of Progression to Severe COVID-19. (arXiv:2208.13126v1 [cs.LG])
    With COVID-19 now pervasive, identification of high-risk individuals is crucial. Using data from a major healthcare provider in Southwestern Pennsylvania, we develop survival models predicting severe COVID-19 progression. In this endeavor, we face a tradeoff between more accurate models relying on many features and less accurate models relying on a few features aligned with clinician intuition. Complicating matters, many EHR features tend to be under-coded, degrading the accuracy of smaller models. In this study, we develop two sets of high-performance risk scores: (i) an unconstrained model built from all available features; and (ii) a pipeline that learns a small set of clinical concepts before training a risk predictor. Learned concepts boost performance over the corresponding features (C-index 0.858 vs. 0.844) and demonstrate improvements over (i) when evaluated out-of-sample (subsequent time periods). Our models outperform previous works (C-index 0.844-0.872 vs. 0.598-0.810).
    FFCNN: Fast FPGA based Acceleration for Convolution neural network inference. (arXiv:2208.13250v1 [cs.LG])
    We present a new efficient OpenCL-based accelerator for large-scale Convolutional Neural Networks, called Fast Inference on FPGAs for Convolution Neural Network (FFCNN). FFCNN is based on a deeply pipelined OpenCL kernel architecture. As pointed out before, high-level synthesis tools such as the OpenCL framework can easily port codes originally designed for CPUs and GPUs to FPGAs, but it is still difficult to make OpenCL codes run efficiently on FPGAs. This work aims to propose an efficient FPGA implementation of OpenCL high-performance computing applications. To do so, data reuse and task mapping techniques are also presented to improve design efficiency. In addition, the following motivations were taken into account when developing FFCNN: 1) FFCNN has been designed to be easily implemented on the Intel OpenCL SDK-based FPGA design flow. 2) In FFCNN, different techniques have been integrated to improve memory bandwidth and throughput. A performance analysis is conducted on two deep CNNs for large-scale image classification. The obtained results, and the comparison with other works designed to accelerate the same types of architectures, show the efficiency and the competitiveness of the proposed accelerator design, with significantly improved performance and resource utilization.
    Towards Reliable Simulation-Based Inference with Balanced Neural Ratio Estimation. (arXiv:2208.13624v1 [stat.ML])
    Modern approaches for simulation-based inference rely upon deep learning surrogates to enable approximate inference with computer simulators. In practice, the estimated posteriors' computational faithfulness is, however, rarely guaranteed. For example, Hermans et al. (2021) show that current simulation-based inference algorithms can produce posteriors that are overconfident, hence risking false inferences. In this work, we introduce Balanced Neural Ratio Estimation (BNRE), a variation of the NRE algorithm designed to produce posterior approximations that tend to be more conservative, hence improving their reliability, while sharing the same Bayes optimal solution. We achieve this by enforcing a balancing condition that increases the quantified uncertainty in small simulation budget regimes while still converging to the exact posterior as the budget increases. We provide theoretical arguments showing that BNRE tends to produce posterior surrogates that are more conservative than NRE's. We evaluate BNRE on a wide variety of tasks and show that it produces conservative posterior surrogates on all tested benchmarks and simulation budgets. Finally, we emphasize that BNRE is straightforward to implement over NRE and does not introduce any computational overhead.
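    If d(theta, x) denotes the classifier output of the ratio estimator, the balancing condition amounts to an added regularizer on top of the usual NRE cross-entropy. Below is a sketch based on our reading of the paper; the regularization weight `lam` is illustrative:

    ```python
    import torch
    import torch.nn.functional as F

    # d_joint: classifier probabilities on (theta, x) pairs from the joint;
    # d_marginal: probabilities on shuffled (marginal) pairs. The balancing
    # term pushes E[d_joint] + E[d_marginal] towards 1.
    def bnre_loss(d_joint, d_marginal, lam=100.0):
        bce = F.binary_cross_entropy(d_joint, torch.ones_like(d_joint)) + \
              F.binary_cross_entropy(d_marginal, torch.zeros_like(d_marginal))
        balance = (d_joint.mean() + d_marginal.mean() - 1.0) ** 2
        return bce + lam * balance
    ```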
    A Path Towards Clinical Adaptation of Accelerated MRI. (arXiv:2208.12835v1 [eess.IV])
    Accelerated MRI reconstructs images of clinical anatomies from sparsely sampled signal data to reduce patient scan times. While recent works have leveraged deep learning to accomplish this task, such approaches have often only been explored in simulated environments where there is no signal corruption or resource limitations. In this work, we explore augmentations to neural network MRI image reconstructors to enhance their clinical relevancy. Namely, we propose a ConvNet model for detecting sources of image artifacts that achieves a classifier $F_2$ score of $79.1\%$. We also demonstrate that training reconstructors on MR signal data with variable acceleration factors can improve their average performance during a clinical patient scan by up to $2\%$. We offer a loss function to overcome catastrophic forgetting when models learn to reconstruct MR images of multiple anatomies and orientations. Finally, we propose a method for using simulated phantom data to pre-train reconstructors in situations with limited clinically acquired datasets and compute capabilities. Our results provide a potential path forward for clinical adaptation of accelerated MRI.
    OpenFWI: Large-Scale Multi-Structural Benchmark Datasets for Seismic Full Waveform Inversion. (arXiv:2111.02926v4 [cs.LG] UPDATED)
    Full waveform inversion (FWI) is widely used in geophysics to reconstruct high-resolution velocity maps from seismic data. The recent success of data-driven FWI methods results in a rapidly increasing demand for open datasets to serve the geophysics community. We present OpenFWI, a collection of large-scale multi-structural benchmark datasets, to facilitate diversified, rigorous, and reproducible research on FWI. In particular, OpenFWI consists of 12 datasets (2.1TB in total) synthesized from multiple sources. It encompasses diverse domains in geophysics (interface, fault, CO2 reservoir, etc.), covers different geological subsurface structures (flat, curve, etc.), and contains various amounts of data samples (2K - 67K). It also includes a dataset for 3D FWI. Moreover, we use OpenFWI to perform benchmarking over four deep learning methods, covering both supervised and unsupervised learning regimes. Along with the benchmarks, we implement additional experiments, including physics-driven methods, complexity analysis, generalization study, uncertainty quantification, and so on, to sharpen our understanding of datasets and methods. The studies either provide valuable insights into the datasets and the performance, or uncover their current limitations. We hope OpenFWI supports prospective research on FWI and inspires future open-source efforts on AI for science. All datasets and related information can be accessed through our website at https://openfwi-lanl.github.io/
    Continual Semi-Supervised Learning through Contrastive Interpolation Consistency. (arXiv:2108.06552v3 [stat.ML] UPDATED)
    Continual Learning (CL) investigates how to train Deep Networks on a stream of tasks without incurring forgetting. CL settings proposed in literature assume that every incoming example is paired with ground-truth annotations. However, this clashes with many real-world applications: gathering labeled data, which is in itself tedious and expensive, becomes infeasible when data flow as a stream. This work explores Continual Semi-Supervised Learning (CSSL): here, only a small fraction of labeled input examples are shown to the learner. We assess how current CL methods (e.g.: EWC, LwF, iCaRL, ER, GDumb, DER) perform in this novel and challenging scenario, where overfitting entangles forgetting. Subsequently, we design a novel CSSL method that exploits metric learning and consistency regularization to leverage unlabeled examples while learning. We show that our proposal exhibits higher resilience to diminishing supervision and, even more surprisingly, relying only on 25% supervision suffices to outperform SOTA methods trained under full supervision.
    Self-Supervised Adversarial Example Detection by Disentangled Representation. (arXiv:2105.03689v4 [cs.CV] UPDATED)
    Deep learning models are known to be vulnerable to adversarial examples that are elaborately designed for malicious purposes and are imperceptible to the human perceptual system. The autoencoder, when trained solely on benign examples, has been widely used for (self-supervised) adversarial detection based on the assumption that adversarial examples yield larger reconstruction errors. However, because adversarial examples are absent from its training and the autoencoder generalizes too strongly, this assumption does not always hold true in practice. To alleviate this problem, we explore how to detect adversarial examples with disentangled label/semantic features under the autoencoder structure. Specifically, we propose Disentangled Representation-based Reconstruction (DRR). In DRR, we train an autoencoder over both correctly paired and incorrectly paired label/semantic features to reconstruct benign examples and counterexamples. This mimics the behavior of adversarial examples and can reduce the unnecessary generalization ability of the autoencoder. We compare our method with state-of-the-art self-supervised detection methods under different adversarial attacks and different victim models, and it exhibits better performance in various metrics (area under the ROC curve, true positive rate, and true negative rate) for most attack settings. Though DRR is initially designed for visual tasks only, we demonstrate that it can be easily extended to natural language tasks as well. Notably, different from other autoencoder-based detectors, our method can provide resistance to the adaptive adversary.
    A scalable pipeline for COVID-19: the case study of Germany, Czechia and Poland. (arXiv:2208.12928v1 [cs.DB])
    Throughout the coronavirus disease 2019 (COVID-19) pandemic, decision makers have relied on forecasting models to determine and implement non-pharmaceutical interventions (NPI). In building the forecasting models, continuously updated datasets from various stakeholders including developers, analysts, and testers are required to provide precise predictions. Here we report the design of a scalable pipeline, named where2test, which serves as a data synchronization layer to support inter-country top-down spatiotemporal observations and forecasting models of COVID-19 for Germany, Czechia and Poland. We have built an operational data store (ODS) using PostgreSQL to continuously consolidate datasets from multiple data sources, perform collaborative work, facilitate high performance data analysis, and trace changes. The ODS has been built to store the COVID-19 data not only from Germany, Czechia, and Poland but also from other areas. Employing the dimensional fact model, a schema of metadata is capable of synchronizing the various structures of data from those regions, and is scalable to the entire world. Next, the ODS is populated using batch Extract, Transfer, and Load (ETL) jobs. SQL queries are subsequently created to reduce the need for pre-processing data for users. The data can then support not only forecasting using a version-controlled Arima-Holt model and other analyses to support decision making, but also a risk calculator and optimisation apps. The data synchronization runs at a daily interval, and the results are displayed at https://www.where2test.de.
    fMBN-E: Efficient Unsupervised Network Structure Ensemble and Selection for Clustering. (arXiv:2107.02071v5 [cs.LG] UPDATED)
    It is known that unsupervised nonlinear dimensionality reduction and clustering are sensitive to the selection of hyperparameters, particularly for deep learning based methods, which hinders their practical use. Selecting a proper network structure, which may differ dramatically across applications, is a hard problem for deep models given little prior knowledge of the data. In this paper, we aim to automatically determine the optimal network structure of a deep model, named multilayer bootstrap networks (MBN), via simple ensemble learning and selection techniques. Specifically, we first propose an MBN ensemble (MBN-E) algorithm which concatenates the sparse outputs of a set of MBN base models with different network structures into a new representation. Then, we take the new representation produced by MBN-E as a reference for selecting the optimal MBN base models. Moreover, we propose a fast version of MBN-E (fMBN-E), which is not only theoretically faster than a single standard MBN but also does not increase the estimation error of MBN-E. Importantly, MBN-E and its ensemble selection techniques maintain the simple formulation of MBN, which is based on one-nearest-neighbor learning. Empirically, compared with a number of advanced deep clustering methods and as many as 20 representative unsupervised ensemble learning and selection methods, the proposed methods reach state-of-the-art performance without manual hyperparameter tuning. fMBN-E is empirically even hundreds of times faster than MBN-E without suffering performance degradation. The applications to image segmentation and graph data mining further demonstrate the advantages of the proposed methods.
    CH-MARL: A Multimodal Benchmark for Cooperative, Heterogeneous Multi-Agent Reinforcement Learning. (arXiv:2208.13626v1 [cs.AI])
    We propose a multimodal (vision-and-language) benchmark for cooperative and heterogeneous multi-agent learning. We introduce a benchmark multimodal dataset with tasks involving collaboration between multiple simulated heterogeneous robots in a rich multi-room home environment. We provide an integrated learning framework, multimodal implementations of state-of-the-art multi-agent reinforcement learning techniques, and a consistent evaluation protocol. Our experiments investigate the impact of different modalities on multi-agent learning performance. We also introduce a simple message passing method between agents. The results suggest that multimodality introduces unique challenges for cooperative multi-agent learning and there is significant room for advancing multi-agent reinforcement learning methods in such settings.
    Minute ventilation measurement using Plethysmographic Imaging and lighting parameters. (arXiv:2208.13319v1 [cs.LG])
    Breathing disorders such as sleep apnea are critical conditions that affect a large number of individuals, arising from the insufficient capacity of the lungs to contain and exchange oxygen and carbon dioxide and thereby keep the body in the stable state of homeostasis. Respiratory measurements such as minute ventilation can be used, in correlation with other physiological measurements such as heart rate and heart rate variability, for remote monitoring of health and detecting symptoms of such breathing-related disorders. In this work, we formulate a deep learning based approach to remotely measure minute ventilation on a private dataset. The dataset will be made public upon acceptance of this work. We use two versions of a deep neural network to estimate the minute ventilation from data streams obtained through wearable heart rate and respiratory devices. We demonstrate that the simple design of our pipeline - which includes lightweight deep neural networks - can be easily incorporated into real-time health monitoring systems.
    Plant Species Recognition with Optimized 3D Polynomial Neural Networks and Variably Overlapping Time-Coherent Sliding Window. (arXiv:2203.02611v2 [cs.CV] UPDATED)
    Recently, the EAGL-I system was developed to rapidly create massive labeled datasets of plants, intended to be commonly used by farmers and researchers to create AI-driven solutions in agriculture. As a result, a publicly available plant species recognition dataset composed of 40,000 images of different sizes covering 8 plant species was created with the system in order to demonstrate its capabilities. This paper proposes a novel method, called Variably Overlapping Time-Coherent Sliding Window (VOTCSW), that transforms a dataset composed of images with variable size to a 3D representation with fixed size that is suitable for convolutional neural networks, and demonstrates that this representation is more informative than resizing the images of the dataset to a given size. We theoretically formalize the use cases of the method as well as its inherent properties, and we prove that it has an oversampling and a regularization effect on the data. By combining the VOTCSW method with the 3D extension of a recently proposed machine learning model called 1-Dimensional Polynomial Neural Networks, we were able to create a model that achieved a state-of-the-art accuracy of 99.9% on the dataset created by the EAGL-I system, surpassing well-known architectures such as ResNet and Inception. In addition, we created a heuristic algorithm that enables the degree reduction of any pre-trained N-Dimensional Polynomial Neural Network and which compresses it without altering its performance, thus making the model faster and lighter. Furthermore, we established that the currently available dataset could not be used for machine learning in its present form, due to a substantial class imbalance between the training set and the test set. Hence, we created a specific preprocessing and model development framework that enabled us to improve the accuracy from 49.23% to 99.9%.
    A Hybrid Iterative Numerical Transferable Solver (HINTS) for PDEs Based on Deep Operator Network and Relaxation Methods. (arXiv:2208.13273v1 [math.NA])
    Iterative solvers of linear systems are a key component for the numerical solutions of partial differential equations (PDEs). While there have been intensive studies through past decades on classical methods such as Jacobi, Gauss-Seidel, conjugate gradient, multigrid methods and their more advanced variants, there is still a pressing need to develop faster, more robust and reliable solvers. Based on recent advances in scientific deep learning for operator regression, we propose HINTS, a hybrid, iterative, numerical, and transferable solver for differential equations. HINTS combines standard relaxation methods and the Deep Operator Network (DeepONet). Compared to standard numerical solvers, HINTS is capable of providing faster solutions for a wide class of differential equations, while preserving the accuracy close to machine zero. Through an eigenmode analysis, we find that the individual solvers in HINTS target distinct regions in the spectrum of eigenmodes, resulting in a uniform convergence rate and hence exceptional performance of the hybrid solver overall. Moreover, HINTS applies to equations in multidimensions, and is flexible with regards to computational domain and transferable to different discretizations.  ( 3 min )
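    The hybrid iteration can be pictured with a minimal sketch: damped-Jacobi relaxation interleaved with a learned corrector standing in for the DeepONet. The corrector interface, damping factor, and alternation schedule below are assumptions for illustration only.
        import numpy as np

        def hints_solve(A, b, corrector, n_iters=1000, omega=2.0 / 3.0, every=10):
            # Damped Jacobi reduces high-frequency error; every `every` steps a
            # learned operator maps the residual to an error estimate, targeting
            # the low-frequency modes that relaxation handles slowly.
            x = np.zeros_like(b)
            D_inv = 1.0 / np.diag(A)
            for k in range(n_iters):
                r = b - A @ x
                if k % every == 0:
                    x = x + corrector(r)
                else:
                    x = x + omega * D_inv * r
            return x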
    Tensor Decomposition based Personalized Federated Learning. (arXiv:2208.12959v1 [cs.LG])
    Federated learning (FL) is a new distributed machine learning framework that can achieve reliable collaborative training without collecting users' private data. However, due to its frequent communication and averaging-based aggregation strategy, FL experiences challenges in scaling to statistically diverse data and large-scale models. In this paper, we propose a personalized FL framework, named Tensor Decomposition based Personalized Federated learning (TDPFed), in which we design a novel tensorized local model with tensorized linear layers and convolutional layers to reduce the communication cost. TDPFed uses a bi-level loss function to decouple personalized model optimization from the global model learning by controlling the gap between the personalized model and the tensorized local model. Moreover, an effective distributed learning strategy and two different model aggregation strategies are well designed for the proposed TDPFed framework. Theoretical convergence analysis and thorough experiments demonstrate that our proposed TDPFed framework achieves state-of-the-art performance while reducing the communication cost.  ( 2 min )
    Sub-mW Neuromorphic SNN audio processing applications with Rockpool and Xylo. (arXiv:2208.12991v1 [cs.NE])
    Spiking Neural Networks (SNNs) provide an efficient computational mechanism for temporal signal processing, especially when coupled with low-power SNN inference ASICs. SNNs have been historically difficult to configure, lacking a general method for finding solutions for arbitrary tasks. In recent years, gradient-descent optimization methods have been applied to SNNs with increasing ease. SNNs and SNN inference processors therefore offer a good platform for commercial low-power signal processing in energy-constrained environments without cloud dependencies. However, to date these methods have not been accessible to ML engineers in industry, requiring graduate-level training to successfully configure a single SNN application. Here we demonstrate a convenient high-level pipeline to design, train and deploy arbitrary temporal signal processing applications to sub-mW SNN inference hardware. We apply a new straightforward SNN architecture designed for temporal signal processing, using a pyramid of synaptic time constants to extract signal features at a range of temporal scales. We demonstrate this architecture on an ambient audio classification task, deployed to the Xylo SNN inference processor in streaming mode. Our application achieves high accuracy (98%) and low latency (100 ms) at low power (<4 µW inference power). Our approach makes training and deploying SNN applications available to ML engineers with general NN backgrounds, without requiring specific prior experience with spiking NNs. We intend for our approach to make neuromorphic hardware and SNNs an attractive choice for commercial low-power and edge signal processing applications.  ( 3 min )
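    The time-constant pyramid can be sketched as a bank of exponential synapses, one per temporal scale; the doubling schedule of time constants below is an assumption, not the deployed configuration.
        import numpy as np

        def synaptic_filterbank(x, dt=1e-3, taus=(2e-3, 4e-3, 8e-3, 16e-3)):
            # Each row low-pass filters the input with one synaptic time
            # constant, extracting features at a different temporal scale.
            out = np.zeros((len(taus), len(x)))
            for i, tau in enumerate(taus):
                alpha = np.exp(-dt / tau)  # per-step decay of the synaptic state
                state = 0.0
                for t, xt in enumerate(x):
                    state = alpha * state + (1.0 - alpha) * xt
                    out[i, t] = state
            return out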
    IDP-PGFE: An Interpretable Disruption Predictor based on Physics-Guided Feature Extraction. (arXiv:2208.13197v1 [physics.plasm-ph])
    Disruption prediction has made rapid progress in recent years, especially in machine learning (ML)-based methods. Understanding why a predictor makes a certain prediction can be as crucial as the prediction's accuracy for future tokamak disruption predictors. The purpose of most disruption predictors is accuracy or cross-machine capability. However, if a disruption prediction model can be interpreted, it can tell why certain samples are classified as disruption precursors. This allows us to identify the type of incoming disruption and gives us insight into the mechanism of disruption. This paper designs a disruption predictor called Interpretable Disruption Predictor based On Physics-guided feature extraction (IDP-PGFE) on J-TEXT. The prediction performance of the model is effectively improved by extracting physics-guided features. A high-performance model is required to ensure the validity of the interpretation results. The interpretability study of IDP-PGFE provides an understanding of J-TEXT disruption and is generally consistent with existing comprehension of disruption. IDP-PGFE has been applied to disruptions caused by continuously increasing density towards the density limit in experiments on J-TEXT. The time evolution of the PGFE feature contributions demonstrates that the application of ECRH triggers radiation-caused disruption, which lowers the density at disruption, while the application of RMP indeed raises the density limit in J-TEXT. The interpretability study guides intuition on the physical mechanisms of density-limit disruption: RMPs affect not only the MHD instabilities but also the radiation profile, which delays density-limit disruption.  ( 3 min )
    FedEgo: Privacy-preserving Personalized Federated Graph Learning with Ego-graphs. (arXiv:2208.13685v1 [cs.LG])
    As special information carriers containing both structure and feature information, graphs are widely used in graph mining, e.g., Graph Neural Networks (GNNs). However, in some practical scenarios, graph data are stored separately in multiple distributed parties, which may not be directly shared due to conflicts of interest. Hence, federated graph neural networks are proposed to address such data silo problems while preserving the privacy of each party (or client). Nevertheless, different graph data distributions among various parties, which is known as the statistical heterogeneity, may degrade the performance of naive federated learning algorithms like FedAvg. In this paper, we propose FedEgo, a federated graph learning framework based on ego-graphs to tackle the challenges above, where each client will train their local models while also contributing to the training of a global model. FedEgo applies GraphSAGE over ego-graphs to make full use of the structure information and utilizes Mixup for privacy concerns. To deal with the statistical heterogeneity, we integrate personalization into learning and propose an adaptive mixing coefficient strategy that enables clients to achieve their optimal personalization. Extensive experimental results and in-depth analysis demonstrate the effectiveness of FedEgo.  ( 2 min )
    Spatio-Temporal Wind Speed Forecasting using Graph Networks and Novel Transformer Architectures. (arXiv:2208.13585v1 [cs.LG])
    To improve the security and reliability of wind energy production, short-term forecasting has become of utmost importance. This study focuses on multi-step spatio-temporal wind speed forecasting for the Norwegian continental shelf. A graph neural network (GNN) architecture was used to extract spatial dependencies, with different update functions to learn temporal correlations. These update functions were implemented using different neural network architectures. One such architecture, the Transformer, has become increasingly popular for sequence modelling in recent years. Various alterations of the original architecture have been proposed to better facilitate time-series forecasting, of which this study focused on the Informer, LogSparse Transformer and Autoformer. This is the first time the LogSparse Transformer and Autoformer have been applied to wind forecasting and the first time any of these or the Informer have been formulated in a spatio-temporal setting for wind forecasting. By comparing against spatio-temporal Long Short-Term Memory (LSTM) and Multi-Layer Perceptron (MLP) models, the study showed that the models using the altered Transformer architectures as update functions in GNNs were able to outperform these. Furthermore, we propose the Fast Fourier Transformer (FFTransformer), which is a novel Transformer architecture based on signal decomposition and consists of two separate streams that analyse trend and periodic components separately. The FFTransformer and Autoformer were found to achieve superior results for the 10-minute and 1-hour ahead forecasts, with the FFTransformer significantly outperforming all other models for the 4-hour ahead forecasts. Finally, by varying the degree of connectivity for the graph representations, the study explicitly demonstrates how all models were able to leverage spatial dependencies to improve local short-term wind speed forecasting.  ( 3 min )
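    The trend/periodic split underlying the FFTransformer's two streams can be illustrated with a simple FFT-based decomposition; the paper's actual decomposition and stream architectures are not specified in the abstract, so the cutoff below is a placeholder assumption.
        import numpy as np

        def fft_decompose(x, n_trend=4):
            # Keep the n_trend lowest-frequency components as the trend stream;
            # the remainder forms the periodic stream analysed separately.
            spec = np.fft.rfft(x)
            trend_spec = np.zeros_like(spec)
            trend_spec[:n_trend] = spec[:n_trend]
            trend = np.fft.irfft(trend_spec, n=len(x))
            return trend, x - trend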
    Combating high variance in Data-Scarce Implicit Hate Speech Classification. (arXiv:2208.13595v1 [cs.CL])
    Hate speech classification has been a long-standing problem in natural language processing. However, even though there are numerous hate speech detection methods, they usually overlook a lot of hateful statements due to their implicit nature. Developing datasets to aid in the task of implicit hate speech classification comes with its own challenges; difficulties include nuances in language, varying definitions of what constitutes hate speech, and the labor-intensive process of annotating such data. This has led to a scarcity of data available to train and test such systems, which gives rise to high-variance problems when parameter-heavy transformer-based models are used to address the problem. In this paper, we explore various optimization and regularization techniques and develop a novel RoBERTa-based model that achieves state-of-the-art performance.  ( 2 min )
    Stock Market Prediction using Natural Language Processing -- A Survey. (arXiv:2208.13564v1 [q-fin.ST])
    The stock market is a network which provides a platform for almost all major economic transactions. While investing in the stock market is a good idea, investing in individual stocks may not be, especially for the casual investor. Smart stock-picking requires in-depth research and plenty of dedication. Predicting stock values offers enormous arbitrage profit opportunities. This attractiveness of finding a solution has prompted researchers to find a way past problems like volatility, seasonality, and dependence on time. This paper surveys recent literature in the domain of natural language processing and machine learning techniques used to predict stock market movements. The main contributions of this paper include the sophisticated categorization of many recent articles and the illustration of the recent trends of research in stock market prediction and its related areas.  ( 2 min )
    Towards Federated Learning against Noisy Labels via Local Self-Regularization. (arXiv:2208.12807v1 [cs.LG])
    Federated learning (FL) aims to learn joint knowledge from a large number of decentralized devices with labeled data in a privacy-preserving manner. However, since high-quality labeled data require expensive human intelligence and effort, data with incorrect labels (called noisy labels) are ubiquitous in reality, which inevitably causes performance degradation. Although many methods have been proposed to deal with noisy labels directly, they either require excessive computation overhead or violate the privacy protection principle of FL. To this end, we focus on this issue in FL with the purpose of alleviating the performance degradation yielded by noisy labels while guaranteeing data privacy. Specifically, we propose a Local Self-Regularization method, which effectively regularizes the local training process by implicitly hindering the model from memorizing noisy labels and explicitly narrowing the model output discrepancy between original and augmented instances using self distillation. Experimental results demonstrate that our proposed method achieves notable resistance against noisy labels at various noise levels on three benchmark datasets. In addition, we integrate our method with existing state-of-the-art methods and achieve superior performance on the real-world dataset Clothing1M. The code is available at https://github.com/Sprinter1999/FedLSR.  ( 3 min )
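    A minimal sketch of the self-distillation component, assuming a temperature-scaled KL term between predictions on original and augmented views; the temperature, weighting and teacher/student roles are assumptions, not the repository's exact code.
        import torch
        import torch.nn.functional as F

        def lsr_loss(model, x, x_aug, y, temp=2.0, beta=0.5):
            logits = model(x)
            logits_aug = model(x_aug)
            ce = F.cross_entropy(logits, y)
            # Narrow the output discrepancy between original and augmented
            # instances via self-distillation (original view as teacher).
            log_p_aug = F.log_softmax(logits_aug / temp, dim=1)
            p_orig = F.softmax(logits.detach() / temp, dim=1)
            distill = F.kl_div(log_p_aug, p_orig, reduction="batchmean") * temp ** 2
            return ce + beta * distill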
    Incrementality Bidding and Attribution. (arXiv:2208.12809v1 [cs.LG])
    The causal effect of showing an ad to a potential customer versus not, commonly referred to as "incrementality", is the fundamental question of advertising effectiveness. In digital advertising three major puzzle pieces are central to rigorously quantifying advertising incrementality: ad buying/bidding/pricing, attribution, and experimentation. Building on the foundations of machine learning and causal econometrics, we propose a methodology that unifies these three concepts into a computationally viable model of both bidding and attribution which spans the randomization, training, cross validation, scoring, and conversion attribution of advertising's causal effects. Implementation of this approach is likely to secure a significant improvement in the return on investment of advertising.  ( 2 min )
    Federated Learning of Large Models at the Edge via Principal Sub-Model Training. (arXiv:2208.13141v1 [cs.LG])
    Limited compute and communication capabilities of edge users create a significant bottleneck for federated learning (FL) of large models. We consider a realistic, but much less explored, cross-device FL setting in which no client has the capacity to train a full large model nor is willing to share any intermediate activations with the server. To this end, we present the Principal Sub-Model (PriSM) training methodology, which leverages the model's low-rank structure and kernel orthogonality to train sub-models in the orthogonal kernel space. More specifically, by applying singular value decomposition (SVD) to the original kernels in the server model, PriSM first obtains a set of principal orthogonal kernels, each weighted by its singular value. Thereafter, PriSM utilizes our novel sampling strategy that selects different subsets of the principal kernels independently to create sub-models for clients. Importantly, a kernel with a large singular value is assigned a high sampling probability. Thus, each sub-model is a low-rank approximation of the full large model, and all clients together achieve near full-model training. Our extensive evaluations on multiple datasets in various resource-constrained settings show that PriSM can yield an improved performance of up to 10% compared to existing alternatives, with only around 20% sub-model training.  ( 2 min )
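    The sampling step can be sketched as follows for a single weight matrix; normalizing singular values into sampling probabilities is our assumption of one plausible weighting, not necessarily the paper's scheme.
        import numpy as np

        def sample_submodel(W, keep_frac=0.2, rng=np.random.default_rng(0)):
            # Decompose into principal orthogonal kernels; kernels with larger
            # singular values are sampled with higher probability.
            U, s, Vt = np.linalg.svd(W, full_matrices=False)
            k = max(1, int(keep_frac * len(s)))
            idx = rng.choice(len(s), size=k, replace=False, p=s / s.sum())
            # One client's sub-model: a low-rank approximation reconstructed
            # from the sampled principal kernels.
            W_sub = (U[:, idx] * s[idx]) @ Vt[idx, :]
            return W_sub, idx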
    Local Context-Aware Active Domain Adaptation. (arXiv:2208.12856v1 [cs.LG])
    Active Domain Adaptation (ADA) queries the label of selected target samples to help adapting a model from a related source domain to a target domain. It has attracted increasing attention recently due to its promising performance with minimal labeling cost. Nevertheless, existing ADA methods have not fully exploited the local context of queried data, which is important to ADA, especially when the domain gap is large. In this paper, we propose a novel framework of Local context-aware Active Domain Adaptation (LADA), which is composed of two key modules. The Local context-aware Active Selection (LAS) module selects target samples whose class probability predictions are inconsistent with their neighbors. The Local context-aware Model Adaptation (LMA) module refines a model with both queried samples and their expanded neighbors, regularized by a context-preserving loss. Extensive experiments show that LAS selects more informative samples than existing active selection strategies. Furthermore, equipped with LMA, the full LADA method outperforms state-of-the-art ADA solutions on various benchmarks. Code is available at https://github.com/tsun/LADA.  ( 2 min )
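    The LAS criterion, as we read it from the abstract, scores target samples by prediction inconsistency with their neighbours; a hedged NumPy sketch follows, where the similarity metric and k are assumptions.
        import numpy as np

        def las_scores(features, probs, k=10):
            # Cosine similarity between L2-normalized target features.
            f = features / np.linalg.norm(features, axis=1, keepdims=True)
            sim = f @ f.T
            np.fill_diagonal(sim, -np.inf)
            nn = np.argsort(-sim, axis=1)[:, :k]
            preds = probs.argmax(axis=1)
            # Fraction of neighbours whose predicted class disagrees with the
            # sample's own; labels are queried for the highest-scoring samples.
            return (preds[nn] != preds[:, None]).mean(axis=1)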
    Constraining Pseudo-label in Self-training Unsupervised Domain Adaptation with Energy-based Model. (arXiv:2208.12885v1 [cs.CV])
    Deep learning is usually data starved, and the unsupervised domain adaptation (UDA) is developed to introduce the knowledge in the labeled source domain to the unlabeled target domain. Recently, deep self-training presents a powerful means for UDA, involving an iterative process of predicting the target domain and then taking the confident predictions as hard pseudo-labels for retraining. However, the pseudo-labels are usually unreliable, thus easily leading to deviated solutions with propagated errors. In this paper, we resort to the energy-based model and constrain the training of the unlabeled target sample with an energy function minimization objective. It can be achieved via a simple additional regularization or an energy-based loss. This framework allows us to gain the benefits of the energy-based model, while retaining strong discriminative performance following a plug-and-play fashion. The convergence property and its connection with classification expectation minimization are investigated. We deliver extensive experiments on the most popular and large-scale UDA benchmarks of image classification as well as semantic segmentation to demonstrate its generality and effectiveness.  ( 2 min )
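    Under the common classifier-as-EBM view, the energy regularization can be sketched as pushing down the free energy of unlabeled target samples; the temperature and weight below are assumptions, and the paper's exact objective may differ.
        import torch

        def free_energy(logits, temp=1.0):
            # E(x) = -T * logsumexp(f(x) / T) for a classifier f.
            return -temp * torch.logsumexp(logits / temp, dim=1)

        def energy_regularizer(logits_target, lam=0.1):
            # Added on top of the usual pseudo-label cross-entropy (not shown),
            # discouraging deviated solutions from unreliable pseudo-labels.
            return lam * free_energy(logits_target).mean()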
    Fine-tuning Multi-hop Question Answering with Hierarchical Graph Network. (arXiv:2004.13821v3 [cs.CL] UPDATED)
    In this paper, we present a two-stage model for multi-hop question answering. The first stage is a hierarchical graph network, which is used to reason over multi-hop questions and is capable of capturing different levels of granularity using the natural structure of documents (i.e., paragraphs, questions, sentences and entities). The reasoning process is cast as a node classification task (i.e., over paragraph nodes and sentence nodes). The second stage is a language model fine-tuning task. In short, stage one uses a graph neural network to select and concatenate supporting sentences into one paragraph, and stage two finds the answer span in the language-model fine-tuning paradigm.  ( 2 min )
    Statistical Inverse Problems in Hilbert Scales. (arXiv:2208.13289v1 [math.ST])
    In this paper, we study the Tikhonov regularization scheme in Hilbert scales for a nonlinear statistical inverse problem with general noise. The regularizing norm in this scheme is stronger than the norm in the Hilbert space. We focus on developing a theoretical analysis for this scheme based on conditional stability estimates. We utilize the concept of the distance function to establish high-probability estimates of the direct and reconstruction error in the reproducing kernel Hilbert space setting. Further, explicit rates of convergence in terms of sample size are established for the oversmoothing case and the regular case over the regularity class defined through an appropriate source condition. Our results improve and generalize previous results obtained in related settings.  ( 2 min )
    Domain Adaptation with Adversarial Training on Penultimate Activations. (arXiv:2208.12853v1 [cs.LG])
    Enhancing model prediction confidence on unlabeled target data is an important objective in Unsupervised Domain Adaptation (UDA). In this paper, we explore adversarial training on penultimate activations, i.e., the input features of the final linear classification layer. We show that this strategy is more efficient and better correlated with the objective of boosting prediction confidence than adversarial training on input images or intermediate features, as used in previous works. Furthermore, with the activation normalization commonly used in domain adaptation to reduce domain gap, we derive two variants and systematically analyze the effects of normalization on our adversarial training. This is illustrated both in theory and through empirical analysis on real adaptation tasks. Extensive experiments are conducted on popular UDA benchmarks under both the standard setting and the source-data-free setting. The results validate that our method achieves the best scores against previous methods.  ( 2 min )
    Uncovering dark matter density profiles in dwarf galaxies with graph neural networks. (arXiv:2208.12825v1 [astro-ph.CO])
    Dwarf galaxies are small, dark matter-dominated galaxies, some of which are embedded within the Milky Way. Their lack of baryonic matter (e.g., stars and gas) makes them perfect test beds for probing the properties of dark matter -- understanding the spatial dark matter distribution in these systems can be used to constrain microphysical dark matter interactions that influence the formation and evolution of structures in our Universe. We introduce a new method that leverages simulation-based inference and graph-based machine learning in order to infer the dark matter density profiles of dwarf galaxies from observable kinematics of stars gravitationally bound to these systems. Our approach aims to address some of the limitations of established methods based on dynamical Jeans modeling. We show that this novel method can place stronger constraints on dark matter profiles and, consequently, has the potential to weigh in on some of the ongoing puzzles associated with the small-scale structure of dark matter halos, such as the core-cusp discrepancy.  ( 2 min )
    Adaptively-weighted Integral Space for Fast Multiview Clustering. (arXiv:2208.12808v1 [cs.LG])
    Multiview clustering has been extensively studied to take advantage of multi-source information to improve the clustering performance. In general, most of the existing works typically compute an $n \times n$ affinity graph by some similarity/distance metrics (e.g. the Euclidean distance) or learned representations, and explore the pairwise correlations across views. But unfortunately, a quadratic or even cubic complexity is often needed, bringing about difficulty in clustering large-scale datasets. Some efforts have been made recently to capture data distribution in multiple views by selecting view-wise anchor representations with k-means, or by direct matrix factorization on the original observations. Despite the significant success, few of them have considered the view-insufficiency issue, implicitly holding the assumption that each individual view is sufficient to recover the cluster structure. Moreover, the latent integral space and the shared cluster structure from multiple insufficient views cannot be simultaneously discovered. In view of this, we propose an Adaptively-weighted Integral Space for Fast Multiview Clustering (AIMC) method with nearly linear complexity. Specifically, view generation models are designed to reconstruct the view observations from the latent integral space with diverse adaptive contributions. Meanwhile, a centroid representation with an orthogonality constraint and a cluster partition are seamlessly constructed to approximate the latent integral space. An alternate minimizing algorithm is developed to solve the optimization problem, which is proved to have linear time complexity w.r.t. the sample size. Extensive experiments conducted on several real-world datasets confirm the superiority of the proposed AIMC method compared with the state-of-the-art methods.  ( 3 min )
    BOBA: Byzantine-Robust Federated Learning with Label Skewness. (arXiv:2208.12932v1 [cs.LG])
    In federated learning, most existing techniques for robust aggregation against Byzantine attacks are designed for the IID setting, i.e., the data distributions for clients are independent and identically distributed. In this paper, we address label skewness, a more realistic and challenging non-IID setting, where each client only has access to a few classes of data. In this setting, state-of-the-art techniques suffer from selection bias, leading to significant performance drop for particular classes; they are also more vulnerable to Byzantine attacks due to the increased deviation among gradients of honest clients. To address these limitations, we propose an efficient two-stage method named BOBA. Theoretically, we prove the convergence of BOBA with an error of optimal order. Empirically, we verify the superior unbiasedness and robustness of BOBA across a wide range of models and data sets against various baselines.  ( 2 min )
    Lossy Image Compression with Quantized Hierarchical VAEs. (arXiv:2208.13056v1 [eess.IV])
    Recent work has shown a strong theoretical connection between variational autoencoders (VAEs) and the rate distortion theory. Motivated by this, we consider the problem of lossy image compression from the perspective of generative modeling. Starting from ResNet VAEs, which are originally designed for data (image) distribution modeling, we redesign their latent variable model using a quantization-aware posterior and prior, enabling easy quantization and entropy coding for image compression. Along with improved neural network blocks, we present a powerful and efficient class of lossy image coders, outperforming previous methods on natural image (lossy) compression. Our model compresses images in a coarse-to-fine fashion and supports parallel encoding and decoding, leading to fast execution on GPUs.  ( 2 min )
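    A standard way to make a latent variable quantization-aware, sketched here only as the generic technique (the paper's exact parameterization is not given in the abstract), is to substitute additive uniform noise for rounding during training:
        import torch

        def quantize_latent(z, training=True):
            if training:
                # Differentiable proxy for rounding: uniform noise in [-0.5, 0.5).
                return z + torch.empty_like(z).uniform_(-0.5, 0.5)
            # Hard rounding at test time, ready for entropy coding.
            return torch.round(z)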
  • Open

    Hydrogen atom confined inside an inverted-Gaussian potential. (arXiv:2208.13107v1 [physics.atom-ph])
    In this work, we consider the hydrogen atom confined inside a penetrable spherical potential. The confining potential is described by an inverted-Gaussian function of depth $\omega_0$, width $\sigma$ and centered at $r_c$. In particular, this model has been used to study atoms inside a $C_{60}$ fullerene. For the lowest values of angular momentum $l=0,1,2$, the spectra of the system as a function of the parameters ($\omega_0,\sigma,r_c$) are calculated using three distinct numerical methods: (i) the Lagrange-mesh method, (ii) fourth-order finite differences and (iii) the finite element method. Concrete results with no fewer than 11 significant figures are displayed. Also, within the Lagrange-mesh approach the corresponding eigenfunctions and the expectation value of $r$ for the first six states of $s, p$ and $d$ symmetries, respectively, are presented. Our accurate energies are taken as initial data to train an artificial neural network as well. It generates an efficient numerical interpolation. The present numerical results improve and extend those reported in the literature.
    Equivariant Mesh Attention Networks. (arXiv:2205.10662v2 [cs.LG] UPDATED)
    Equivariance to symmetries has proven to be a powerful inductive bias in deep learning research. Recent works on mesh processing have concentrated on various kinds of natural symmetries, including translations, rotations, scaling, node permutations, and gauge transformations. To date, no existing architecture is equivariant to all of these transformations. In this paper, we present an attention-based architecture for mesh data that is provably equivariant to all transformations mentioned above. Our pipeline relies on the use of relative tangential features: a simple, effective, equivariance-friendly alternative to raw node positions as inputs. Experiments on the FAUST and TOSCA datasets confirm that our proposed architecture achieves improved performance on these benchmarks and is indeed equivariant, and therefore robust, to a wide variety of local/global transformations.
    Causality and independence in perfectly adapted dynamical systems. (arXiv:2101.11885v2 [cs.AI] UPDATED)
    Perfect adaptation in a dynamical system is the phenomenon that one or more variables have an initial transient response to a persistent change in an external stimulus but revert to their original value as the system converges to equilibrium. With the help of the causal ordering algorithm, one can construct graphical representations of dynamical systems that represent the causal relations between the variables and the conditional independences in the equilibrium distribution. We apply these tools to formulate sufficient graphical conditions for identifying perfect adaptation from a set of first-order differential equations. Furthermore, we give sufficient conditions to test for the presence of perfect adaptation in experimental equilibrium data. We apply this method to a simple model for a protein signalling pathway and test its predictions both in simulations and using real-world protein expression data. We demonstrate that perfect adaptation can lead to misleading orientation of edges in the output of causal discovery algorithms.
    Understanding the Limits of Poisoning Attacks in Episodic Reinforcement Learning. (arXiv:2208.13663v1 [cs.LG])
    To understand the security threats to reinforcement learning (RL) algorithms, this paper studies poisoning attacks to manipulate \emph{any} order-optimal learning algorithm towards a targeted policy in episodic RL and examines the potential damage of two natural types of poisoning attacks, i.e., the manipulation of \emph{reward} and \emph{action}. We discover that the effect of attacks crucially depends on whether the rewards are bounded or unbounded. In bounded reward settings, we show that only reward manipulation or only action manipulation cannot guarantee a successful attack. However, by combining reward and action manipulation, the adversary can manipulate any order-optimal learning algorithm to follow any targeted policy with $\tilde{\Theta}(\sqrt{T})$ total attack cost, which is order-optimal, without any knowledge of the underlying MDP. In contrast, in unbounded reward settings, we show that reward manipulation attacks are sufficient for an adversary to successfully manipulate any order-optimal learning algorithm to follow any targeted policy using $\tilde{O}(\sqrt{T})$ amount of contamination. Our results reveal useful insights about what can or cannot be achieved by poisoning attacks, and are set to spur more works on the design of robust RL algorithms.
    Sparse random hypergraphs: Non-backtracking spectra and community detection. (arXiv:2203.07346v2 [math.PR] UPDATED)
    We consider the community detection problem in a sparse $q$-uniform hypergraph $G$, assuming that $G$ is generated according to the so-called Hypergraph Stochastic Block Model (HSBM). We prove that a spectral method based on the non-backtracking operator for hypergraphs works with high probability down to the generalized Kesten-Stigum detection threshold conjectured by Angelini et al. We characterize the spectrum of the non-backtracking operator for the sparse HSBM, and provide an efficient dimension reduction procedure using the Ihara-Bass formula for hypergraphs. As a result, community detection for the sparse HSBM on $n$ vertices can be reduced to an eigenvector problem of a $2n\times 2n$ non-normal matrix constructed from the adjacency matrix and the degree matrix of the hypergraph. To the best of our knowledge, this is the first provable and efficient spectral algorithm that achieves the conjectured threshold for HSBMs with $k$ blocks generated according to a general symmetric probability tensor.
    Invertible Gaussian Reparameterization: Revisiting the Gumbel-Softmax. (arXiv:1912.09588v5 [stat.ML] UPDATED)
    The Gumbel-Softmax is a continuous distribution over the simplex that is often used as a relaxation of discrete distributions. Because it can be readily interpreted and easily reparameterized, it enjoys widespread use. We propose a modular and more flexible family of reparameterizable distributions where Gaussian noise is transformed into a one-hot approximation through an invertible function. This invertible function is composed of a modified softmax and can incorporate diverse transformations that serve different specific purposes. For example, the stick-breaking procedure allows us to extend the reparameterization trick to distributions with countably infinite support, thus enabling the use of our distribution alongside nonparametric models, or normalizing flows let us increase the flexibility of the distribution. Our construction enjoys theoretical advantages over the Gumbel-Softmax, such as a closed-form KL, and significantly outperforms it in a variety of experiments. Our code is available at https://github.com/cunningham-lab/igr.
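    A minimal sketch of the reparameterization, assuming the modified-softmax variant that appends a zero logit so the map stays invertible; the temperature is an assumption.
        import torch

        def igr_sample(mu, sigma, temp=0.5):
            # Reparameterized Gaussian noise...
            eps = torch.randn_like(mu)
            y = mu + sigma * eps
            # ...pushed through an invertible modified softmax toward a vertex
            # of the simplex, yielding a relaxed one-hot sample.
            z = torch.cat([y, torch.zeros_like(y[..., :1])], dim=-1)
            return torch.softmax(z / temp, dim=-1)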
    Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees. (arXiv:2206.02659v2 [cs.LG] UPDATED)
    We consider transfer learning approaches that fine-tune a pretrained deep neural network on a target task. We study generalization properties of fine-tuning to understand the problem of overfitting, which commonly occurs in practice. Previous works have shown that constraining the distance from the initialization of fine-tuning improves generalization. Using a PAC-Bayesian analysis, we observe that besides distance from initialization, Hessians affect generalization through the noise stability of deep neural networks against noise injections. Motivated by the observation, we develop Hessian distance-based generalization bounds for a wide range of fine-tuning methods. Additionally, we study the robustness of fine-tuning in the presence of noisy labels. Motivated by our theory, we design an algorithm that incorporates consistent losses and distance-based regularization for fine-tuning, along with a generalization error guarantee under class conditional independent noise in the training set labels. We perform a detailed empirical study of our algorithm on various noisy environments and architectures. On six image classification tasks whose training labels are generated with programmatic labeling, we find a 3.26% accuracy gain over prior fine-tuning methods. Meanwhile, the Hessian distance measure of the fine-tuned model decreases by six times more than existing approaches.
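    The distance-based regularization component can be sketched as a squared penalty on movement away from the pre-trained initialization; the weight lam is an assumption, and the paper's full algorithm additionally uses consistent losses.
        import torch

        def regularized_finetune_loss(model, pretrained_params, task_loss, lam=1e-3):
            # Penalize parameter distance from initialization, which the
            # PAC-Bayesian analysis ties to better generalization.
            dist_sq = sum(
                ((p - p0) ** 2).sum()
                for p, p0 in zip(model.parameters(), pretrained_params)
            )
            return task_loss + lam * dist_sq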
    Leachable Component Clustering. (arXiv:2208.13217v1 [cs.LG])
    Clustering attempts to partition data instances into several distinct groups, while the similarities among data belonging to a common partition are largely preserved. Furthermore, incomplete data frequently occur in many real-world applications and adversely affect pattern analysis. As a consequence, specific solutions for data imputation and handling have been developed to deal with missing values, with an independent stage of knowledge exploitation used for information understanding. In this work, a novel approach to clustering of incomplete data, termed leachable component clustering, is proposed. Unlike existing methods, the proposed method handles data imputation with Bayes alignment and recovers the lost patterns in theory. Owing to the simple numerical computation of its equations, the proposed method can learn optimized partitions while retaining computational efficiency. Experiments on several artificial incomplete data sets demonstrate that the proposed method is able to present superior performance compared with other state-of-the-art algorithms.
    A Variance-Reduced Stochastic Gradient Tracking Algorithm for Decentralized Optimization with Orthogonality Constraints. (arXiv:2208.13643v1 [math.OC])
    Decentralized optimization with orthogonality constraints is found widely in scientific computing and data science. Since the orthogonality constraints are nonconvex, it is quite challenging to design efficient algorithms. Existing approaches leverage the geometric tools from Riemannian optimization to solve this problem at the cost of high sample and communication complexities. To relieve this difficulty, based on two novel techniques that can waive the orthogonality constraints, we propose a variance-reduced stochastic gradient tracking (VRSGT) algorithm with a convergence rate of $O(1 / k)$ to a stationary point. To the best of our knowledge, VRSGT is the first algorithm for decentralized optimization with orthogonality constraints that reduces both sampling and communication complexities simultaneously. In the numerical experiments, VRSGT shows promising performance in a real-world autonomous driving application.
    Optimal Estimation of Generic Dynamics by Path-Dependent Neural Jump ODEs. (arXiv:2206.14284v2 [stat.ML] UPDATED)
    This paper studies the problem of forecasting general stochastic processes using an extension of the Neural Jump ODE (NJ-ODE) framework. While NJ-ODE was the first framework to establish convergence guarantees for the prediction of irregularly observed time-series, these results were limited to data stemming from It\^o-diffusions with complete observations, in particular Markov processes where all coordinates are observed simultaneously. In this work, we generalise these results to generic, possibly non-Markovian or discontinuous, stochastic processes with incomplete observations, by utilising the reconstruction properties of the signature transform. These theoretical results are supported by empirical studies, where it is shown that the path-dependent NJ-ODE outperforms the original NJ-ODE framework in the case of non-Markovian data. Moreover, we show that PD-NJ-ODE can be applied successfully to limit order book (LOB) data.
    Variance estimation in graphs with the fused lasso. (arXiv:2207.12638v2 [math.ST] UPDATED)
    We study the problem of variance estimation in general graph-structured problems. First, we develop a linear time estimator for the homoscedastic case that can consistently estimate the variance in general graphs. We show that our estimator attains minimax rates for the chain and 2D grid graphs when the mean signal has a total variation with canonical scaling. Furthermore, we provide general upper bounds on the mean squared error performance of the fused lasso estimator in general graphs under a moment condition and a bound on the tail behavior of the errors. These upper bounds allow us to generalize for broader classes of distributions, such as sub-Exponential, many existing results on the fused lasso that are only known to hold with the assumption that errors are sub-Gaussian random variables. Exploiting our upper bounds, we then study a simple total variation regularization estimator for estimating the signal of variances in the heteroscedastic case. Our results show that the variance estimator attains minimax rates for estimating signals of bounded variation in grid graphs, $K$-nearest neighbor graphs with very mild assumptions, and it is consistent for estimating the variances in any connected graph. In addition, extensive numerical results show that our proposed estimators perform reasonably well in a variety of graph-structured models.
    Optimal high-dimensional and nonparametric distributed testing under communication constraints. (arXiv:2202.00968v2 [math.ST] UPDATED)
    We derive minimax testing errors in a distributed framework where the data is split over multiple machines and their communication to a central machine is limited to $b$ bits. We investigate both the $d$- and infinite-dimensional signal detection problem under Gaussian white noise. We also derive distributed testing algorithms reaching the theoretical lower bounds. Our results show that distributed testing is subject to fundamentally different phenomena that are not observed in distributed estimation. Among our findings, we show that testing protocols that have access to shared randomness can perform strictly better in some regimes than those that do not. We also observe that consistent nonparametric distributed testing is always possible, even with as little as $1$-bit of communication and the corresponding test outperforms the best local test using only the information available at a single local machine. Furthermore, we also derive adaptive nonparametric distributed testing strategies and the corresponding theoretical lower bounds.
    Generalization In Multi-Objective Machine Learning. (arXiv:2208.13499v1 [cs.LG])
    Modern machine learning tasks often require considering not just one but multiple objectives. For example, besides the prediction quality, this could be the efficiency, robustness or fairness of the learned models, or any of their combinations. Multi-objective learning offers a natural framework for handling such problems without having to commit to early trade-offs. Surprisingly, statistical learning theory so far offers almost no insight into the generalization properties of multi-objective learning. In this work, we make first steps to fill this gap: we establish foundational generalization bounds for the multi-objective setting as well as generalization and excess bounds for learning with scalarizations. We also provide the first theoretical analysis of the relation between the Pareto-optimal sets of the true objectives and the Pareto-optimal sets of their empirical approximations from training data. In particular, we show a surprising asymmetry: all Pareto-optimal solutions can be approximated by empirically Pareto-optimal ones, but not vice versa.
    Empirical Gateaux Derivatives for Causal Inference. (arXiv:2208.13701v1 [stat.ME])
    We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite-differencing, with a focus on causal inference functionals. We consider the case where probability distributions are not known a priori but also need to be estimated from data. These estimated distributions lead to empirical Gateaux derivatives, and we study the relationships between empirical, numerical, and analytical Gateaux derivatives. Starting with a case study of counterfactual mean estimation, we instantiate the exact relationship between finite-differences and the analytical Gateaux derivative. We then derive requirements on the rates of numerical approximation in perturbation and smoothing that preserve the statistical benefits of one-step adjustments, such as rate-double-robustness. We then study more complicated functionals such as dynamic treatment regimes and the linear-programming formulation for policy optimization in infinite-horizon Markov decision processes. The newfound ability to approximate bias adjustments in the presence of arbitrary constraints illustrates the usefulness of constructive approaches for Gateaux derivatives. We also find that the statistical structure of the functional (rate-double robustness) can permit less conservative rates of finite-difference approximation. This property, however, can be specific to particular functionals, e.g. it occurs for the counterfactual mean but not the infinite-horizon MDP policy value.
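    The finite-differencing construction can be made concrete for the simplest functional: tilt the empirical distribution toward a point mass at one observation and difference the functional. For the mean, the sketch below recovers the familiar influence function x_i - mean(data); the step size eps is an assumption.
        import numpy as np

        def empirical_gateaux(functional, data, i, eps=1e-4):
            # Perturb the empirical weights toward a point mass at observation i.
            w = np.full(len(data), 1.0 / len(data))
            w_tilted = (1.0 - eps) * w
            w_tilted[i] += eps
            return (functional(data, w_tilted) - functional(data, w)) / eps

        # Example: the weighted-mean functional.
        mean_fn = lambda x, w: float(np.dot(w, x))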
    Comparing two samples through stochastic dominance: a graphical approach. (arXiv:2203.07889v3 [stat.ML] UPDATED)
    Non-deterministic measurements are common in real-world scenarios: the performance of a stochastic optimization algorithm or the total reward of a reinforcement learning agent in a chaotic environment are just two examples in which unpredictable outcomes are common. These measures can be modeled as random variables and compared among each other via their expected values or more sophisticated tools such as null hypothesis statistical tests. In this paper, we propose an alternative framework to visually compare two samples according to their estimated cumulative distribution functions. First, we introduce a dominance measure for two random variables that quantifies the proportion in which the cumulative distribution function of one of the random variables stochastically dominates the other one. Then, we present a graphical method that decomposes into quantiles (i) the proposed dominance measure and (ii) the probability that one of the random variables takes lower values than the other. For illustrative purposes, we re-evaluate the experiments of a previously published work with the proposed methodology and show that additional conclusions (missed by the other methods) can be inferred. Additionally, the software package RVCompare was created as a convenient way of applying and experimenting with the proposed framework.
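    As a rough illustration of the dominance measure (a minimal sketch of the idea only, not the paper's RVCompare implementation), one can compare the empirical CDFs of two samples on a shared grid:

        import numpy as np

        def dominance_proportion(x, y, grid_size=1000):
            # Evaluate both empirical CDFs on a shared grid over the joint support.
            grid = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), grid_size)
            Fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
            Fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
            # Fraction of the support where x's CDF lies strictly below y's,
            # i.e., where x tends to take larger values than y.
            return np.mean(Fx < Fy)

        rng = np.random.default_rng(0)
        a = rng.normal(1.0, 1.0, 500)  # e.g., rewards of algorithm A
        b = rng.normal(0.5, 1.0, 500)  # e.g., rewards of algorithm B
        print(dominance_proportion(a, b))  # close to 1 when A tends to dominate B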
    Visualizing high-dimensional loss landscapes with Hessian directions. (arXiv:2208.13219v1 [cs.LG])
    Analyzing geometric properties of high-dimensional loss functions, such as local curvature and the existence of other optima around a certain point in loss space, can help provide a better understanding of the interplay between neural network structure, implementation attributes, and learning performance. In this work, we combine concepts from high-dimensional probability and differential geometry to study how curvature properties in lower-dimensional loss representations depend on those in the original loss space. We show that saddle points in the original space are rarely correctly identified as such in lower-dimensional representations if random projections are used. In such projections, the expected curvature in a lower-dimensional representation is proportional to the mean curvature in the original loss space. Hence, the mean curvature in the original loss space determines if saddle points appear, on average, as either minima, maxima, or almost flat regions. We use the connection between expected curvature and mean curvature (i.e., the normalized Hessian trace) to estimate the trace of Hessians without calculating the Hessian or Hessian-vector products as in Hutchinson's method. Because random projections are not able to correctly identify saddle information, we propose to study projections along Hessian directions that are associated with the largest and smallest principal curvatures. We connect our findings to the ongoing debate on loss landscape flatness and generalizability. Finally, we illustrate our method in numerical experiments on different image classifiers with up to about $7\times 10^6$ parameters.  ( 3 min )
    Statistical Inverse Problems in Hilbert Scales. (arXiv:2208.13289v1 [math.ST])
    In this paper, we study the Tikhonov regularization scheme in Hilbert scales for the nonlinear statistical inverse problem with a general noise. The regularizing norm in this scheme is stronger than the norm in the Hilbert space. We focus on developing a theoretical analysis for this scheme based on conditional stability estimates. We utilize the concept of the distance function to establish high-probability estimates of the direct and reconstruction error in the reproducing kernel Hilbert space setting. Further, explicit rates of convergence in terms of sample size are established for the oversmoothing case and the regular case over the regularity class defined through an appropriate source condition. Our results improve and generalize previous results obtained in related settings.  ( 2 min )
    Deep Kernel Learning of Dynamical Models from High-Dimensional Noisy Data. (arXiv:2208.12975v1 [cs.LG])
    This work proposes a Stochastic Variational Deep Kernel Learning method for the data-driven discovery of low-dimensional dynamical models from high-dimensional noisy data. The framework is composed of an encoder that compresses high-dimensional measurements into low-dimensional state variables, and a latent dynamical model for the state variables that predicts the system evolution over time. The training of the proposed model is carried out in an unsupervised manner, i.e., not relying on labeled data. Our learning method is evaluated on the motion of a pendulum -- a well studied baseline for nonlinear model identification and control with continuous states and control inputs -- measured via high-dimensional noisy RGB images. Results show that the method can effectively denoise measurements, learn compact state representations and latent dynamical models, as well as identify and quantify modeling uncertainties.  ( 2 min )
    Overparameterized (robust) models from computational constraints. (arXiv:2208.12926v1 [cs.LG])
    Overparameterized models with millions of parameters have been hugely successful. In this work, we ask: can the need for large models be, at least in part, due to the \emph{computational} limitations of the learner? Additionally, we ask, is this situation exacerbated for \emph{robust} learning? We show that this indeed could be the case. We show learning tasks for which computationally bounded learners need \emph{significantly more} model parameters than what information-theoretic learners need. Furthermore, we show that even more model parameters could be necessary for robust learning. In particular, for computationally bounded learners, we extend the recent result of Bubeck and Sellke [NeurIPS'2021], which shows that robust models might need more parameters, to the computational regime and show that bounded learners could provably need an even larger number of parameters. Then, we address the following related question: can we hope to remedy the situation for robust computationally bounded learning by restricting \emph{adversaries} to also be computationally bounded, for the sake of obtaining models with fewer parameters? Here again, we show that this could be possible. Specifically, building on the work of Garg, Jha, Mahloujifar, and Mahmoody [ALT'2020], we demonstrate a learning task that can be learned efficiently and robustly against a computationally bounded attacker, while robustness against an information-theoretic attacker requires the learner to utilize significantly more parameters.  ( 3 min )
    Fast Optimal Estimation with Intractable Models using Permutation-Invariant Neural Networks. (arXiv:2208.12942v1 [stat.ME])
    Neural networks have recently shown promise for likelihood-free inference, providing orders-of-magnitude speed-ups over classical methods. However, current implementations are suboptimal when estimating parameters from independent replicates. In this paper, we use a decision-theoretic framework to argue that permutation-invariant neural networks are ideally placed for constructing Bayes estimators for arbitrary models, provided that simulation from these models is straightforward. We illustrate the potential of these estimators on conventional spatial models as well as highly parameterised spatial-extremes models, and show that they considerably outperform neural estimators that do not account for replication appropriately in their network design. At the same time, they are highly competitive with, and much faster than, traditional likelihood-based estimators. We apply our estimator in a spatial analysis of sea-surface temperature in the Red Sea where, after training, we obtain parameter estimates, and uncertainty quantification of the estimates via bootstrap sampling, from hundreds of spatial fields in a fraction of a second.  ( 2 min )
    Approach of variable clustering and compression for learning large Bayesian networks. (arXiv:2208.13605v1 [stat.ML])
    This paper describes a new approach to learning the structures of large Bayesian networks, based on blocks obtained from feature-space clustering. The clustering is obtained using normalized mutual information, and the subsequent aggregation of blocks is done using classical learning methods, except that their input is compressed information about the combinations of feature values within each block. The approach is validated for Hill-Climbing as the graph enumeration algorithm with two score functions: BIC and MI. In this way, potentially parallelizable block learning can be implemented even for those score functions that are considered unsuitable for parallelizable learning. The advantage of the approach is evaluated in terms of speed as well as the accuracy of the recovered structures.  ( 2 min )
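    As a sketch of the clustering ingredient, assuming discrete features and using scikit-learn's normalized mutual information (the paper's compression and block-aggregation stages are not reproduced here):

        import numpy as np
        from sklearn.metrics import normalized_mutual_info_score
        from scipy.cluster.hierarchy import linkage, fcluster
        from scipy.spatial.distance import squareform

        rng = np.random.default_rng(0)
        X = rng.integers(0, 3, size=(500, 8))  # 500 samples, 8 discrete features
        X[:, 1] = X[:, 0]                      # make features 0 and 1 redundant

        n = X.shape[1]
        nmi = np.ones((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                nmi[i, j] = nmi[j, i] = normalized_mutual_info_score(X[:, i], X[:, j])

        # Convert the similarity into a distance and cluster features hierarchically.
        dist = squareform(1.0 - nmi, checks=False)
        blocks = fcluster(linkage(dist, method="average"), t=0.5, criterion="distance")
        print(blocks)  # features sharing a block id would be learned together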
    Smooth Monotone Stochastic Variational Inequalities and Saddle Point Problems -- Survey. (arXiv:2208.13592v1 [math.OC])
    This paper is a survey of methods for solving smooth (strongly) monotone stochastic variational inequalities. To begin with, we give the deterministic foundation from which the stochastic methods eventually evolved. Then we review methods for the general stochastic formulation, and look at the finite sum setup. The last parts of the paper are devoted to various recent (not necessarily stochastic) advances in algorithms for variational inequalities.  ( 2 min )
    Mixtures of Gaussian Process Experts with SMC$^2$. (arXiv:2208.12830v1 [stat.ML])
    Gaussian processes are a key component of many flexible statistical and machine learning models. However, they exhibit cubic computational complexity and high memory constraints due to the need to invert and store a full covariance matrix. To circumvent this, mixtures of Gaussian process experts have been considered, where data points are assigned to independent experts, reducing the complexity by allowing inference based on smaller, local covariance matrices. Moreover, mixtures of Gaussian process experts substantially enrich the model's flexibility, allowing for behaviors such as non-stationarity, heteroscedasticity, and discontinuities. In this work, we construct a novel inference approach based on nested sequential Monte Carlo samplers to simultaneously infer both the gating network and Gaussian process expert parameters. This greatly improves inference compared to importance sampling, particularly in settings when a stationary Gaussian process is inappropriate, while still being thoroughly parallelizable.  ( 2 min )
    Learning Clinical Concepts for Predicting Risk of Progression to Severe COVID-19. (arXiv:2208.13126v1 [cs.LG])
    With COVID-19 now pervasive, identification of high-risk individuals is crucial. Using data from a major healthcare provider in Southwestern Pennsylvania, we develop survival models predicting severe COVID-19 progression. In this endeavor, we face a tradeoff between more accurate models relying on many features and less accurate models relying on a few features aligned with clinician intuition. Complicating matters, many EHR features tend to be under-coded, degrading the accuracy of smaller models. In this study, we develop two sets of high-performance risk scores: (i) an unconstrained model built from all available features; and (ii) a pipeline that learns a small set of clinical concepts before training a risk predictor. Learned concepts boost performance over the corresponding features (C-index 0.858 vs. 0.844) and demonstrate improvements over (i) when evaluated out-of-sample (subsequent time periods). Our models outperform previous works (C-index 0.844-0.872 vs. 0.598-0.810).  ( 2 min )
    Bayesian Graph Contrastive Learning. (arXiv:2112.07823v4 [cs.LG] UPDATED)
    Contrastive learning has become a key component of self-supervised learning approaches for graph-structured data. Despite their success, existing graph contrastive learning methods are incapable of uncertainty quantification for node representations or their downstream tasks, limiting their application in high-stakes domains. In this paper, we propose a novel Bayesian perspective of graph contrastive learning methods, showing that random augmentations lead to stochastic encoders. As a result, our proposed method represents each node by a distribution in the latent space, in contrast to existing techniques which embed each node as a deterministic vector. By learning distributional representations, we provide uncertainty estimates in downstream graph analytics tasks and increase the expressive power of the predictive model. In addition, we propose a Bayesian framework to infer the probability of perturbations in each view of the contrastive model, eliminating the need for a computationally expensive search for hyperparameter tuning. We empirically show a considerable improvement in performance compared to existing state-of-the-art methods on several benchmark datasets.  ( 2 min )
    Towards Reliable Simulation-Based Inference with Balanced Neural Ratio Estimation. (arXiv:2208.13624v1 [stat.ML])
    Modern approaches for simulation-based inference rely upon deep learning surrogates to enable approximate inference with computer simulators. In practice, the estimated posteriors' computational faithfulness is, however, rarely guaranteed. For example, Hermans et al. (2021) show that current simulation-based inference algorithms can produce posteriors that are overconfident, hence risking false inferences. In this work, we introduce Balanced Neural Ratio Estimation (BNRE), a variation of the NRE algorithm designed to produce posterior approximations that tend to be more conservative, hence improving their reliability, while sharing the same Bayes optimal solution. We achieve this by enforcing a balancing condition that increases the quantified uncertainty in small simulation budget regimes while still converging to the exact posterior as the budget increases. We provide theoretical arguments showing that BNRE tends to produce posterior surrogates that are more conservative than NRE's. We evaluate BNRE on a wide variety of tasks and show that it produces conservative posterior surrogates on all tested benchmarks and simulation budgets. Finally, we emphasize that BNRE is straightforward to implement over NRE and does not introduce any computational overhead.  ( 2 min )
    Low-degree learning and the metric entropy of polynomials. (arXiv:2203.09659v2 [cs.LG] UPDATED)
    Let $\mathscr{F}_{n,d}$ be the class of all functions $f:\{-1,1\}^n\to[-1,1]$ on the $n$-dimensional discrete hypercube of degree at most $d$. In the first part of this paper, we prove that any (deterministic or randomized) algorithm which learns $\mathscr{F}_{n,d}$ with $L_2$-accuracy $\varepsilon$ requires at least $\Omega((1-\sqrt{\varepsilon})2^d\log n)$ queries for large enough $n$, thus establishing the sharpness as $n\to\infty$ of a recent upper bound of Eskenazis and Ivanisvili (2021). To do this, we show that the $L_2$-packing numbers $\mathsf{M}(\mathscr{F}_{n,d},\|\cdot\|_{L_2},\varepsilon)$ of the concept class $\mathscr{F}_{n,d}$ satisfy the two-sided estimate $$c(1-\varepsilon)2^d\log n \leq \log \mathsf{M}(\mathscr{F}_{n,d},\|\cdot\|_{L_2},\varepsilon) \leq \frac{2^{Cd}\log n}{\varepsilon^4}$$ for large enough $n$, where $c, C>0$ are universal constants. In the second part of the paper, we present a logarithmic upper bound for the randomized query complexity of classes of bounded approximate polynomials whose Fourier spectra are concentrated on few subsets. As an application, we prove new estimates for the number of random queries required to learn approximate juntas of a given degree, functions with rapidly decaying Fourier tails and constant depth circuits of given size. Finally, we obtain bounds for the number of queries required to learn the polynomial class $\mathscr{F}_{n,d}$ without error in the query and random example models.  ( 3 min )
    A preprocessing perspective for quantum machine learning classification advantage using NISQ algorithms. (arXiv:2208.13251v1 [quant-ph])
    Quantum Machine Learning (QML) has not yet clearly and extensively demonstrated its advantages over the classical machine learning approach. So far, there are only specific cases where some quantum-inspired techniques have achieved small incremental advantages, and a few experimental cases in hybrid quantum computing that are promising for the mid-term future (not counting achievements purely associated with optimization using quantum-classical algorithms). Current quantum computers are noisy and have few qubits, making it difficult to demonstrate the current and potential quantum advantage of QML methods. This study shows that we can achieve better classical encoding and performance of quantum classifiers by using Linear Discriminant Analysis (LDA) during the data preprocessing step. As a result, the Variational Quantum Algorithm (VQA) shows a gain in balanced accuracy with the LDA technique and outperforms baseline classical classifiers.  ( 2 min )
    Information FOMO: The unhealthy fear of missing out on information. A method for removing misleading data for healthier models. (arXiv:2208.13080v1 [cs.LG])
    Not all data are equal. Misleading or unnecessary data can critically hinder the accuracy of Machine Learning (ML) models. When data is plentiful, misleading effects can be overcome, but in many real-world applications data is sparse and expensive to acquire. We present a method that substantially reduces the data size necessary to accurately train ML models, potentially opening the door for many new, limited-data applications in ML. Our method extracts the most informative data, while ignoring and omitting data that misleads the ML model toward inferior generalization properties. Specifically, the method eliminates the phenomenon of "double descent", where more data leads to worse performance. This approach brings several key features to the ML community. Notably, the method naturally converges and removes the traditional need to divide the dataset into training, testing, and validation data. Instead, the selection metric inherently assesses testing error. This ensures that key information is never wasted in testing or validation.  ( 2 min )
    Sum-of-Squares Relaxations for Information Theory and Variational Inference. (arXiv:2206.13285v2 [cs.IT] UPDATED)
    We consider extensions of the Shannon relative entropy, referred to as f-divergences. Three classical related computational problems are typically associated with these divergences: (a) estimation from moments, (b) computing normalizing integrals, and (c) variational inference in probabilistic models. These problems are related to one another through convex duality, and for all of them there are many applications throughout data science; we aim for computationally tractable approximation algorithms that preserve properties of the original problem such as potential convexity or monotonicity. In order to achieve this, we derive a sequence of convex relaxations for computing these divergences from non-centered covariance matrices associated with a given feature vector: starting from the typically non-tractable optimal lower-bound, we consider an additional relaxation based on "sums-of-squares", which is now computable in polynomial time as a semidefinite program, as well as further computationally more efficient relaxations based on spectral information divergences from quantum information theory. For all of the tasks above, beyond proposing new relaxations, we derive tractable algorithms based on augmented Lagrangians and first-order methods, and we present illustrations on multivariate trigonometric polynomials and functions on the Boolean hypercube.  ( 2 min )
    Equivariant graph neural networks for fast electron density estimation of molecules, liquids, and solids. (arXiv:2112.00652v2 [physics.comp-ph] UPDATED)
    Electron density $\rho(\vec{r})$ is the fundamental variable in the calculation of ground state energy with density functional theory (DFT). Beyond total energy, features and changes in $\rho(\vec{r})$ distributions are often used to capture critical physicochemical phenomena in functional materials. We present a machine learning framework for the prediction of $\rho(\vec{r})$. The model is based on equivariant graph neural networks and the electron density is predicted at special query point vertices that are part of the message passing graph, but only receive messages. The model is tested across multiple data sets of molecules (QM9), liquid ethylene carbonate electrolyte (EC) and Li$_x$Ni$_y$Mn$_z$Co$_{1-y-z}$O$_2$ lithium ion battery cathodes (NMC). For QM9 molecules, the accuracy of the proposed model exceeds typical variability in $\rho(\vec{r})$ obtained from DFT done with different exchange-correlation functionals. The accuracy on all three datasets is beyond state of the art and the computation time is orders of magnitude faster than DFT.  ( 3 min )
    Mixed Logit Models and Network Formation. (arXiv:2006.16516v5 [cs.SI] UPDATED)
    The study of network formation is pervasive in economics, sociology, and many other fields. In this paper, we model network formation as a `choice' that is made by nodes in a network to connect to other nodes. We study these `choices' using discrete-choice models, in which an agent chooses between two or more discrete alternatives. We employ the `repeated-choice' (RC) model to study network formation, and argue that it overcomes important limitations of the multinomial logit (MNL) model, which gives one framework for studying network formation, and is well-suited to this task. We also illustrate how to use the RC model to accurately study network formation using both synthetic and real-world networks. Using edge-independent synthetic networks, we also compare the performance of the MNL model and the RC model. We find that the RC model estimates the data-generation process of our synthetic networks more accurately than the MNL model. In a patent citation network, which forms sequentially, we present a case study of a qualitatively interesting scenario -- the fact that new patents are more likely to cite older, more cited, and similar patents -- for which employing the RC model yields interesting insights.  ( 3 min )
    Hybrid Spectrogram and Waveform Source Separation. (arXiv:2111.03600v2 [eess.AS] UPDATED)
    Source separation models work either on the spectrogram or the waveform domain. In this work, we show how to perform end-to-end hybrid source separation, letting the model decide which domain is best suited for each source, and even combining both. The proposed hybrid version of the Demucs architecture won the Music Demixing Challenge 2021 organized by Sony. This architecture also comes with additional improvements, such as compressed residual branches, local attention, and singular value regularization. Overall, a 1.4 dB improvement of the Signal-to-Distortion Ratio (SDR) was observed across all sources as measured on the MusDB HQ dataset, an improvement confirmed by human subjective evaluation, with an overall quality rated at 2.83 out of 5 (2.36 for the non-hybrid Demucs), and absence of contamination at 3.04 (against 2.37 for the non-hybrid Demucs and 2.44 for the second-ranking model submitted at the competition).  ( 2 min )
    Bayesian reconstruction of memories stored in neural networks from their connectivity. (arXiv:2105.07416v2 [q-bio.NC] UPDATED)
    The advent of comprehensive synaptic wiring diagrams of large neural circuits has created the field of connectomics and given rise to a number of open research questions. One such question is whether it is possible to reconstruct the information stored in a recurrent network of neurons, given its synaptic connectivity matrix. Here, we address this question by determining when solving such an inference problem is theoretically possible in specific attractor network models and by providing a practical algorithm to do so. The algorithm builds on ideas from statistical physics to perform approximate Bayesian inference and is amenable to exact analysis. We study its performance on three different models, compare the algorithm to standard algorithms such as PCA, and explore the limitations of reconstructing stored patterns from synaptic connectivity.  ( 2 min )
    Neural Network Approximation of Lipschitz Functions in High Dimensions with Applications to Inverse Problems. (arXiv:2208.13305v1 [stat.ML])
    The remarkable successes of neural networks in a huge variety of inverse problems have fueled their adoption in disciplines ranging from medical imaging to seismic analysis over the past decade. However, the high dimensionality of such inverse problems has simultaneously left current theory, which predicts that networks should scale exponentially in the dimension of the problem, unable to explain why the seemingly small networks used in these settings work as well as they do in practice. To reduce this gap between theory and practice, a general method for bounding the complexity required for a neural network to approximate a Lipschitz function on a high-dimensional set with a low-complexity structure is provided herein. The approach is based on the observation that the existence of a linear Johnson-Lindenstrauss embedding $\mathbf{A} \in \mathbb{R}^{d \times D}$ of a given high-dimensional set $\mathcal{S} \subset \mathbb{R}^D$ into a low dimensional cube $[-M,M]^d$ implies that for any Lipschitz function $f : \mathcal{S}\to \mathbb{R}^p$, there exists a Lipschitz function $g : [-M,M]^d \to \mathbb{R}^p$ such that $g(\mathbf{A}\mathbf{x}) = f(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{S}$. Hence, if one has a neural network which approximates $g : [-M,M]^d \to \mathbb{R}^p$, then a layer can be added which implements the JL embedding $\mathbf{A}$ to obtain a neural network which approximates $f : \mathcal{S} \to \mathbb{R}^p$. By pairing JL embedding results along with results on approximation of Lipschitz functions by neural networks, one then obtains results which bound the complexity required for a neural network to approximate Lipschitz functions on high dimensional sets. The end result is a general theoretical framework which can then be used to better explain the observed empirical successes of smaller networks in a wider variety of inverse problems than current theory allows.  ( 3 min )
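    The construction is easy to mirror in code; a minimal PyTorch sketch, where the Gaussian choice of $\mathbf{A}$ and all dimensions are illustrative assumptions rather than the paper's setup:

        import torch
        import torch.nn as nn

        D, d, p = 10_000, 64, 3        # ambient dim, JL target dim, output dim

        # A random Gaussian matrix, a standard construction of a JL embedding A.
        A = torch.randn(d, D) / d**0.5

        # A small network g approximating a Lipschitz function on the cube [-M, M]^d.
        g = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, p))

        # Prepending the fixed linear layer A yields a network computing g(Ax),
        # which approximates f on the low-complexity set S, mirroring the argument.
        embed = nn.Linear(D, d, bias=False)
        embed.weight.data = A
        embed.weight.requires_grad_(False)
        f_net = nn.Sequential(embed, g)

        x = torch.randn(2, D)
        print(f_net(x).shape)          # torch.Size([2, 3])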

  • Open

    Building a conversational AI assistant that books a collection of services at the same time
    https://reddit.com/link/x13vmp/video/mehnxosjqqk91/player submitted by /u/SudoSharma [link] [comments]  ( 87 min )
    Researchers at Imperial College London Developed Deep Learning-Based Choice Models Under Feature-Free and Feature-Based Settings of Choice Modeling
    submitted by /u/ai-lover [link] [comments]  ( 87 min )
    AI Robot Dog Armed With RPG By Russian Military | AI Monitors Traumatic Brain Injury | Machine Learning Detects Parkinson's Disease Early
    submitted by /u/kenickh [link] [comments]  ( 87 min )
    Are there any tools currently for this purpose?
    Hello! So recently I lost my eyesight due to a rare degenerative eye disorder, unfortunately, but I love video games. I heard about this project called Mars Vision, which had the goal of using a neural network to help blind people play video games by either announcing what button you were hovering over on a menu, or highlighting certain key points in a game world with sound, and it sounded amazing. Unfortunately, it seems like everywhere I look the project has dropped off the face of the earth and has not been talked about for almost 2 years now. I thought this project might be overpromising since, if I’m not mistaken, Google, or by extension DeepMind, only recently achieved something like this a few months ago with a model that is significant in size, so I am unsure if this is something that would actually be possible or available right now, but I was hoping that by posting here I’d get some useful information. Basically, if a program were able to do even half of what Mars Vision was promising, it would be revolutionary for me. Being able to open a mini map and ask an AI where the quest marker is, for example, would be huge. Or, in the case of single player games, asking the AI to help me find something or help me get through a quest objective and have it take over control would be pretty incredible. Although those seem far more advanced than visual tasks. I’m eager to hear anything you guys happen to know about, thank you very much for taking the time to read my post! submitted by /u/ChipsAhoiMcCoy [link] [comments]  ( 88 min )
    Artificial Intelligence generated art - JDM japan Tuned Trophy :) Realt...
    submitted by /u/crateone [link] [comments]  ( 87 min )
    Enstil.ai - a partial replacement for Artbreeder, and a complete replacement for wombo.art
    submitted by /u/apeloverage [link] [comments]  ( 100 min )
    A.I. Predicts the Shape of Nearly Every Protein Known to Science
    submitted by /u/SamuelSmith1416 [link] [comments]  ( 87 min )
    Stable Diffusion - Improved Scripting and Grids
    submitted by /u/pwillia7 [link] [comments]  ( 89 min )
    Opportunity to learn MLOps LIVE with Harvard Professor.
    Dr Pavlos Protopapas, who runs the graduate program of Data Science/AI at Harvard University, is offering a fellowship program at Univ.AI in NLP, GANs and Reinforcement Learning, and MLOps to a small group of advanced learners in data science and AI around the world. Dr Pavlos Protopapas with his team will lead the fellowship. Next Fellowship: [MLOps] - starting on 3rd September. --- Geoffrey Hinton Fellowship Highlights (Inaugurated by Geoffrey Hinton himself!) Topics - NLP, GANs and Reinforcement Learning, MLOps Duration - LIVE, online. 6-8 weeks Faculty - Dr Pavlos Protopapas | Scientific Program Director at the IACS at Harvard University Commitment - 2 hours a day 2 days a week + assignments and project Timings for Live sessions* (*subject to change with prior notice) Schedule …  ( 89 min )
    Stable Diffusion: Video Modes - Quick demo of all video modes
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
    My Showcase of AI Art Made with Midjourney & DreamStudio
    submitted by /u/Spare_Ad6464 [link] [comments]  ( 87 min )
    Lost city of Atlantis. Made with starryai digital art app.
    submitted by /u/eFishhh [link] [comments]  ( 87 min )
    Enter Sandman - Metallica - But the lyrics are Ai generated images (Made using Midjourney AI bot)
    submitted by /u/magenta_placenta [link] [comments]  ( 87 min )
    Secret Museum
    submitted by /u/widgia [link] [comments]  ( 86 min )
    Glimpse into the Future - Imagined by an AI
    submitted by /u/Ziinxx [link] [comments]  ( 87 min )
    Midjourney Video Game Prompt Generator with Python + GPT-3
    submitted by /u/kbf_ [link] [comments]  ( 90 min )
    The A - Z of Machine Learning Book Bundle by Packt
    submitted by /u/8ing8ong [link] [comments]  ( 87 min )
    AI Images: Last Week Tonight with John Oliver (HBO)
    submitted by /u/XXmynameisNeganXX [link] [comments]  ( 87 min )
    Stable Diffusion Colab Notebook! First Look!
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 87 min )
    I asked Midjourney Bot to create images for "stardust" here are the results!
    submitted by /u/Plutopow [link] [comments]  ( 87 min )
    Introduction to Machine Learning with C# and ML.NET
    submitted by /u/RubiksCodeNMZ [link] [comments]  ( 90 min )
    Google AI Open-Sources ‘MultiNeRF’: Image Noise Reduction Artificial Intelligence Project Presented at the Computer Vision and Pattern Recognition (CVPR) Conference 2022
    submitted by /u/ai-lover [link] [comments]  ( 101 min )
    just came here to share a pic of my clown hunting dogs
    submitted by /u/VerbNoun123 [link] [comments]  ( 87 min )
  • Open

    [D] Generalization from Noisy Labels
    Hello everyone, I was shooting the shit with someone at a party who works in computer vision, and they were making an interesting claim that, given a very large volume of noisy labels, a classifier can essentially "learn" to ignore the noise and successfully "generalize beyond" the label error rate. For example, let's say we had an ImageNet-style dataset with 10% label noise. Could the final classifier achieve an error rate of less than 10%? I've been looking up papers about this and there is a lot of work done on training neural nets with noisy labels and strategies for dealing with the noise. However, from the conversation, I got the impression the person was suggesting that a "vanilla" architecture like a ResNet, without significant modification, could perform this feat. Looking around on Google Scholar, I haven't really come across a clear-cut example of "just give a model a ton of data with noisy labels and it will eventually generalize". This paper, for example, makes this claim but has rather lackluster results based on their actual figures. Unfortunately, the person in question hasn't responded to my emails, so I was hoping someone here might be more familiar with work in this area. I work in computer vision but I'd really be interested to read about such results in any domain of deep learning. submitted by /u/ppg_dork [link] [comments]  ( 90 min )
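    The claim is cheap to sanity-check at toy scale (a sketch on synthetic data, nothing like ImageNet): corrupt 10% of the training labels, fit a vanilla classifier, and compare the clean-test error to the noise rate:

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        # Corrupt 10% of the training labels; keep the test labels clean.
        noise = rng.random(len(y_tr)) < 0.10
        y_noisy = np.where(noise, 1 - y_tr, y_tr)

        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_noisy)
        print("test error:", 1 - clf.score(X_te, y_te))  # often well below the 10% noise rate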
    [D] A simple trick to quickly verify data
    I've been using this trick to help me efficiently verify datasets and easily find stuff like mislabeled examples. It's simple enough that I'm sure a lot of people are using it or some variation thereof, but I wasn't able to find anything online about it. The trick helps you quickly find problematic examples in the data (outliers). Simply train a model on all your data without cleaning it, then run the trained model on all the data and calculate the per-example loss. You then sort the examples by their loss and inspect the top x%. The idea is that problematic examples will have high loss, so you'll always find them at the top. Of course not all examples at the top will be problematic; some of them will just be hard. So this also gives you a chance to understand your model's shortcomings. I know there's an autoencoder approach to anomaly detection that's similar to this, but not quite the same: in that approach you use the reconstruction loss. The approach I'm describing, however, is much simpler, since you don't have to train an autoencoder for it; you can just use the model you want to build in the first place. So has anyone been doing something similar? submitted by /u/mkthabet [link] [comments]  ( 91 min )
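    A minimal PyTorch sketch of the trick, where model, xs and ys are placeholders for a trained classifier and the full uncleaned dataset:

        import torch
        import torch.nn.functional as F

        def rank_by_loss(model, xs, ys, top_frac=0.01):
            model.eval()
            with torch.no_grad():
                logits = model(xs)
                # Per-example loss, no reduction, so each example keeps its own score.
                losses = F.cross_entropy(logits, ys, reduction="none")
            order = torch.argsort(losses, descending=True)
            k = max(1, int(top_frac * len(ys)))
            # Indices (and losses) of the most suspicious examples, for manual review.
            return order[:k], losses[order[:k]]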
    [D] How good is ml/nlp/cv research at ASU?
    Title. I’ll be joining ASU next year, I’m very interested in research, and I want to do a PhD eventually. It seems like you need to publish some papers at well-known top conferences like CVPR. Is ASU good at preparing me to get into a good school for an ML PhD? I apologize in advance if this is not the right sub. submitted by /u/djssoapappskdid [link] [comments]  ( 90 min )
    [P] Call for Participation: DREAM Challenge Preterm Birth Prediction [Top teams sponsored to 2022 DREAM conference in Las Vegas]
    Accurately predicting women at higher risk for preterm birth would help healthcare providers promptly treat those at risk of preterm delivery. DREAM Challenge Preterm Birth Prediction: Microbiome invites the computational and research communities to predict preterm & early preterm birth using microbiome data. Background: Globally, about 11% of infants every year are born preterm, defined as birth prior to 37 weeks of gestation, totaling nearly 15 million births. The vaginal microbiome could play a role in preterm birth, with previous studies showing significant differences between the vaginal microbiome of patients who deliver at term and those who deliver prematurely. We hypothesize that the microbiome could be used as a potential avenue for predicting which women are at a higher risk of delivering preterm. Our challenge team curated and merged 9 training datasets of vaginal microbiome data from the public domain, with a total of 3,578 samples across 1,268 individuals. 851 individuals delivered at term and 417 delivered preterm. We present Challenge participants with 2 tasks: (a) prediction of preterm birth, and (b) prediction of early preterm birth. Top teams will be invited to: Attend the 2022 DREAM Conference in Las Vegas, NV - November 8 - 11, 2022. Travel support will be provided on behalf of the challenge organizers. Participate in manuscript design and writing, and therefore be eligible for byline authorship. More information at https://www.synapse.org/preterm_birth_microbiome submitted by /u/al1563 [link] [comments]  ( 107 min )
    [D] ML integration into marketing department. Any data scraping tools that can help with the processes?
    Hey guys, in need of some suggestions. I run an online SME, and over the last 3 years the market we operate in has become more saturated with competition. Due to this saturation, we found ourselves funneling a large portion of our expense budget into marketing efforts on our webpage, which we are looking to innovate on moving forward. My team and I have been looking to integrate an ML model into our site by the end of Q4 this year to strengthen our marketing approaches. More specifically, an ML integration would help us better understand the journey of our customers on our webpage, as well as their overall behaviors, to determine stronger points of interest and develop stronger data-driven recommendations for future campaigns. In order to best utilize ML models for our web page, we are looking around the market for web scraping tools that can help pull data from our page, as well as from competitors and similar operating sites, so we can continuously understand changing consumer behaviors and get that upper edge. Curious if anyone within this community has used a data scraping tool for ML models in their marketing department. If so, what specific tools did you have success with, how easy was the integration, and what were the costs involved? Any information helps! submitted by /u/EpisodicEthos304 [link] [comments]  ( 93 min )
    [P] Model explainability for 🤗 Diffusers. Get explanations for your generated images with the latest stable diffusion!
    https://github.com/JoaoLages/diffusers-interpret Looking for contributors to improve this package! Example output: a generated image for the phrase "A cute corgi with the Eiffel Tower in the background", along with word importances for the selected region. submitted by /u/JClub [link] [comments]  ( 88 min )
    [D] Weird consequence of not freezing layers in Neural Network
    I was researching "why are we freezing layers" and I came across an answer that says "to not lose the information of the pre-trained model." But we only freeze the early layers (I know why). For example: our data is very similar to the data the model was trained on. Let's say we are not freezing any layer. The model will make very small mistakes and the updates will be minor, so we will not be destroying any information (the weights will change very little). Am I wrong? If I am not, then why do we freeze any layers at all? submitted by /u/Silly_Ad_4008 [link] [comments]  ( 90 min )
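    For reference, freezing amounts to switching off gradients for the chosen parameters; a sketch using a torchvision ResNet, where the layer-name prefixes are specific to that architecture:

        import torch.nn as nn
        from torchvision import models

        # torchvision >= 0.13 uses weights=...; older versions use pretrained=True.
        model = models.resnet18(weights="IMAGENET1K_V1")

        # Freeze everything except the last residual stage and the classifier head.
        for name, param in model.named_parameters():
            if not name.startswith(("layer4", "fc")):
                param.requires_grad = False

        # Only the unfrozen parameters should go to the optimizer.
        trainable = [p for p in model.parameters() if p.requires_grad]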
    [R] Emotion-based symbolic music (MIDI) generation
    https://www.serkansulun.com/midi/ Here is my recent research project. The website has music samples, and links to the paper and the code. It allows conditioning on arbitrary emotions, using valence-arousal values, and generates 5-instrument (strings, guitar, bass, piano, drums) rock and pop songs. I hope you find it useful. submitted by /u/councilanderson2 [link] [comments]  ( 89 min )
    [R] Novel Methods for Drift Detection
    Looking for any papers on drift detection methods that are not based on distributional comparisons such as the K-S test or K-L divergence. What is the preferred method nowadays, as distribution-based methods are quite sensitive? Additionally, it would be really cool if someone has a paper on identifying intentional drift caused by poisoning or interference, such as by a GAN or DDoS. submitted by /u/millenial_wh00p [link] [comments]  ( 89 min )
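    For contrast, the sensitive distribution-based baseline the post wants to move beyond is only a few lines with SciPy's two-sample K-S test:

        import numpy as np
        from scipy.stats import ks_2samp

        rng = np.random.default_rng(0)
        reference = rng.normal(0.0, 1.0, 5_000)   # training-time feature values
        production = rng.normal(0.3, 1.0, 5_000)  # live feature values (shifted)

        stat, p_value = ks_2samp(reference, production)
        if p_value < 0.01:
            print(f"drift suspected (KS statistic={stat:.3f}, p={p_value:.2e})")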
    [D] "A majority of respondents think that the scientific value of the majority of work in NLP is dubious"
    Interesting result in the recent NLP metasurvey: https://nlpsurvey.net/nlp-metasurvey-results.pdf Why is NLP producing so much stuff that NLP people think is suspect, even after review? submitted by /u/leondz [link] [comments]  ( 111 min )
    [R] Deep Learning with a Small Training Batch (or Lack Thereof). Part 1
    This article covers two self-supervised approaches to tackling the image classification problem. These methods are excellent when you only have a small training batch, and they will significantly simplify manual markup or even allow you to ditch it. Some 100,000 new content units are uploaded to our iFunny app daily. Manual markup at that scale would be costly, but we need information on the objects in the content to personalize the feed and select the proper push notifications. This is where self-supervised approaches come to the rescue. SimCLR: a framework for unsupervised learning by Google Research. Original article: A Simple Framework for Contrastive Learning of Visual Representations. The method is based on the contrastive loss function, …  ( 98 min )
    [D] Easily Run Stable Diffusion Image to Image mode
    You can easily and freely try out img2img mode of Stable Diffusion in this demo https://huggingface.co/spaces/huggingface/diffuse-the-rest. This lets you do cool things like turn your drawings into images or combine other models like dalle-mini with stable diffusion. Demo link https://huggingface.co/spaces/huggingface/diffuse-the-rest GitHub example https://github.com/huggingface/diffusers/tree/main/examples/inference#image-to-image-text-guided-generation-with-stable-diffusion Colab https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/image_2_image_using_diffusers.ipynb submitted by /u/hackerllama [link] [comments]  ( 105 min )
    [D] Position embedding in Vision transformer
    Position embeddings in the Vision Transformer paper ("An Image is Worth 16x16 Words...") are added to the projected patch embeddings. Do the position vectors capture the pixel positions within a patch, or do they capture only positional information across patches (patch 1, patch 2, etc.)? I'm not able to get an intuition for how they can capture positional information across pixels, which also seems needed along with the patch positions. submitted by /u/bharys1276 [link] [comments]  ( 89 min )
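    To make the granularity concrete: in the ViT paper there is one learnable vector per patch index, added after the patch projection, so pixel positions within a patch are represented only implicitly by the projection itself. A minimal sketch:

        import torch
        import torch.nn as nn

        batch, n_patches, dim = 8, 196, 768   # 14x14 patches of a 224x224 image

        patch_tokens = torch.randn(batch, n_patches, dim)  # projected patch embeddings

        # One learnable vector per patch *index*: position is encoded at patch
        # granularity only; pixel layout inside a patch is left to the linear
        # patch projection to represent implicitly.
        pos_embed = nn.Parameter(torch.zeros(1, n_patches, dim))
        tokens = patch_tokens + pos_embed
        print(tokens.shape)                   # torch.Size([8, 196, 768])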
    [D] Does anyone use performance profiling / flamegraphs for optimizing ML algorithms?
    I know profiling and continuous profiling have become popular for understanding system-wide performance characteristics. I.e., companies like Netflix, Uber (pyflame), Doordash, Paypal, and many more use profiling and flamegraphs to optimize their application code, often by decreasing latency -- but I never hear about them using it for their ML-related code (this is what my question is ultimately about). There's definitely both an increased-revenue case for using profiling and a decreased-cost case as well: Revenue case for code profiling: For all of these companies latency has a major impact on revenue; for example, the longer it takes to order an Uber ride, the more likely a user is to go to Lyft. The longer a Netflix show buffers, the more likely someone is to go to a different app. Cost case for code profiling: Obviously the longer it takes to complete *some* function, the more CPU, memory, etc. it uses, so decreasing the latency by x% here can also roughly decrease the cost of some compute resource like EC2 by ~x%. However, one area where I suspect things can be very computationally expensive is in training machine learning algorithms. I know TensorFlow has some profiling capabilities, and I only casually play around with ML, not in my day job, so my question is: Has anyone here ever gotten significant value from using profiling / flamegraphs in a machine learning use case? submitted by /u/rperry2174 [link] [comments]  ( 91 min )
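    On the ML side, PyTorch ships a built-in profiler whose traces open in flamegraph-style viewers; a minimal sketch:

        import torch
        import torch.nn as nn
        from torch.profiler import profile, ProfilerActivity

        model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
        x = torch.randn(64, 512)

        with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
            for _ in range(10):
                loss = model(x).sum()
                loss.backward()

        # Per-operator summary, sorted by total CPU time.
        print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
        # Export for chrome://tracing or Perfetto for a flamegraph-like view.
        prof.export_chrome_trace("trace.json")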
  • Open

    Meet the Omnivore: Artist Fires Up NVIDIA Omniverse to Glaze Animated Ceramics
    Vanessa Rosa’s art transcends time: it merges traditional and contemporary techniques, gives new life to ancient tales and imagines possible futures. The post Meet the Omnivore: Artist Fires Up NVIDIA Omniverse to Glaze Animated Ceramics appeared first on NVIDIA Blog.  ( 6 min )
  • Open

    Use Amazon SageMaker pipeline sharing to view or manage pipelines across AWS accounts
    On August 9, 2022, we announced the general availability of cross-account sharing of Amazon SageMaker Pipelines entities. You can now use cross-account support for Amazon SageMaker Pipelines to share pipeline entities across AWS accounts and access shared pipelines directly through Amazon SageMaker API calls. Customers are increasingly adopting multi-account architectures for deploying and managing machine […]  ( 9 min )
    Explore Amazon SageMaker Data Wrangler capabilities with sample datasets
    Data preparation is the process of collecting, cleaning, and transforming raw data to make it suitable for insight extraction through machine learning (ML) and analytics. Data preparation is crucial for ML and analytics pipelines. Your model and insights will only be as reliable as the data you use for training them. Flawed data will produce […]  ( 8 min )
  • Open

    "Nearest Neighbor Non-autoregressive Text Generation", Niwa et al 2022
    submitted by /u/gwern [link] [comments]  ( 87 min )
    Memory leak in my DQN
    I am training a DQN, and I get the following error:
        RuntimeError Traceback (most recent call last)
        Input In [7], in ()
             21 replay_buffer.append((state, next_state, reward, done, action))
             22 if len(replay_buffer)>batch_size and steps%4==0:
        ---> 23 loss = compute_td_loss(batch_size)
             24 eps_loss += loss.cpu().detach().numpy()
             25 eps = eps/(1 + decay_val)
        Input In [6], in compute_td_loss(batch_size)
              3 state = torch.stack(list(state), dim=0).squeeze(1)
              4 state= state.reshape(batch_size, 3, 210, 160).to(device)
        ----> 5 next_state = torch.from_numpy(np.array(next_state)).reshape(batch_size, 3, 210, 160).type(torch.float32).to(device)
              6 reward = torch.from_numpy(np.array(reward)).to(device)
              7 done = torch.from_numpy(np.array(done)).long().to(device)
        RuntimeError: [enforce fail at C…  ( 90 min )
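    One common cause of allocator failures like this is a replay buffer holding tensors that still carry autograd history, or repeated list-to-NumPy conversions of large frame stacks; a sketch of a leaner batching helper, assuming transitions are stored as plain NumPy arrays of 210x160x3 frames as in the traceback:

        import numpy as np
        import torch

        def to_batch(transitions, device):
            # transitions: iterable of (state, next_state, reward, done, action),
            # stored as plain NumPy arrays / scalars (no tensors with grad history).
            states, next_states, rewards, dones, actions = zip(*transitions)
            s  = torch.as_tensor(np.stack(states), dtype=torch.float32, device=device)
            ns = torch.as_tensor(np.stack(next_states), dtype=torch.float32, device=device)
            r  = torch.as_tensor(rewards, dtype=torch.float32, device=device)
            d  = torch.as_tensor(np.array(dones), dtype=torch.float32, device=device)
            a  = torch.as_tensor(actions, dtype=torch.int64, device=device)
            # permute, not reshape: reshape would scramble HWC frames instead of
            # moving the channel axis to produce proper NCHW tensors.
            return s.permute(0, 3, 1, 2), ns.permute(0, 3, 1, 2), r, d, a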
    When use positive or negative rewards in reinforcement learning? Is there anything in literature?
    Let's say I can design a reward as a function of a distance $d>0$ from the target in 2 ways: $r=1/(1+d)$ or $r=-d$. The first is defined in $(0,1]$, the second in $(-\infty,0]$. I would expect both to work since they encode the same information (maybe one better than the other), but I often see that one works while the other doesn't. I'm wondering if there is a way to predict which one works based on the task. I can't find any work on this in Google. Or maybe it doesn't depend on the sign but only on the shape: in particular, the first is nonlinear (more sensitive to changes in distance near $0$ and nearly flat at large distances), while the second is linear (and gives the same importance to all values). submitted by /u/riccardogauss [link] [comments]  ( 89 min )
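    Tabulating the two shapes side by side makes the difference concrete (a quick sketch):

        import numpy as np

        d = np.linspace(0.0, 10.0, 6)
        r_bounded = 1.0 / (1.0 + d)  # in (0, 1]: steep near d=0, nearly flat far away
        r_linear  = -d               # in (-inf, 0]: constant slope everywhere

        for di, rb, rl in zip(d, r_bounded, r_linear):
            print(f"d={di:4.1f}  1/(1+d)={rb:5.3f}  -d={rl:6.1f}")
        # The bounded reward barely distinguishes d=8 from d=10, so far from the
        # target it gives the agent almost no gradient to follow; the linear
        # reward penalizes every unit of distance equally.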
    How is it possible to overcome the biasedness of Reinforcement Learning choosing offers with most rewards in Next Best Action?
    I am using reinforcement learning to solve a next-best-action problem. I design the actions to be the offers that customers can subscribe to, and the rewards are the prices of these offers. My state space is a collection of features aggregated over some period of time before the subscription. The problem I am facing now is that, no matter what the states or actions are in the historic data the agent is experiencing and trying to learn from, the Q function always gives me the action with the highest price as the next best action. I know that epsilon-greedy chooses the actions with the highest rewards most of the time. But I feel that something is missing, because the state-action combination has no effect on Q. My question: what else should be used in order to get rid of this bias? submitted by /u/InformationOrnery194 [link] [comments]  ( 91 min )
    Forcing convergence of PPO to a near-deterministic policy
    Hello, I am training a PPO agent to learn the optimal policy of a stochastic optimization problem. It is known that the optimal policy of such a problem is deterministic given the state of the MDP. However, the policy to which the agent converges is a diagonal Gaussian with an average standard deviation of 10, which is about one tenth of the values of the means of the Gaussian. The policy learned by the agent must therefore hedge against its own uncertainty, and using only the mean at test time cannot be optimal. Of course I am not expecting PPO to learn exactly the optimal policy, but I was hoping to narrow the gap by somehow forcing the policy to become near-deterministic at the end of training. I tried removing the entropy bonus at the last iterations, and even changing it to an entropy penalty. The only effect this has is to destabilize training, leading to poor results. I was wondering if someone is aware of work on fine-tuning a stochastic policy into a deterministic one? submitted by /u/Playmad37 [link] [comments]  ( 104 min )
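    One option that tends to be gentler than deleting the entropy bonus outright is annealing the policy's standard deviation directly; a sketch of a Gaussian head whose allowed log-std range shrinks with training progress (the ranges here are illustrative assumptions):

        import torch
        import torch.nn as nn

        class GaussianHead(nn.Module):
            def __init__(self, hidden, act_dim):
                super().__init__()
                self.mu = nn.Linear(hidden, act_dim)
                self.log_std = nn.Parameter(torch.zeros(act_dim))

            def forward(self, h, progress):  # progress in [0, 1] over training
                # Shrink the allowed log-std window from [-1, 1] toward [-4, -3],
                # so the policy is forced near-deterministic by the end.
                lo = -1.0 - 3.0 * progress
                hi =  1.0 - 4.0 * progress
                std = torch.exp(self.log_std.clamp(lo, hi))
                return torch.distributions.Normal(self.mu(h), std)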
  • Open

    A Comprehensive Definition of the Metaverse
    The Metaverse is seen as the fourth wave of computing and networking - the first three being the mainframe, personal computing, and mobile + cloud. Yet, in contrast to the previous waves, the Metaverse adds 3D experience, or immersion. The post A Comprehensive Definition of the Metaverse appeared first on Data Science Central.  ( 18 min )
  • Open

    Reverse engineering the NTK: towards first-principles architecture design
    Foundational works showed how to find the kernel corresponding to a wide network. We find the inverse mapping, showing how to find the wide network corresponding to a given kernel. Deep neural networks have enabled technological wonders ranging from voice recognition to machine translation to protein engineering, but their design and application are nonetheless notoriously unprincipled. The development of tools and methods to guide this process is one of the grand challenges of deep learning theory. In Reverse Engineering the Neural Tangent Kernel, we propose a paradigm for bringing some principle to the art of architecture design using recent theoretical breakthroughs: first design a good kernel function – often a much easier task – and then “reverse-engineer” a net-kernel equivalence…  ( 5 min )
  • Open

    The cost of passing -- using deep learning AIs to expand our understanding of the ancient game of Go. (arXiv:2208.12643v1 [cs.AI])
    AI engines utilizing deep learning neural networks provide excellent tools for analyzing traditional board games. Here we are interested in gaining new insights into the ancient game of Go. For that purpose, we need to define new numerical measures based on the raw output of the engines. In this paper, we develop a numerical tool for automated move-by-move performance evaluation in a context-sensitive manner and for recognizing game features. We measure the urgency of a move by the cost of passing, which is the score value difference between the current configuration of stones and after a hypothetical pass in the same board position. Here we investigate the properties of this measure and describe some applications.
    Information Theory with Kernel Methods. (arXiv:2202.08545v2 [cs.IT] UPDATED)
    We consider the analysis of probability distributions through their associated covariance operators from reproducing kernel Hilbert spaces. We show that the von Neumann entropy and relative entropy of these operators are intimately related to the usual notions of Shannon entropy and relative entropy, and share many of their properties. They come together with efficient estimation algorithms from various oracles on the probability distributions. We also consider product spaces and show that for tensor product kernels, we can define notions of mutual information and joint entropies, which can then characterize independence perfectly, but conditional independence only partially. We finally show how these new notions of relative entropy lead to new upper bounds on log partition functions, which can be used together with convex optimization within variational inference methods, providing a new family of probabilistic inference methods.
    Efficient privacy-preserving inference for convolutional neural networks. (arXiv:2110.08321v2 [cs.LG] UPDATED)
    The processing of sensitive user data using deep learning models is an area that has gained recent traction. Existing work has leveraged homomorphic encryption (HE) schemes to enable computation on encrypted data. An early work was CryptoNets, which takes 250 seconds for one MNIST inference. The main limitation of such approaches is that of the expensive FFT-like operations required to perform operations on HE-encrypted ciphertext. Others have proposed the use of model pruning and efficient data representations to reduce the number of HE operations required. We focus on improving upon existing work by proposing changes to the representations of intermediate tensors during CNN inference. We construct and evaluate private CNNs on the MNIST and CIFAR-10 datasets, and achieve over a two-fold reduction in the number of operations used for inferences of the CryptoNets architecture.
    SPATL: Salient Parameter Aggregation and Transfer Learning for Heterogeneous Clients in Federated Learning. (arXiv:2111.14345v2 [cs.LG] UPDATED)
    Federated learning (FL) facilitates training and deploying AI models on edge devices. Preserving user data privacy in FL introduces several challenges, including expensive communication costs, limited resources, and data heterogeneity. In this paper, we propose SPATL, an FL method that addresses these issues by: (a) introducing a salient parameter selection agent and communicating selected parameters only; (b) splitting a model into a shared encoder and a local predictor, and transferring its knowledge to heterogeneous clients via the locally customized predictor. Additionally, we leverage a gradient control mechanism to further speed up model convergence and increase the robustness of the training process. Experiments demonstrate that SPATL reduces communication overhead, accelerates model inference, and enables stable training processes with better results compared to state-of-the-art methods. Our approach reduces communication cost by up to $86.45\%$, accelerates local inference by reducing up to $39.7\%$ FLOPs on VGG-11, and requires $7.4 \times$ less communication overhead when training ResNet-20.
    Visual processing in context of reinforcement learning. (arXiv:2208.12525v1 [cs.LG])
    Although deep reinforcement learning (RL) has recently enjoyed many successes, its methods are still data inefficient, which makes solving numerous problems prohibitively expensive in terms of data. We aim to remedy this by taking advantage of the rich supervisory signal in unlabeled data for learning state representations. This thesis introduces three different representation learning algorithms that have access to different subsets of the data sources that traditional RL algorithms use: (i) GrICA is inspired by independent component analysis (ICA) and trains a deep neural network to output statistically independent features of the input. GrICA does so by minimizing the mutual information between each feature and the other features. Additionally, GrICA only requires an unsorted collection of environment states. (ii) Latent Representation Prediction (LARP) requires more context: in addition to requiring a state as an input, it also needs the previous state and an action that connects them. This method learns state representations by predicting the representation of the environment's next state given a current state and action. The predictor is used with a graph search algorithm. (iii) RewPred learns a state representation by training a deep neural network to learn a smoothed version of the reward function. The representation is used for preprocessing inputs to deep RL, while the reward predictor is used for reward shaping. This method needs only state-reward pairs from the environment for learning the representation. We discover that every method has its strengths and weaknesses, and conclude from our experiments that including unsupervised representation learning in RL problem-solving pipelines can speed up learning.
    Instructions and Guide: Causal Insights for Learning Paths in Education. (arXiv:2208.12610v1 [cs.CY])
    Causal machine learning is a field that focuses on using machine learning methods to tackle causality problems. Despite the recent progress in this field, there are still many unresolved challenges, including missing data, selection bias, unobserved confounders, etc., which are ubiquitous in the real world. Advances in any of the above areas can greatly reduce the gap between research and real-world impact. In this competition, we focus on two fundamental challenges of causal machine learning in the context of education using time-series data. The first is to identify the causal relationships between different constructs, where a construct is defined as the smallest element of learning. The second challenge is to predict the impact of learning one construct on the ability to answer questions on other constructs. Addressing these challenges will not only impact the causal ML community but also enable optimisation of students' knowledge acquisition, which can be deployed in a real edtech solution impacting millions of students. Participants will run these tasks in an idealised environment with synthetic data and a real-world scenario with evaluation data collected from a series of A/B tests. We expect participants to develop novel machine learning methodologies for causal discovery between different constructs and the impact estimation of learning one construct on other constructs, which should bring fundamental advances to causal ML.
    Neural graph embeddings as explicit low-rank matrix factorization for link prediction. (arXiv:2011.09907v3 [cs.SI] UPDATED)
    Learning good quality neural graph embeddings has long been achieved by minimizing the point-wise mutual information (PMI) for co-occurring nodes in simulated random walks. This design choice has been mostly popularized by the direct application of the highly successful word embedding algorithm word2vec to predicting the formation of new links in social, co-citation, and biological networks. However, such a skeuomorphic design of graph embedding methods entails a truncation of information coming from pairs of nodes with low PMI. To circumvent this issue, we propose an improved approach to learning low-rank factorization embeddings that incorporates information from such unlikely pairs of nodes, and show that it can improve the link prediction performance of baseline methods by between 1.2% and 24.2%. Based on our results and observations, we outline further steps that could improve the design of the next generation of graph embedding algorithms based on matrix factorization.
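    A minimal sketch of the underlying idea: factorize a full (untruncated) PMI matrix with SVD rather than clipping low-PMI entries. The one-step co-occurrence model and all names below are simplifying assumptions, not the paper's exact construction:

        import numpy as np

        def pmi_embeddings(A: np.ndarray, dim: int = 16):
            # A: binary adjacency matrix; co-occurrence approximated by A
            # itself (one-step walks). Negative PMI entries are retained.
            total = A.sum()
            p_ij = A / total                        # joint probabilities
            p_i = p_ij.sum(axis=1, keepdims=True)   # marginals
            with np.errstate(divide="ignore", invalid="ignore"):
                pmi = np.log(p_ij / (p_i * p_i.T))
            pmi[~np.isfinite(pmi)] = pmi[np.isfinite(pmi)].min()
            U, S, _ = np.linalg.svd(pmi)
            return U[:, :dim] * np.sqrt(S[:dim])    # node embeddings

        # Score a candidate link by the inner product of its endpoints.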
    Constraining Gaussian Processes to Systems of Linear Ordinary Differential Equations. (arXiv:2208.12515v1 [cs.LG])
    Data in many applications follows systems of Ordinary Differential Equations (ODEs). This paper presents a novel algorithmic and symbolic construction for covariance functions of Gaussian Processes (GPs) whose realizations strictly follow a system of linear homogeneous ODEs with constant coefficients, which we call LODE-GPs. Introducing this strong inductive bias into a GP improves the modelling of such data. Using Smith normal form algorithms, a symbolic technique, we overcome two current restrictions in the state of the art: (1) the need for certain uniqueness conditions in the set of solutions, typically assumed in classical ODE solvers and their probabilistic counterparts, and (2) the restriction to controllable systems, typically assumed when encoding differential equations in covariance functions. We show the effectiveness of LODE-GPs in a number of experiments, for example learning physically interpretable parameters by maximizing the likelihood.
    MonaCoBERT: Monotonic attention based ConvBERT for Knowledge Tracing. (arXiv:2208.12615v1 [cs.CY])
    Knowledge tracing (KT) is a field of study that predicts the future performance of students based on prior performance datasets collected from educational applications such as intelligent tutoring systems, learning management systems, and online courses. Some previous studies on KT have concentrated only on the interpretability of the model, whereas others have focused on enhancing performance. Models that consider both interpretability and performance improvement have been insufficient, and models that focus on performance alone have not shown decisive gains over existing models. In this study, we propose MonaCoBERT, which achieves the best performance on most benchmark datasets and has significant interpretability. MonaCoBERT uses a BERT-based architecture with monotonic convolutional multihead attention, which reflects the forgetting behavior of students and increases the representational power of the model. We also increase performance and interpretability using a classical test-theory-based (CTT-based) embedding strategy that considers the difficulty of each question. To determine why MonaCoBERT achieved the best performance and to interpret the results quantitatively, we conducted ablation studies and additional analyses using Grad-CAM, UMAP, and various visualization techniques. The analysis results demonstrate that both attention components complement one another and that CTT-based embedding represents information on both global and local difficulty. We also demonstrate that our model captures the relationships between concepts.
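    One plausible reading of monotonic attention is sketched below, under the assumption that forgetting is modeled as an exponential penalty on temporal distance; the paper's exact formulation may differ:

        import numpy as np

        def monotonic_attention(scores: np.ndarray, lam: float = 0.1):
            # scores: (T, T); entry (i, j) attends from step i back to step j.
            # Interactions further in the past receive exponentially less
            # weight, a simple stand-in for student forgetting.
            T = scores.shape[0]
            i, j = np.indices((T, T))
            scores = scores - lam * (i - j)          # penalize distance
            scores = np.where(j <= i, scores, -1e9)  # causal mask
            w = np.exp(scores - scores.max(axis=-1, keepdims=True))
            return w / w.sum(axis=-1, keepdims=True) # row-wise softmax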
    NeuralSI: Structural Parameter Identification in Nonlinear Dynamical Systems. (arXiv:2208.12771v1 [cs.LG])
    Structural monitoring for complex built environments often suffers from mismatch between design, laboratory testing, and actual built parameters. Additionally, real-world structural identification problems encounter many challenges. For example, the lack of accurate baseline models, high dimensionality, and complex multivariate partial differential equations (PDEs) pose significant difficulties in training and learning conventional data-driven algorithms. This paper explores a new framework, dubbed NeuralSI, for structural identification by augmenting PDEs that govern structural dynamics with neural networks. Our approach seeks to estimate nonlinear parameters from governing equations. We consider the vibration of nonlinear beams with two unknown parameters, one that represents geometric and material variations, and another that captures energy losses in the system, mainly through damping. The data for parameter estimation is obtained from a limited set of measurements, which is conducive to applications in structural health monitoring, where the exact state of an existing structure is typically unknown and only a limited amount of data can be collected in the field. The trained model can also be extrapolated under both standard and extreme conditions using the identified structural parameters. We compare against purely data-driven neural networks and classical Physics-Informed Neural Networks (PINNs). Our approach reduces both interpolation and extrapolation errors in displacement distribution by two to five orders of magnitude over the baselines. Code is available at https://github.com/human-analysis/neural-structural-identification
    GCNs-Net: A Graph Convolutional Neural Network Approach for Decoding Time-resolved EEG Motor Imagery Signals. (arXiv:2006.08924v4 [eess.SP] UPDATED)
    Precise decoding of brain activity measured by electroencephalography (EEG) is essential for developing effective and efficient brain-computer interface (BCI) systems. Traditional works classify EEG signals without considering the topological relationships among electrodes. However, neuroscience research has increasingly emphasized network patterns of brain dynamics, and the Euclidean structure of electrodes might not adequately reflect the interactions between signals. To fill this gap, a novel deep learning framework based on graph convolutional neural networks (GCNs) is presented to enhance the decoding performance of raw EEG signals during different types of motor imagery (MI) tasks while exploiting the functional topological relationships of electrodes. Based on the absolute Pearson correlation matrix of the overall signals, the graph Laplacian of the EEG electrodes is built up. The GCNs-Net, constructed from graph convolutional layers, learns generalized features; the subsequent pooling layers reduce dimensionality, and a fully-connected softmax layer derives the final prediction. The introduced approach has been shown to converge for both personalized and group-wise predictions. It has achieved the highest averaged accuracy, 93.06% and 88.57% (PhysioNet Dataset), 96.24% and 80.89% (High Gamma Dataset), at the subject and group level, respectively, compared with existing studies, which suggests adaptability and robustness to individual variability. Moreover, the performance is stably reproducible across repeated cross-validation experiments. The excellent performance of our method shows that it is an important step towards better BCI approaches. To conclude, the GCNs-Net filters EEG signals based on the functional topological relationships of electrodes, which manages to decode relevant features for brain motor imagery.
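    The graph construction step is simple enough to sketch directly; a minimal version of the described Laplacian built from absolute Pearson correlations (array shapes are assumptions):

        import numpy as np

        def eeg_graph_laplacian(X: np.ndarray):
            # X: (channels, time). Edge weights are absolute Pearson
            # correlations between channel signals, as described above.
            W = np.abs(np.corrcoef(X))   # absolute Pearson correlation matrix
            np.fill_diagonal(W, 0.0)     # no self-loops
            D = np.diag(W.sum(axis=1))   # degree matrix
            return D - W                 # combinatorial graph Laplacian

        L = eeg_graph_laplacian(np.random.randn(64, 1000))  # 64-channel toy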
    An approach to implement Reinforcement Learning for Heterogeneous Vehicular Networks. (arXiv:2208.12466v1 [cs.LG])
    This paper extends the idea of spectrum sharing in vehicular networks to the Heterogeneous Vehicular Network (HetVNET) based on multi-agent reinforcement learning. Here, multiple vehicle-to-vehicle (V2V) links reuse the spectrum of vehicle-to-infrastructure (V2I) links and also those of other networks. The fast-changing environment in vehicular networks limits the feasibility of centralizing channel state information (CSI) and allocating channels, so we use ML-based methods that can be implemented in a distributed manner across all vehicles. Each On-Board Unit (OBU) senses the signals in the channel and, based on that information, runs RL to autonomously decide which channel to occupy. Each V2V link acts as an agent in the MARL setting, and the RL model is trained so that these agents collaborate rather than compete.
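    A toy sketch of the per-OBU decision loop, using tabular Q-learning over channels with a stateless (bandit-style) value table; the reward function and all parameters are placeholders, not the paper's design:

        import numpy as np

        n_channels, eps, alpha, gamma = 4, 0.1, 0.5, 0.9
        Q = np.zeros(n_channels)
        rng = np.random.default_rng()

        def step(reward_fn):
            # Epsilon-greedy channel choice, then a standard Q update.
            a = rng.integers(n_channels) if rng.random() < eps else int(Q.argmax())
            r = reward_fn(a)   # e.g. achieved V2V throughput on channel a
            Q[a] += alpha * (r + gamma * Q.max() - Q[a])
            return a, r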
    Deep Learning Opacity in Scientific Discovery. (arXiv:2206.00520v2 [cs.AI] UPDATED)
    Philosophers have recently focused on critical, epistemological challenges that arise from the opacity of deep neural networks. One might conclude from this literature that doing good science with opaque models is exceptionally challenging, if not impossible. Yet, this is hard to square with the recent boom in optimism for AI in science alongside a flood of recent scientific breakthroughs driven by AI methods. In this paper, I argue that the disconnect between philosophical pessimism and scientific optimism is driven by a failure to examine how AI is actually used in science. I show that, in order to understand the epistemic justification for AI-powered breakthroughs, philosophers must examine the role played by deep learning as part of a wider process of discovery. The philosophical distinction between the 'context of discovery' and the 'context of justification' is helpful in this regard. I demonstrate the importance of attending to this distinction with two cases drawn from the scientific literature, and show that epistemic opacity need not diminish AI's capacity to lead scientists to significant and justifiable breakthroughs.
    SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning. (arXiv:2207.04606v2 [cs.LG] UPDATED)
    Sparse tensors are rapidly becoming critical components of modern deep learning workloads. However, developing high-performance sparse operators can be difficult and tedious, and existing vendor libraries cannot satisfy the escalating demands from new operators. Sparse tensor compilers simplify the development of operators, but efficient sparse compilation for deep learning remains challenging because a single sparse format cannot maximize hardware efficiency, and single-shot compilers cannot keep up with the latest hardware and system advances. We show that the key to addressing both challenges is two forms of composability. In this paper, we propose SparseTIR, a sparse tensor compilation abstraction that offers composable formats and composable transformations for deep learning workloads. SparseTIR constructs a search space over these composable components for performance tuning. With these improvements, SparseTIR obtains consistent performance speedups over vendor libraries on GPUs for single operators: 1.1-3.3x for GNN operators and 1.1-4.4x for sparse transformer operators. SparseTIR also accelerates end-to-end GNNs by 1.1-2.2x for GraphSAGE training and 0.9-26x for RGCN inference.
    Battery and Hydrogen Energy Storage Control in a Smart Energy Network with Flexible Energy Demand using Deep Reinforcement Learning. (arXiv:2208.12779v1 [eess.SY])
    Smart energy networks provide an effective means to accommodate high penetrations of variable renewable energy sources like solar and wind, which are key for deep decarbonisation of energy production. However, given the variability of the renewables as well as the energy demand, it is imperative to develop effective control and energy storage schemes to manage the variable energy generation and achieve desired system economics and environmental goals. In this paper, we introduce a hybrid energy storage system composed of battery and hydrogen energy storage to handle the uncertainties related to electricity prices, renewable energy production, and consumption. We aim to improve renewable energy utilisation and minimise energy costs and carbon emissions while ensuring energy reliability and stability within the network. To achieve this, we propose a multi-agent deep deterministic policy gradient approach, a deep reinforcement learning-based control strategy, to optimise the scheduling of the hybrid energy storage system and energy demand in real-time. The proposed approach is model-free and does not require explicit knowledge or rigorous mathematical models of the smart energy network environment. Simulation results based on real-world data show that: (i) integration and optimised operation of the hybrid energy storage system and energy demand reduces carbon emissions by 78.69%, improves cost savings by 23.5%, and increases renewable energy utilisation by over 13.2% compared to other baseline models, and (ii) the proposed algorithm outperforms state-of-the-art self-learning algorithms such as the deep Q-network.
    LUCID: Exposing Algorithmic Bias through Inverse Design. (arXiv:2208.12786v1 [cs.LG])
    AI systems can create, propagate, support, and automate bias in decision-making processes. To mitigate biased decisions, we need to both understand the origin of the bias and define what it means for an algorithm to make fair decisions. Most group fairness notions assess a model's equality of outcome by computing statistical metrics on the outputs. We argue that these output metrics encounter intrinsic obstacles and present a complementary approach that aligns with the increasing focus on equality of treatment. By Locating Unfairness through Canonical Inverse Design (LUCID), we generate a canonical set that shows the desired inputs for a model given a preferred output. The canonical set reveals the model's internal logic and exposes potential unethical biases by repeatedly interrogating the decision-making process. We evaluate LUCID on the UCI Adult and COMPAS data sets and find that some biases detected by a canonical set differ from those detected by output metrics. The results show that by shifting the focus towards equality of treatment and looking into the algorithm's internal workings, canonical sets are a valuable addition to the toolbox of algorithmic fairness evaluation.
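    Inverse design of this kind can be sketched as gradient ascent on the inputs of a trained, differentiable classifier toward a preferred output; a minimal PyTorch version, with all names and hyper-parameters hypothetical:

        import torch

        def canonical_set(model, n: int, d: int, target: int, steps: int = 200):
            # Synthesize n inputs of dimension d that the (frozen) model maps
            # to the preferred class `target`; inspecting their feature values
            # then reveals what the model "wants to see" for that outcome.
            x = torch.randn(n, d, requires_grad=True)
            opt = torch.optim.Adam([x], lr=0.05)
            for _ in range(steps):
                opt.zero_grad()
                loss = torch.nn.functional.cross_entropy(
                    model(x), torch.full((n,), target))
                loss.backward()
                opt.step()
            return x.detach()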
    Traffic Congestion Prediction Using Machine Learning Techniques. (arXiv:2206.10983v2 [cs.LG] UPDATED)
    The prediction of traffic congestion can serve a crucial role in making future decisions. Although many studies have been conducted regarding congestion, most of them could not cover all the important factors (e.g., weather conditions). We propose a prediction model for traffic congestion that can predict congestion based on day, time, and several weather features (e.g., temperature, humidity). To evaluate our model, it has been tested against traffic data from New Delhi. With this model, the congestion of a road can be predicted one week ahead with an average RMSE of 1.12. Therefore, this model can be used to take preventive measures beforehand.
    Play with Emotion: Affect-Driven Reinforcement Learning. (arXiv:2208.12622v1 [cs.LG])
    This paper introduces a paradigm shift by viewing the task of affect modeling as a reinforcement learning (RL) process. According to the proposed paradigm, RL agents learn a policy (i.e. affective interaction) by attempting to maximize a set of rewards (i.e. behavioral and affective patterns) via their experience with their environment (i.e. context). Our hypothesis is that RL is an effective paradigm for interweaving affect elicitation and manifestation with behavioral and affective demonstrations. Importantly, our second hypothesis, building on Damasio's somatic marker hypothesis, is that emotion can be the facilitator of decision-making. We test our hypotheses in a racing game by training Go-Blend agents to model human demonstrations of arousal and behavior; Go-Blend is a modified version of the Go-Explore algorithm, which has recently shown remarkable performance in hard-exploration tasks. We first vary the arousal-based reward function and observe agents that can effectively display a palette of affective and behavioral patterns according to the specified reward. Then we use arousal-based state selection mechanisms in order to bias the strategies that Go-Blend explores. Our findings suggest not only that Go-Blend is an efficient affect-modeling paradigm but, more importantly, that affect-driven RL improves exploration and yields higher-performing agents, validating Damasio's hypothesis in the domain of games.
    Merging Models with Fisher-Weighted Averaging. (arXiv:2111.09832v2 [cs.LG] UPDATED)
    Averaging the parameters of models that have the same architecture and initialization can provide a means of combining their respective capabilities. In this paper, we take the perspective that this "merging" operation can be seen as choosing parameters that approximately maximize the joint likelihood of the posteriors of the models' parameters. Computing a simple average of the models' parameters therefore corresponds to making an isotropic Gaussian approximation to their posteriors. We develop an alternative merging procedure based on the Laplace approximation where we approximate each model's posterior as a Gaussian distribution whose precision matrix corresponds to its Fisher information. We first show that our "Fisher merging" technique provides a performance boost in settings where simple parameter averaging is currently used -- specifically, robust fine-tuning and model ensembling. Then, we compare merging to standard gradient-based transfer learning and demonstrate that merging enables a fundamentally different method for transferring capabilities across models. Specifically, we show that Fisher merging is competitive with gradient-based transfer learning approaches (while being significantly cheaper) in intermediate-task training and domain-adaptive pre-training. We also show that our merging procedure makes it possible to combine models in previously unexplored ways. We release our code to facilitate future research into methods for merging models.
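    The core computation is compact; a minimal sketch of diagonal Fisher-weighted merging of two parameter vectors (toy values, not the paper's experiments):

        import numpy as np

        def fisher_merge(thetas, fishers, eps=1e-8):
            # With diagonal Fisher approximations F_m, the merged parameters
            # are (sum_m F_m * theta_m) / (sum_m F_m); this reduces to plain
            # averaging when every F_m is the identity (isotropic posterior).
            num = sum(F * t for F, t in zip(fishers, thetas))
            den = sum(fishers) + eps
            return num / den

        t1, t2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
        F1, F2 = np.array([10.0, 0.1]), np.array([0.1, 10.0])
        print(fisher_merge([t1, t2], [F1, F2]))  # ~[0.99, 0.99]: each model
        # dominates the coordinate it is most certain about.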
    Universal Mini-Batch Consistency for Set Encoding Functions. (arXiv:2208.12401v1 [cs.LG])
    Previous works have established solid foundations for neural set functions, as well as effective architectures which preserve the necessary properties for operating on sets, such as being invariant to permutations of the set elements. Subsequently, Mini-Batch Consistency (MBC), the ability to sequentially process any permutation of any random set partition scheme while maintaining consistency guarantees on the output, has been established but with limited options for network architectures. We further study the MBC property in neural set encoding functions, establishing a method for converting arbitrary non-MBC models to satisfy MBC. In doing so, we provide a framework for a universally-MBC (UMBC) class of set functions. Additionally, we explore an interesting dropout strategy made possible by our framework, and investigate its effects on probabilistic calibration under test-time distributional shifts. We validate UMBC with proofs backed by unit tests, also providing qualitative/quantitative experiments on toy data, clean and corrupted point cloud classification, and amortized clustering on ImageNet. The results demonstrate the utility of UMBC, and we further discover that our dropout strategy improves uncertainty calibration.
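    The MBC property itself is easy to illustrate: mean pooling is one set encoder for which chunked processing provably matches full-set encoding under any partition. A minimal sketch of that baseline property, which UMBC extends to richer architectures:

        import numpy as np

        class StreamingMeanEncoder:
            # Processes a set in arbitrary chunks; the final encoding is
            # identical to encoding the whole set at once (the MBC property).
            def __init__(self, dim):
                self.sum, self.count = np.zeros(dim), 0

            def update(self, chunk):           # chunk: (n_i, dim) slice
                self.sum += chunk.sum(axis=0)
                self.count += len(chunk)

            def encode(self):
                return self.sum / self.count

        X = np.random.randn(100, 8)
        enc = StreamingMeanEncoder(8)
        for part in np.array_split(X, 7):      # arbitrary partition scheme
            enc.update(part)
        assert np.allclose(enc.encode(), X.mean(axis=0))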
    Static Seeding and Clustering of LSTM Embeddings to Learn from Loosely Time-Decoupled Events. (arXiv:2208.12389v1 [cs.LG])
    Humans learn from the occurrence of events in a different place and time to predict similar trajectories of events. We define Loosely Decoupled Timeseries (LDT) phenomena as two or more events that could happen in different places and across different timelines but share similarities in the nature of the event and the properties of the location. In this work we improve on the use of Recurring Neural Networks (RNN), in particular Long Short-Term Memory (LSTM) networks, to enable AI solutions that generate better timeseries predictions for LDT. We use similarity measures between timeseries based on the trends and introduce embeddings representing those trends. The embeddings represent properties of the event which, coupled with the LSTM structure, can be clustered to identify similar temporally unaligned events. In this paper, we explore methods of seeding a multivariate LSTM from time-invariant data related to the geophysical and demographic phenomena being modeled by the LSTM. We apply these methods on the timeseries data derived from the COVID-19 detected infection and death cases. We use publicly available socio-economic data to seed the LSTM models, creating embeddings, to determine whether such seeding improves case predictions. The embeddings produced by these LSTMs are clustered to identify best-matching candidates for forecasting an evolving timeseries. Applying this method, we show an improvement in 10-day moving average predictions of disease propagation at the US County level.
    Momentum Capsule Networks. (arXiv:2201.11091v2 [cs.CV] UPDATED)
    Capsule networks are a class of neural networks that have achieved promising results on many computer vision tasks. However, baseline capsule networks have failed to reach state-of-the-art results on more complex datasets due to their high computation and memory requirements. We tackle this problem by proposing a new network architecture, called Momentum Capsule Network (MoCapsNet). MoCapsNets are inspired by Momentum ResNets, a type of network that applies reversible residual building blocks. Reversible networks allow the activations of the forward pass to be recalculated during backpropagation, so that memory requirements can be drastically reduced. In this paper, we provide a framework for how invertible residual building blocks can be applied to capsule networks. We show that MoCapsNet beats the accuracy of baseline capsule networks on MNIST, SVHN, CIFAR-10 and CIFAR-100 while using considerably less memory. The source code is available on https://github.com/moejoe95/MoCapsNet.
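    The memory saving rests on exact invertibility of the momentum residual step; a minimal sketch with a fixed toy residual branch (the update form follows the Momentum ResNets formulation; parameter values are arbitrary):

        import numpy as np

        def f(x):                    # any residual branch; a toy one here
            return np.tanh(x)

        def forward(x, v, gamma=0.9):
            # Momentum step: v' = gamma*v + (1-gamma)*f(x); x' = x + v'.
            v = gamma * v + (1 - gamma) * f(x)
            return x + v, v

        def inverse(x_next, v_next, gamma=0.9):
            # Recover (x, v) exactly, so activations need not be stored.
            x = x_next - v_next
            v = (v_next - (1 - gamma) * f(x)) / gamma
            return x, v

        x0, v0 = np.random.randn(4), np.zeros(4)
        x1, v1 = forward(x0, v0)
        xr, vr = inverse(x1, v1)
        assert np.allclose(xr, x0) and np.allclose(vr, v0)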
    DPAUC: Differentially Private AUC Computation in Federated Learning. (arXiv:2208.12294v1 [cs.LG])
    Federated learning (FL) has gained significant attention recently as a privacy-enhancing tool to jointly train a machine learning model by multiple participants. Prior work on FL has mostly studied how to protect label privacy during model training. However, model evaluation in FL might also lead to potential leakage of private label information. In this work, we propose an evaluation algorithm that can accurately compute the widely used AUC (area under the curve) metric under label differential privacy (DP) in FL. Through extensive experiments, we show that our algorithm computes AUCs that closely match the ground truth.
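    One way to picture DP-protected AUC computation is to add Laplace noise to the per-threshold counts that trace out the ROC curve; the sketch below only illustrates that noisy-counts idea, and the paper's actual mechanism, sensitivity analysis, and privacy accounting may differ:

        import numpy as np

        def dp_auc(scores, labels, thresholds, eps=1.0,
                   rng=np.random.default_rng()):
            # Noise the true/false positive counts at each threshold, then
            # integrate the resulting ROC curve with the trapezoid rule.
            tpr, fpr = [], []
            n_pos, n_neg = (labels == 1).sum(), (labels == 0).sum()
            for t in thresholds:
                tp = ((scores >= t) & (labels == 1)).sum() + rng.laplace(0, 1 / eps)
                fp = ((scores >= t) & (labels == 0)).sum() + rng.laplace(0, 1 / eps)
                tpr.append(np.clip(tp / n_pos, 0, 1))
                fpr.append(np.clip(fp / n_neg, 0, 1))
            order = np.argsort(fpr)
            return np.trapz(np.array(tpr)[order], np.array(fpr)[order])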
    SoftHebb: Bayesian Inference in Unsupervised Hebbian Soft Winner-Take-All Networks. (arXiv:2107.05747v3 [cs.LG] UPDATED)
    Hebbian plasticity in winner-take-all (WTA) networks is highly attractive for neuromorphic on-chip learning, owing to its efficient, local, unsupervised, and on-line nature. Moreover, its biological plausibility may help overcome important limitations of artificial algorithms, such as their susceptibility to adversarial attacks and long training times. However, Hebbian WTA learning has found little use in machine learning (ML), likely because it has been missing an optimization theory compatible with deep learning (DL). Here we show rigorously that WTA networks constructed from standard DL elements, combined with a Hebbian-like plasticity that we derive, maintain a Bayesian generative model of the data. Importantly, without any supervision, our algorithm, SoftHebb, minimizes cross-entropy, i.e. a common loss function in supervised DL. We show this theoretically and in practice. The key is a "soft" WTA where there is no absolute "hard" winner neuron. Strikingly, in shallow-network comparisons with backpropagation (BP), SoftHebb shows advantages beyond its Hebbian efficiency. Namely, it converges faster and is significantly more robust to noise and adversarial attacks. Notably, attacks that maximally confuse SoftHebb are also confusing to the human eye, potentially linking human perceptual robustness with the Hebbian WTA circuits of cortex. Finally, SoftHebb can generate synthetic objects as interpolations of real object classes. All in all, Hebbian efficiency, theoretical underpinning, cross-entropy minimization, and surprising empirical advantages suggest that SoftHebb may inspire highly neuromorphic and radically different, but practical and advantageous, learning algorithms and hardware accelerators.
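    The flavor of a soft-WTA Hebbian update can be sketched with classic soft competitive learning, where a softmax replaces the hard winner; this is a generic rule for illustration, not the paper's derived plasticity:

        import numpy as np

        def soft_wta_step(W, x, lr=0.01):
            # W: (neurons, inputs); x: (inputs,). Each neuron's weight vector
            # moves toward x in proportion to its softmax responsibility,
            # so there is no absolute "hard" winner.
            u = W @ x                                # pre-activations
            y = np.exp(u - u.max()); y /= y.sum()    # soft competition
            W += lr * y[:, None] * (x[None, :] - W)  # local, unsupervised
            return W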
    Light-weight probing of unsupervised representations for Reinforcement Learning. (arXiv:2208.12345v1 [cs.LG])
    Unsupervised visual representation learning offers the opportunity to leverage large corpora of unlabeled trajectories to form useful visual representations, which can benefit the training of reinforcement learning (RL) algorithms. However, evaluating the fitness of such representations requires training RL algorithms, which is computationally intensive and has high-variance outcomes. To alleviate this issue, we design an evaluation protocol for unsupervised RL representations with lower variance and up to 600x lower computational cost. Inspired by the vision community, we propose two linear probing tasks: predicting the reward observed in a given state, and predicting the action of an expert in a given state. These two tasks are generally applicable to many RL domains, and we show through rigorous experimentation that they correlate strongly with the actual downstream control performance on the Atari100k Benchmark. This provides a better method for exploring the space of pretraining algorithms without the need to run RL evaluations for every setting. Leveraging this framework, we further improve existing self-supervised learning (SSL) recipes for RL, highlighting the importance of the forward model, the size of the visual backbone, and the precise formulation of the unsupervised objective.
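    The reward probe is a few lines in practice; a minimal sketch of the protocol (the encoder, data handling, and metric choice are assumptions):

        import numpy as np
        from sklearn.linear_model import Ridge

        def reward_probe_score(encoder, states, rewards):
            # Freeze the pretrained encoder, embed states, and fit a linear
            # head to predict observed rewards; held-out probe accuracy is a
            # cheap proxy for downstream RL performance.
            Z = encoder(states)            # frozen features, shape (N, d)
            n = len(Z) // 2
            probe = Ridge(alpha=1.0).fit(Z[:n], rewards[:n])
            return probe.score(Z[n:], rewards[n:])   # R^2 on held-out states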
    Dynamic Regret of Online Markov Decision Processes. (arXiv:2208.12483v1 [cs.LG])
    We investigate online Markov Decision Processes (MDPs) with adversarially changing loss functions and known transitions. We choose dynamic regret as the performance measure, defined as the performance difference between the learner and any sequence of feasible changing policies. The measure is strictly stronger than standard static regret, which benchmarks the learner's performance against a fixed comparator policy. We consider three foundational models of online MDPs: episodic loop-free Stochastic Shortest Path (SSP), episodic SSP, and infinite-horizon MDPs. For these three models, we propose novel online ensemble algorithms and establish their dynamic regret guarantees, where the results for episodic (loop-free) SSP are provably minimax optimal in terms of the time horizon and a certain non-stationarity measure. Furthermore, when the online environments encountered by the learner are predictable, we design improved algorithms and achieve better dynamic regret bounds for episodic (loop-free) SSP; moreover, we demonstrate impossibility results for infinite-horizon MDPs.
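    For concreteness (notation ours, not taken from the paper), dynamic regret over $T$ episodes against an arbitrary comparator sequence $\pi_1^c, \ldots, \pi_T^c$ is $\text{D-Regret}_T = \sum_{t=1}^{T} \ell_t(\pi_t) - \sum_{t=1}^{T} \ell_t(\pi_t^c)$, where $\ell_t$ is the adversarial loss and $\pi_t$ the learner's policy in episode $t$. Static regret is recovered by constraining $\pi_1^c = \cdots = \pi_T^c$; dynamic regret bounds instead typically scale with a non-stationarity (path-length) measure of the comparator sequence.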
    Replication Robust Payoff Allocation in Submodular Cooperative Games. (arXiv:2006.14583v5 [cs.LG] UPDATED)
    Submodular functions have been a powerful mathematical model for a wide range of real-world applications. Recently, submodular functions have become increasingly important in machine learning (ML) for modelling notions such as information and redundancy among entities such as data and features. Among these applications, a key question is payoff allocation, i.e., how to evaluate the importance of each entity towards the collective objective. To this end, classic solution concepts from cooperative game theory offer principled approaches to payoff allocation. However, despite the extensive body of game-theoretic literature, payoff allocation in submodular games is relatively under-researched. In particular, an important notion that arises in emerging submodular applications is redundancy, which may occur from various sources such as abundant data or malicious manipulations, where a player replicates its resource and acts under multiple identities. Though many game-theoretic solution concepts can be directly used in submodular games, naively applying them for payoff allocation in these settings may incur robustness issues against replication. In this paper, we systematically study replication manipulation in submodular games and investigate replication robustness, a metric that quantitatively measures the robustness of solution concepts against replication. Using this metric, we present conditions which theoretically characterise the robustness of semivalues, a wide family of solution concepts including the Shapley and Banzhaf values. Moreover, we empirically validate our theoretical results on an emerging submodular ML application, i.e., the ML data market.
    Balanced Contrastive Learning for Long-Tailed Visual Recognition. (arXiv:2207.09052v2 [cs.CV] UPDATED)
    Real-world data typically follow a long-tailed distribution, where a few majority categories occupy most of the data while most minority categories contain a limited number of samples. Classification models minimizing cross-entropy struggle to represent and classify the tail classes. Although the problem of learning unbiased classifiers has been well studied, methods for representing imbalanced data are under-explored. In this paper, we focus on representation learning for imbalanced data. Recently, supervised contrastive learning (SCL) has shown promising performance on balanced data. However, through our theoretical analysis, we find that for long-tailed data it fails to form a regular simplex, which is an ideal geometric configuration for representation learning. To correct the optimization behavior of SCL and further improve the performance of long-tailed visual recognition, we propose a novel loss for balanced contrastive learning (BCL). Compared with SCL, BCL has two improvements: class-averaging, which balances the gradient contributions of negative classes, and class-complement, which allows all classes to appear in every mini-batch. The proposed BCL method satisfies the condition of forming a regular simplex and assists the optimization of cross-entropy. Equipped with BCL, the proposed two-branch framework can obtain a stronger feature representation and achieve competitive performance on long-tailed benchmark datasets such as CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and iNaturalist2018. Our code is available at https://github.com/FlamieZhu/BCL .
    Large-N dynamics of the spiked tensor model with random initial conditions. (arXiv:2208.12586v1 [hep-th])
    In these notes, we develop a path integral approach for the partial differential equations with random initial conditions. Then, we apply it to the dynamics of the spiked tensor model and show that the large-$N$ saddle point equations are dominated by the melonic type diagrams.
    A machine learning approach to predict the structural and magnetic properties of Heusler alloy families. (arXiv:2208.12705v1 [cond-mat.mtrl-sci])
    A random forest (RF) regression model is used to predict the lattice constant, magnetic moment, and formation energy of full Heusler alloys, half Heusler alloys, inverse Heusler alloys, and quaternary Heusler alloys, based on existing as well as indigenously prepared databases. A prior analysis was carried out to check the distribution of the response variables and found that, in most cases, the data are not normally distributed. The RF model is sufficiently accurate to predict the response variables on the test data and shows robustness against overfitting, outliers, multicollinearity, and the distribution of data points. Parity plots between the machine-learning-predicted values and the values computed using density functional theory (DFT) show linear behavior, with adjusted R2 values in the range of 0.80 to 0.94 for all predicted properties across the different types of Heusler alloys. Feature importance analysis shows that the number of valence electrons plays an important role in the prediction of most outcomes. Case studies with one full Heusler alloy and one quaternary Heusler alloy are also presented, comparing the machine-learning predictions with our earlier theoretical calculations and experimentally measured results, suggesting high accuracy of the model predictions.
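    A minimal sketch of this workflow in scikit-learn, where X holds descriptors such as valence electron counts and y a DFT-computed target (placeholders for the databases described above):

        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import r2_score

        def fit_rf(X, y):
            # Hold out 20% for testing, fit the forest, and report test R2
            # plus per-feature importances (the paper's feature analysis).
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, y, test_size=0.2, random_state=0)
            rf = RandomForestRegressor(n_estimators=500,
                                       random_state=0).fit(X_tr, y_tr)
            print("test R2:", r2_score(y_te, rf.predict(X_te)))
            print("feature importances:", rf.feature_importances_)
            return rf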
    Learning Time Delay Systems with Neural Ordinary Differential Equations. (arXiv:2206.14288v2 [cs.LG] UPDATED)
    A novel way of using neural networks to learn the dynamics of time delay systems from sequential data is proposed. A neural network with trainable delays is used to approximate the right-hand side of a delay differential equation. We relate the delay differential equation to an ordinary differential equation by discretizing the time history and train the corresponding neural ordinary differential equation (NODE) to learn the dynamics. An example of learning the dynamics of the Mackey-Glass equation using data from chaotic behavior is given. After learning both the nonlinearity and the time delay, we demonstrate that the bifurcation diagram of the neural network matches that of the original system.
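    For reference, data of the kind used here can be generated by Euler-stepping the Mackey-Glass equation dx/dt = beta*x(t-tau)/(1 + x(t-tau)^n) - gamma*x(t) with the time history kept in a discretized buffer, mirroring the history discretization used to cast the delay equation as an ODE; the parameter values below are the standard chaotic regime, not necessarily the paper's:

        import numpy as np

        def mackey_glass(T=5000, dt=0.1, tau=17.0, beta=0.2, gamma=0.1, n=10):
            lag = int(tau / dt)          # delayed state looked up from buffer
            x = np.full(T + lag, 1.2)    # constant initial history
            for t in range(lag, T + lag - 1):
                xd = x[t - lag]
                x[t + 1] = x[t] + dt * (beta * xd / (1 + xd**n) - gamma * x[t])
            return x[lag:]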
    A New Class of Efficient Adaptive Filters for Online Nonlinear Modeling. (arXiv:2104.09641v2 [cs.LG] UPDATED)
    Nonlinear models are known to provide excellent performance in real-world applications that often operate in non-ideal conditions. However, such applications often require online processing to be performed with limited computational resources. To address this problem, we propose a new class of efficient nonlinear models for online applications. The proposed algorithms are based on linear-in-the-parameters (LIP) nonlinear filters using functional link expansions. In order to make this class of functional link adaptive filters (FLAFs) efficient, we propose low-complexity expansions and frequency-domain adaptation of the parameters. Among this family of algorithms, we also define the partitioned-block frequency-domain FLAF, whose implementation is particularly suitable for online nonlinear modeling problems. We assess and compare frequency-domain FLAFs with different expansions providing the best possible tradeoff between performance and computational complexity. Experimental results prove that the proposed algorithms can be considered as an efficient and effective solution for online applications, such as the acoustic echo cancellation, even in the presence of adverse nonlinear conditions and with limited availability of computational resources.
    Causal Bandits for Linear Structural Equation Models. (arXiv:2208.12764v1 [stat.ML])
    This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model to minimize the cumulative regret with respect to the best intervention in hindsight. This is, naturally, posed as a causal bandit problem. The focus is on causal bandits for linear structural equation models (SEMs) and soft interventions. It is assumed that the graph's structure is known, and it has $N$ nodes. Two linear mechanisms, one soft intervention and one observational, are assumed for each node, giving rise to $2^N$ possible interventions. The existing causal bandit algorithms assume that at least the interventional distributions of the reward node's parents are fully specified. However, there are $2^N$ such distributions (one corresponding to each intervention), acquiring which becomes prohibitive even in moderate-sized graphs. This paper dispenses with the assumption of knowing these distributions. Two algorithms are proposed for the frequentist (UCB-based) and Bayesian (Thompson Sampling-based) settings. The key idea of these algorithms is to avoid directly estimating the $2^N$ reward distributions and instead estimate the parameters that fully specify the SEMs (linear in $N$) and use them to compute the rewards. In both algorithms, under boundedness assumptions on noise and the parameter space, the cumulative regrets scale as $\tilde{\cal O} ((2d)^L L \sqrt{T})$, where $d$ is the graph's maximum degree, and $L$ is the length of its longest causal path.
    Generalizability of Code Clone Detection on CodeBERT. (arXiv:2208.12588v1 [cs.SE])
    Transformer networks such as CodeBERT already achieve outstanding results for code clone detection on benchmark datasets, so one could assume that this task has already been solved. However, code clone detection is not a trivial task. Semantic code clones, in particular, are challenging to detect. We show that the generalizability of CodeBERT decreases when it is evaluated on two different subsets of Java code clones from BigCloneBench: we observe a significant drop in F1 score when evaluating on code snippets and functionality IDs different from those used for model building.
    Why is the video analytics accuracy fluctuating, and what can we do about it?. (arXiv:2208.12644v1 [cs.CV])
    It is a common practice to think of a video as a sequence of images (frames), and to re-use deep neural network models trained only on images for similar analytics tasks on videos. In this paper, we show that this leap of faith, that deep learning models that work well on images will also work well on videos, is actually flawed. We show that even when a video camera is viewing a scene that is not changing in any human-perceptible way, and we control for external factors like video compression and environment (lighting), the accuracy of video analytics applications fluctuates noticeably. These fluctuations occur because successive frames produced by the video camera may look similar visually but are perceived quite differently by the video analytics applications. We observed that the root cause of these fluctuations is the dynamic camera parameter changes that a video camera automatically makes in order to capture and produce a visually pleasing video. The camera inadvertently acts as an unintentional adversary because these slight changes in the image pixel values of consecutive frames, as we show, have a noticeably adverse impact on the accuracy of insights from video analytics tasks that re-use image-trained deep learning models. To address this inadvertent adversarial effect from the camera, we explore the use of transfer learning techniques to improve learning in video analytics tasks through the transfer of knowledge from learning on image analytics tasks. In particular, we show that our newly trained Yolov5 model reduces fluctuation in object detection across frames, which leads to better tracking of objects (40% fewer tracking mistakes). Our paper also provides new directions and techniques to mitigate the camera's adversarial effect on deep learning models used for video analytics applications.
    Multi-instrument Music Synthesis with Spectrogram Diffusion. (arXiv:2206.05408v2 [cs.SD] UPDATED)
    An ideal music synthesizer should be both interactive and expressive, generating high-fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural synthesizers have exhibited a tradeoff between domain-specific models that offer detailed control of only specific instruments, or raw waveform models that can train on any music but with minimal control and slow generation. In this work, we focus on a middle ground of neural synthesizers that can generate audio from MIDI sequences with arbitrary combinations of instruments in realtime. This enables training on a wide range of transcription datasets with a single model, which in turn offers note-level control of composition and instrumentation across a wide range of instruments. We use a simple two-stage process: MIDI to spectrograms with an encoder-decoder Transformer, then spectrograms to audio with a generative adversarial network (GAN) spectrogram inverter. We compare training the decoder as an autoregressive model and as a Denoising Diffusion Probabilistic Model (DDPM) and find that the DDPM approach is superior both qualitatively and as measured by audio reconstruction and Fr\'echet distance metrics. Given the interactivity and generality of this approach, we find this to be a promising first step towards interactive and expressive neural synthesis for arbitrary combinations of instruments and notes.
    Prospect Theory-inspired Automated P2P Energy Trading with Q-learning-based Dynamic Pricing. (arXiv:2208.12777v1 [cs.AI])
    The widespread adoption of distributed energy resources and the advent of smart grid technologies have allowed traditionally passive power system users to become actively involved in energy trading. Recognizing that traditional centralized, grid-driven energy markets offer minimal profitability to these users, recent research has shifted focus towards decentralized peer-to-peer (P2P) energy markets. In these markets, users trade energy with each other, with higher benefits than buying from or selling to the grid. However, most research on P2P energy trading largely overlooks user perception in the trading process, assuming constant availability, participation, and full compliance. As a result, these approaches may result in negative attitudes and reduced engagement over time. In this paper, we design an automated P2P energy market that takes user perception into account. We employ prospect theory to model user perception and formulate an optimization framework to maximize the buyer's perception while matching demand and production. Given the non-linear and non-convex nature of the optimization problem, we propose a Differential Evolution-based Algorithm for Trading Energy, called DEbATE. Additionally, we introduce a risk-sensitive Q-learning algorithm, named Pricing mechanism with Q-learning and Risk-sensitivity (PQR), which learns the optimal price for sellers considering their perceived utility. Results based on real traces of energy consumption and production, as well as realistic prospect theory functions, show that our approach achieves a 26% higher perceived value for buyers and generates 7% more reward for sellers, compared to a recent state-of-the-art approach.
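    The prospect-theory value function at the heart of such perception models is concave for gains and convex but steeper for losses (loss aversion); a sketch with the classic Kahneman-Tversky 1992 parameter estimates, which are not necessarily the values used in the paper:

        import numpy as np

        def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
            # v(x) = x^alpha for gains, -lam * (-x)^beta for losses (lam > 1
            # encodes loss aversion). np.abs keeps both branches well-defined.
            return np.where(x >= 0, np.abs(x)**alpha, -lam * np.abs(x)**beta)

        # A buyer perceives a $5 saving and a $5 overpayment asymmetrically:
        print(prospect_value(5.0), prospect_value(-5.0))  # ~4.12 vs ~-9.27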
    Zero-Shot Learning with Common Sense Knowledge Graphs. (arXiv:2006.10713v4 [cs.LG] UPDATED)
    Zero-shot learning relies on semantic class representations such as hand-engineered attributes or learned embeddings to predict classes without any labeled examples. We propose to learn class representations by embedding nodes from common sense knowledge graphs in a vector space. Common sense knowledge graphs are an untapped source of explicit high-level knowledge that requires little human effort to apply to a range of tasks. To capture the knowledge in the graph, we introduce ZSL-KG, a general-purpose framework with a novel transformer graph convolutional network (TrGCN) for generating class representations. Our proposed TrGCN architecture computes non-linear combinations of node neighbourhoods. Our results show that ZSL-KG improves over existing WordNet-based methods on five out of six zero-shot benchmark datasets in language and vision.
    Catch Me If You Hear Me: Audio-Visual Navigation in Complex Unmapped Environments with Moving Sounds. (arXiv:2111.14843v2 [cs.SD] UPDATED)
    Audio-visual navigation combines sight and hearing to navigate to a sound-emitting source in an unmapped environment. While recent approaches have demonstrated the benefits of audio input to detect and find the goal, they focus on clean and static sound sources and struggle to generalize to unheard sounds. In this work, we propose the novel dynamic audio-visual navigation benchmark which requires catching a moving sound source in an environment with noisy and distracting sounds, posing a range of new challenges. We introduce a reinforcement learning approach that learns a robust navigation policy for these complex settings. To achieve this, we propose an architecture that fuses audio-visual information in the spatial feature space to learn correlations of geometric information inherent in both local maps and audio signals. We demonstrate that our approach consistently outperforms the current state-of-the-art by a large margin across all tasks of moving sounds, unheard sounds, and noisy environments, on two challenging 3D scanned real-world environments, namely Matterport3D and Replica. The benchmark is available at this http URL
    Learned Turbulence Modelling with Differentiable Fluid Solvers: Physics-based Loss-functions and Optimisation Horizons. (arXiv:2202.06988v2 [physics.flu-dyn] UPDATED)
    In this paper, we train turbulence models based on convolutional neural networks. These learned turbulence models improve under-resolved low resolution solutions to the incompressible Navier-Stokes equations at simulation time. Our study involves the development of a differentiable numerical solver that supports the propagation of optimisation gradients through multiple solver steps. The significance of this property is demonstrated by the superior stability and accuracy of those models that unroll more solver steps during training. Furthermore, we introduce loss terms based on turbulence physics that further improve the model accuracy. This approach is applied to three two-dimensional turbulence flow scenarios, a homogeneous decaying turbulence case, a temporally evolving mixing layer, and a spatially evolving mixing layer. Our models achieve significant improvements of long-term a-posteriori statistics when compared to no-model simulations, without requiring these statistics to be directly included in the learning targets. At inference time, our proposed method also gains substantial performance improvements over similarly accurate, purely numerical methods.
    Defect Prediction Using Stylistic Metrics. (arXiv:2206.10959v3 [cs.SE] UPDATED)
    Defect prediction is one of the most popular research topics due to its potential to minimize software quality assurance efforts. Existing approaches have examined defect prediction from various perspectives, such as complexity and developer metrics. However, none of these consider programming style for defect prediction. This paper analyzes the impact of stylistic metrics on both within-project and cross-project defect prediction. For prediction, four widely used machine learning algorithms, namely Naive Bayes, Support Vector Machine, Decision Tree, and Logistic Regression, are used. The experiments are conducted on 14 releases of 5 popular open-source projects. F1, Precision, and Recall are inspected to evaluate the results. Results reveal that stylistic metrics are good predictors of defects.
    Algebraically Explainable Controllers: Decision Trees and Support Vector Machines Join Forces. (arXiv:2208.12804v1 [cs.LG])
    Recently, decision trees (DT) have been used as an explainable representation of controllers (a.k.a. strategies, policies, schedulers). Although they are often very efficient and produce small and understandable controllers for discrete systems, complex continuous dynamics still pose a challenge. In particular, when the relationships between variables take more complex forms, such as polynomials, they cannot be obtained using the available DT learning procedures. In contrast, support vector machines provide a more powerful representation, capable of discovering many such relationships, but not in an explainable form. Therefore, we suggest combining the two frameworks in order to obtain an understandable representation over richer, domain-relevant algebraic predicates. We demonstrate and evaluate the proposed method experimentally on established benchmarks.
    Quality Diversity Evolutionary Learning of Decision Trees. (arXiv:2208.12758v1 [cs.NE])
    Addressing the need for explainable Machine Learning has emerged as one of the most important research directions in modern Artificial Intelligence (AI). While the current dominant paradigm in the field is based on black-box models, typically in the form of (deep) neural networks, these models lack direct interpretability for human users, i.e., their outcomes (and, even more so, their inner working) are opaque and hard to understand. This is hindering the adoption of AI in safety-critical applications, where high interests are at stake. In these applications, explainable by design models, such as decision trees, may be more suitable, as they provide interpretability. Recent works have proposed the hybridization of decision trees and Reinforcement Learning, to combine the advantages of the two approaches. So far, however, these works have focused on the optimization of those hybrid models. Here, we apply MAP-Elites for diversifying hybrid models over a feature space that captures both the model complexity and its behavioral variability. We apply our method on two well-known control problems from the OpenAI Gym library, on which we discuss the "illumination" patterns projected by MAP-Elites, comparing its results against existing similar approaches.
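    A minimal MAP-Elites loop over a 2D behavior space (e.g. model complexity x behavioral variability) looks as follows; the init, mutate, fitness, and behavior callables are placeholders for the paper's decision-tree operators:

        import numpy as np

        def map_elites(init, mutate, fitness, behavior, bins=10, iters=10000):
            # Each cell of the archive keeps the fittest solution whose
            # behavior descriptor (two values in [0, 1)) falls in that cell.
            rng = np.random.default_rng()
            archive, fits = {}, {}
            for _ in range(iters):
                if archive:
                    keys = list(archive)
                    parent = archive[keys[rng.integers(len(keys))]]
                else:
                    parent = init()
                child = mutate(parent)
                cell = tuple((np.clip(behavior(child), 0, 0.999) * bins).astype(int))
                f = fitness(child)
                if cell not in fits or f > fits[cell]:
                    archive[cell], fits[cell] = child, f
            return archive, fits   # the "illuminated" map of diverse elites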
    Yformer: U-Net Inspired Transformer Architecture for Far Horizon Time Series Forecasting. (arXiv:2110.08255v2 [cs.LG] UPDATED)
    Time series data is ubiquitous in research as well as in a wide variety of industrial applications. Effectively analyzing the available historical data and providing insights into the far future allows us to make effective decisions. Recent research has witnessed the superior performance of transformer-based architectures, especially in the regime of far-horizon time series forecasting. However, the current state-of-the-art sparse Transformer architectures fail to couple down- and upsampling procedures to produce outputs in a resolution similar to the input. We propose the Yformer model, based on a novel Y-shaped encoder-decoder architecture that (1) uses direct connections from the downscaled encoder layers to the corresponding upsampled decoder layers in a U-Net inspired architecture, (2) combines downscaling/upsampling with sparse attention to capture long-range effects, and (3) stabilizes the encoder-decoder stacks with the addition of an auxiliary reconstruction loss. Extensive experiments have been conducted with relevant baselines on four benchmark datasets, demonstrating average improvements of 19.82% and 18.41% in MSE and 13.62% and 11.85% in MAE over the current state of the art for the univariate and multivariate settings, respectively.
    An Analytical Approach to Compute the Exact Preimage of Feed-Forward Neural Networks. (arXiv:2203.00438v4 [cs.LG] UPDATED)
    Neural networks are a convenient way to automatically fit functions that are too complex to be described by hand. The downside of this approach is that it builds a black box without an understanding of what happens inside. Finding the preimage would help us better understand how and why such neural networks produce their outputs. Because most neural networks are non-injective functions, it is often impossible to compute the preimage entirely by numerical means. The point of this study is to give a method for computing the exact preimage of any feed-forward neural network with linear or piecewise-linear activation functions for the hidden layers. In contrast to other methods, ours does not return a single solution for a given output but analytically returns the entire, exact preimage.
    Compilation and Optimizations for Efficient Machine Learning on Embedded Systems. (arXiv:2206.03326v2 [cs.LG] UPDATED)
    Deep Neural Networks (DNNs) have achieved great success in a variety of machine learning (ML) applications, delivering high-quality inferencing solutions in computer vision, natural language processing, and virtual reality, etc. However, DNN-based ML applications also bring much increased computational and storage requirements, which are particularly challenging for embedded systems with limited compute/storage resources, tight power budgets, and small form factors. Challenges also come from the diverse application-specific requirements, including real-time responses, high-throughput performance, and reliable inference accuracy. To address these challenges, we introduce a series of effective design methodologies, including efficient ML model designs, customized hardware accelerator designs, and hardware/software co-design strategies to enable efficient ML applications on embedded systems.
    A Framework for Inherently Interpretable Optimization Models. (arXiv:2208.12570v1 [math.OC])
    With dramatic improvements in optimization software, the solution of large-scale problems that seemed intractable decades ago is now a routine task. This puts even more real-world applications into the reach of optimizers. At the same time, solving optimization problems often turns out to be one of the smaller difficulties when putting solutions into practice. One major barrier is that the optimization software can be perceived as a black box, which may produce solutions of high quality but can create completely different solutions when circumstances change, leading to low acceptance of optimized solutions. Such issues of interpretability and explainability have seen significant attention in other areas, such as machine learning, but less so in optimization. In this paper we propose an optimization framework to derive solutions that inherently come with an easily comprehensible explanatory rule stating under which circumstances which solution should be chosen. Focusing on decision trees to represent explanatory rules, we propose integer programming formulations as well as a heuristic method that ensure applicability of our approach even for large-scale problems. Computational experiments using random and real-world data indicate that the costs of inherent interpretability can be very small.
    PDD-SHAP: Fast Approximations for Shapley Values using Functional Decomposition. (arXiv:2208.12595v1 [cs.LG])
    Because of their strong theoretical properties, Shapley values have become very popular as a way to explain predictions made by black box models. Unfortunately, most existing techniques to compute Shapley values are computationally very expensive. We propose PDD-SHAP, an algorithm that uses an ANOVA-based functional decomposition model to approximate the black-box model being explained. This allows us to calculate Shapley values orders of magnitude faster than existing methods for large datasets, significantly reducing the amortized cost of computing Shapley values when many predictions need to be explained.
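    For reference, the quantity such methods approximate is the Shapley value $\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!} \left( v(S \cup \{i\}) - v(S) \right)$ for feature $i$, feature set $N$, and value function $v$ over subsets $S$; the sum over $2^{|N|-1}$ subsets is what makes exact computation expensive and motivates amortized approximations such as PDD-SHAP.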
    Semantic Text-to-Face GAN -ST^2FG. (arXiv:2107.10756v3 [cs.CV] UPDATED)
    Faces generated using generative adversarial networks (GANs) have reached unprecedented realism. These faces, also known as "Deep Fakes", appear as realistic photographs with very little pixel-level distortions. While some work has enabled the training of models that lead to the generation of specific properties of the subject, generating a facial image based on a natural language description has not been fully explored. For security and criminal identification, the ability to provide a GAN-based system that works like a sketch artist would be incredibly useful. In this paper, we present a novel approach to generate facial images from semantic text descriptions. The learned model is provided with a text description and an outline of the type of face, which the model uses to sketch the features. Our models are trained using an Affine Combination Module (ACM) mechanism to combine the text embedding from BERT and the GAN latent space using a self-attention matrix. This avoids the loss of features due to inadequate "attention", which may happen if text embedding and latent vector are simply concatenated. Our approach is capable of generating images that are very accurately aligned to the exhaustive textual descriptions of faces with many fine detail features of the face and helps in generating better images. The proposed method is also capable of making incremental changes to a previously generated image if it is provided with additional textual descriptions or sentences.
    Deepened Graph Auto-Encoders Help Stabilize and Enhance Link Prediction. (arXiv:2103.11414v2 [cs.LG] UPDATED)
    Graph neural networks have been used for a variety of learning tasks, such as link prediction, node classification, and node clustering. Among them, link prediction is a relatively under-studied graph learning task, with current state-of-the-art models based on one- or two-layer shallow graph auto-encoder (GAE) architectures. In this paper, we focus on addressing a limitation of current methods for link prediction, which can only use shallow GAEs and variational GAEs, and on creating effective methods to deepen (variational) GAE architectures to achieve stable and competitive performance. Our proposed methods innovatively incorporate standard auto-encoders (AEs) into the architectures of GAEs: standard AEs are leveraged to learn essential, low-dimensional representations by seamlessly integrating the adjacency information and node features, while GAEs further build multi-scale low-dimensional representations via residual connections to learn a compact overall embedding for link prediction. Empirically, extensive experiments on various benchmarking datasets verify the effectiveness of our methods and demonstrate the competitive performance of our deepened graph models for link prediction. Theoretically, we prove that our deep extensions inclusively express multiple polynomial filters with different orders.
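    A minimal PyTorch sketch of the architecture as described (a standard-AE branch feeding residually connected graph-convolution layers and an inner-product link decoder); the dense GCN layer and the dimensions are illustrative assumptions, not the authors' exact design:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)
    def forward(self, A_hat, H):
        # A_hat: symmetrically normalized adjacency (dense, for simplicity)
        return torch.relu(self.lin(A_hat @ H))

class DeepGAE(nn.Module):
    """Sketch: a standard AE compresses node features, and deeper GCN
    layers with residual connections refine the embedding."""
    def __init__(self, d_feat, d_hid, n_layers=4):
        super().__init__()
        self.ae_enc = nn.Sequential(nn.Linear(d_feat, d_hid), nn.ReLU())
        self.ae_dec = nn.Linear(d_hid, d_feat)     # standard-AE reconstruction head
        self.gcns = nn.ModuleList([GCNLayer(d_hid, d_hid) for _ in range(n_layers)])
    def forward(self, A_hat, X):
        H = self.ae_enc(X)
        X_rec = self.ae_dec(H)                     # feature reconstruction branch
        for gcn in self.gcns:
            H = H + gcn(A_hat, H)                  # residual connection stabilizes depth
        A_logits = H @ H.t()                       # inner-product link decoder
        return A_logits, X_rec
```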
    AI2-THOR: An Interactive 3D Environment for Visual AI. (arXiv:1712.05474v4 [cs.CV] UPDATED)
    We introduce The House Of inteRactions (THOR), a framework for visual AI research, available at this http URL. AI2-THOR consists of near photo-realistic 3D indoor scenes, where AI agents can navigate in the scenes and interact with objects to perform tasks. AI2-THOR enables research in many different domains including but not limited to deep reinforcement learning, imitation learning, learning by interaction, planning, visual question answering, unsupervised representation learning, object detection and segmentation, and learning models of cognition. The goal of AI2-THOR is to facilitate building visually intelligent models and push the research forward in this domain.
    I still know it's you! On Challenges in Anonymizing Source Code. (arXiv:2208.12553v1 [cs.CR])
    The source code of a program not only defines its semantics but also contains subtle clues that can identify its author. Several studies have shown that these clues can be automatically extracted using machine learning and allow for determining a program's author among hundreds of programmers. This attribution poses a significant threat to developers of anti-censorship and privacy-enhancing technologies, as they become identifiable and may be prosecuted. An ideal protection from this threat would be the anonymization of source code. However, neither theoretical nor practical principles of such an anonymization have been explored so far. In this paper, we tackle this problem and develop a framework for reasoning about code anonymization. We prove that the task of generating a $k$-anonymous program -- a program that cannot be attributed to one of $k$ authors -- is not computable and thus a dead end for research. As a remedy, we introduce a relaxed concept called $k$-uncertainty, which enables us to measure the protection of developers. Based on this concept, we empirically study candidate techniques for anonymization, such as code normalization, coding style imitation, and code obfuscation. We find that none of the techniques provides sufficient protection when the attacker is aware of the anonymization. While we introduce an approach for removing remaining clues from the code, the main result of our work is negative: Anonymization of source code is a hard and open problem.
    Symbolic Explanation of Affinity-Based Reinforcement Learning Agents with Markov Models. (arXiv:2208.12627v1 [cs.LG])
    The proliferation of artificial intelligence is increasingly dependent on model understanding. Understanding demands both an interpretation - a human reasoning about a model's behavior - and an explanation - a symbolic representation of the functioning of the model. Notwithstanding the imperative of transparency for safety, trust, and acceptance, the opacity of state-of-the-art reinforcement learning algorithms conceals the rudiments of their learned strategies. We have developed a policy regularization method that asserts the global intrinsic affinities of learned strategies. These affinities provide a means of reasoning about a policy's behavior, thus making it inherently interpretable. We have demonstrated our method in personalized prosperity management, where individuals' spending behavior over time dictates their investment strategies, i.e., distinct spending personalities may have dissimilar associations with different investment classes. We now explain our model by reproducing the underlying prototypical policies with discretized Markov models. These global surrogates are symbolic representations of the prototypical policies.
    Socially Fair Reinforcement Learning. (arXiv:2208.12584v1 [cs.LG])
    We consider the problem of episodic reinforcement learning where there are multiple stakeholders with different reward functions. Our goal is to output a policy that is socially fair with respect to different reward functions. Prior works have proposed different objectives that a fair policy must optimize, including minimum welfare and generalized Gini welfare. We first take an axiomatic view of the problem and propose four axioms that any such fair objective must satisfy. We show that the Nash social welfare is the unique objective that satisfies all four axioms, whereas prior objectives fail to satisfy all four. We then consider the learning version of the problem, where the underlying model, i.e., the Markov decision process, is unknown. We consider the problem of minimizing regret with respect to the fair policies maximizing three different fair objectives -- minimum welfare, generalized Gini welfare, and Nash social welfare. Based on optimistic planning, we propose a generic learning algorithm and derive its regret bound with respect to the three different policies. For the objective of Nash social welfare, we also derive a lower bound on regret that grows exponentially with $n$, the number of agents. Finally, we show that for the objective of minimum welfare, one can improve regret by a factor of $O(H)$ for a weaker notion of regret.
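    The three fair objectives compared in the paper are easy to state concretely; a small sketch, assuming per-stakeholder returns have already been estimated (the Gini weights shown are one common choice, not necessarily the paper's):

```python
import numpy as np

def min_welfare(returns):
    # worst-off stakeholder's return
    return float(returns.min())

def generalized_gini(returns, weights=None):
    # weighted sum of ascending-sorted returns with non-increasing weights,
    # so the worst-off stakeholders receive the largest weights
    r = np.sort(returns)
    if weights is None:
        weights = 1.0 / np.arange(1, len(r) + 1)
    return float(r @ weights)

def nash_social_welfare(returns, eps=1e-8):
    # product of returns, computed in log space for numerical stability
    return float(np.exp(np.log(returns + eps).mean()))
```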
    On the Implicit Bias in Deep-Learning Algorithms. (arXiv:2208.12591v1 [cs.LG])
    Gradient-based deep-learning algorithms exhibit remarkable performance in practice, but it is not well-understood why they are able to generalize despite having more parameters than training examples. It is believed that implicit bias is a key factor in their ability to generalize, and hence it has been widely studied in recent years. In this short survey, we explain the notion of implicit bias, review main results and discuss their implications.
    Mel Spectrogram Inversion with Stable Pitch. (arXiv:2208.12782v1 [cs.SD])
    Vocoders are models capable of transforming a low-dimensional spectral representation of an audio signal, typically the mel spectrogram, to a waveform. Modern speech generation pipelines use a vocoder as their final component. Recent vocoder models developed for speech achieve a high degree of realism, such that it is natural to wonder how they would perform on music signals. Compared to speech, the heterogeneity and structure of the musical sound texture offer new challenges. In this work we focus on one specific artifact that some vocoder models designed for speech tend to exhibit when applied to music: the perceived instability of pitch when synthesizing sustained notes. We argue that the characteristic sound of this artifact is due to the lack of horizontal phase coherence, which is often the result of using a time-domain target space with a model that is invariant to time-shifts, such as a convolutional neural network. We propose a new vocoder model that is specifically designed for music. Key to improving the pitch stability is the choice of a shift-invariant target space that consists of the magnitude spectrum and the phase gradient. We discuss the reasons that inspired us to re-formulate the vocoder task, outline a working example, and evaluate it on musical signals. Our method results in 60% and 10% improved reconstruction of sustained notes and chords with respect to existing models, using a novel harmonic error metric.
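    A sketch of what such a shift-invariant target space might look like, computed with SciPy's STFT; the exact parameterization of the phase gradient in the paper may differ:

```python
import numpy as np
from scipy.signal import stft

def vocoder_target(wave, sr, n_fft=1024, hop=256):
    """Shift-invariant target: magnitude spectrum plus phase gradients
    (time derivative ~ instantaneous frequency, frequency derivative
    ~ group delay)."""
    _, _, Z = stft(wave, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    mag = np.abs(Z)
    phase_t = np.unwrap(np.angle(Z), axis=1)       # unwrap along time frames
    dphase_dt = np.diff(phase_t, axis=1)           # instantaneous frequency
    phase_f = np.unwrap(np.angle(Z), axis=0)       # unwrap along frequency bins
    dphase_df = np.diff(phase_f, axis=0)           # group delay
    return mag, dphase_dt, dphase_df
```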
    Safety-Critical Learning of Robot Control with Temporal Logic Specifications. (arXiv:2109.02791v7 [cs.RO] UPDATED)
    Reinforcement learning (RL) is a promising approach, but its success in real-world applications is limited, because ensuring safe exploration and facilitating adequate exploitation is challenging when controlling robotic systems with unknown models and measurement uncertainties. The learning problem becomes even more difficult for complex tasks over continuous state-action spaces. In this paper, we propose a learning-based robotic control framework consisting of several aspects: (1) we leverage Linear Temporal Logic (LTL) to express complex tasks over infinite horizons that are translated to a novel automaton structure; (2) we detail an innovative reward scheme for LTL satisfaction with a probabilistic guarantee, and, by applying a reward shaping technique, we develop a modular policy-gradient architecture exploiting the benefits of the automaton structure to decompose overall tasks and enhance the performance of learned controllers; (3) by incorporating Gaussian Processes (GPs) to estimate the uncertain dynamic systems, we synthesize model-based safe exploration during the learning process using Exponential Control Barrier Functions (ECBFs) that generalize to systems with high-order relative degrees; (4) to further improve the efficiency of exploration, we utilize the properties of LTL automata and ECBFs to propose a safe guiding process. Finally, we demonstrate the effectiveness of the framework in several robotic environments, showing an ECBF-based modular deep RL algorithm that achieves near-perfect success rates and safety guarding with high-probability confidence during training.
    Deep learning-based fast time-resolved flame emission spectroscopy in high-pressure combustion environment. (arXiv:2208.12544v1 [cs.LG])
    A novel deep learning strategy is developed for fast and accurate gas property measurements using flame emission spectroscopy (FES). In particular, short-gated fast FES is essential to resolve fast-evolving combustion behaviors. However, as the exposure time for capturing the flame emission spectrum gets shorter, the signal-to-noise ratio (SNR) decreases, and the characteristic spectral features indicating the gas properties become relatively weaker, making property estimation based on the short-gated spectrum difficult and inaccurate. Denoising convolutional neural networks (CNNs) can enhance the SNR of the short-gated spectrum. A new CNN architecture including a reversible down- and up-sampling (DU) operator and a loss function based on proper orthogonal decomposition (POD) coefficients is proposed. For training and testing the CNN, flame chemiluminescence spectra were captured from a stable methane-air flat flame using a portable spectrometer (spectral range: 250-850 nm, resolution: 0.5 nm) with varied equivalence ratio (0.8-1.2), pressure (1-10 bar), and exposure time (0.05, 0.2, 0.4, and 2 s). The long-exposure (2 s) spectra were used as the ground truth when training the denoising CNN. A kriging model with POD is trained on the long-gated spectra for calibration and then predicts the gas properties, taking the denoised short-gated spectrum as input. The prediction errors of pressure and equivalence ratio using the new technique were estimated to be 5.7% and 1.5% with 0.2 s exposure, which is exceptionally good and typically not achievable with such low-SNR spectral signals without a signal amplifier.
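    A minimal sketch of a POD-coefficient loss, assuming spectra are stored as rows; the number of modes and the plain-SVD basis are illustrative assumptions:

```python
import numpy as np

def pod_basis(spectra, n_modes=10):
    """POD basis of clean training spectra via SVD (rows = samples)."""
    X = spectra - spectra.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:n_modes]                      # (n_modes, n_wavelengths)

def pod_loss(denoised, clean, basis):
    """MSE between POD coefficients of denoised and ground-truth spectra,
    emphasizing the dominant spectral modes rather than raw pixels."""
    a_hat = denoised @ basis.T
    a_ref = clean @ basis.T
    return float(np.mean((a_hat - a_ref) ** 2))
```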
    Federated and Privacy-Preserving Learning of Accounting Data in Financial Statement Audits. (arXiv:2208.12708v1 [cs.LG])
    The ongoing 'digital transformation' fundamentally changes audit evidence's nature, recording, and volume. Nowadays, the International Standards on Auditing (ISA) require auditors to examine vast volumes of a financial statement's underlying digital accounting records. As a result, audit firms also 'digitize' their analytical capabilities and invest in Deep Learning (DL), a successful sub-discipline of Machine Learning. The application of DL offers the ability to learn specialized audit models from data of multiple clients, e.g., organizations operating in the same industry or jurisdiction. In general, regulations require auditors to adhere to strict data confidentiality measures. At the same time, recent intriguing discoveries showed that large-scale DL models are vulnerable to leaking sensitive training data information. Today, it often remains unclear how audit firms can apply DL models while complying with data protection regulations. In this work, we propose a Federated Learning framework to train DL models on audit-relevant accounting data of multiple clients. The framework encompasses Differential Privacy and Split Learning capabilities to mitigate data confidentiality risks at model inference. We evaluate our approach to detect accounting anomalies in three real-world datasets of city payments. Our results provide empirical evidence that auditors can benefit from DL models that accumulate knowledge from multiple sources of proprietary client data.
    Coefficient-based Regularized Distribution Regression. (arXiv:2208.12427v1 [stat.ML])
    In this paper, we consider coefficient-based regularized distribution regression, which aims to regress from probability measures to real-valued responses over a reproducing kernel Hilbert space (RKHS), where the regularization is put on the coefficients and the kernels are assumed to be indefinite. The algorithm involves two stages of sampling: the first-stage sample consists of distributions, and the second-stage sample is obtained from these distributions. Asymptotic behaviors of the algorithm in different regularity ranges of the regression function are comprehensively studied, and learning rates are derived via integral operator techniques. We obtain the optimal rates under some mild conditions, which match the one-stage sampled minimax optimal rate. Compared with the kernel methods for distribution regression in the literature, the algorithm under consideration does not require the kernel to be symmetric and positive semi-definite and hence provides a simple paradigm for designing indefinite kernel methods, which enriches the theme of distribution regression. To the best of our knowledge, this is the first result for distribution regression with indefinite kernels, and our algorithm can overcome the saturation effect.
    EGFR Mutation Prediction of Lung Biopsy Images using Deep Learning. (arXiv:2208.12506v1 [cs.CV])
    The standard diagnostic procedures for targeted therapies in lung cancer treatment involve histological subtyping and subsequent detection of key driver mutations, such as EGFR. Even though molecular profiling can uncover the driver mutation, the process is often expensive and time-consuming. Deep learning-oriented image analysis offers a more economical alternative for discovering driver mutations directly from whole slide images (WSIs). In this work, we used customized deep learning pipelines with weak supervision to identify the morphological correlates of EGFR mutation from hematoxylin and eosin-stained WSIs, in addition to detecting tumor and histologically subtyping it. We demonstrate the effectiveness of our pipeline by conducting rigorous experiments and ablation studies on two lung cancer datasets - TCGA and a private dataset from India. With our pipeline, we achieved an average area under the curve (AUC) of 0.964 for tumor detection, and 0.942 for histological subtyping between adenocarcinoma and squamous cell carcinoma on the TCGA dataset. For EGFR detection, we achieved an average AUC of 0.864 on the TCGA dataset and 0.783 on the dataset from India. Our key learning points include the following. Firstly, there is no particular advantage to using feature extractor layers trained on histology if one is going to fine-tune the feature extractor on the target dataset. Secondly, selecting patches with high cellularity, presumably capturing tumor regions, is not always helpful, as the signs of a disease class may be present in the tumor-adjacent stroma.
    Uniform-in-Phase-Space Data Selection with Iterative Normalizing Flows. (arXiv:2112.15446v2 [cs.LG] UPDATED)
    Improvements in computational and experimental capabilities are rapidly increasing the amount of scientific data that is routinely generated. In applications that are constrained by memory and computational intensity, excessively large datasets may hinder scientific discovery, making data reduction a critical component of data-driven methods. Datasets are growing in two directions: the number of data points and their dimensionality. Whereas data compression techniques are concerned with reducing dimensionality, the focus here is on reducing the number of data points. A strategy is proposed to select data points such that they uniformly span the phase-space of the data. The algorithm proposed relies on estimating the probability map of the data and using it to construct an acceptance probability. An iterative method is used to accurately estimate the probability of the rare data points when only a small subset of the dataset is used to construct the probability map. Instead of binning the phase-space to estimate the probability map, its functional form is approximated with a normalizing flow. Therefore, the method naturally extends to high-dimensional datasets. The proposed framework is demonstrated as a viable pathway to enable data-efficient machine learning when abundant data is available. An implementation of the method is available in a companion repository (https://github.com/NREL/Phase-space-sampling).
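    A minimal sketch of the acceptance-probability idea, with a kernel density estimate standing in for the paper's normalizing flow (all names and defaults are illustrative):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def uniform_phase_space_sample(X, n_keep, n_iter=3, seed=0):
    """Keep points with probability inversely proportional to their
    estimated density, so the retained points span phase space uniformly.
    Iterating refines the density estimate for rare points using only
    the current subset, as in the paper's scheme."""
    rng = np.random.default_rng(seed)
    keep = np.arange(len(X))
    for _ in range(n_iter):
        kde = KernelDensity().fit(X[keep])        # flow would be fitted here
        p = np.exp(kde.score_samples(X))          # density at every point
        accept = (1.0 / p) / (1.0 / p).max()      # acceptance probability
        keep = np.where(rng.random(len(X)) < accept)[0]
    # finally, draw down to the requested budget
    return rng.choice(keep, size=min(n_keep, len(keep)), replace=False)
```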
    Reduce Communication Costs and Preserve Privacy: Prompt Tuning Method in Federated Learning. (arXiv:2208.12268v1 [cs.LG])
    Federated learning (FL) has enabled global model training on decentralized data in a privacy-preserving way by aggregating model updates. However, for many natural language processing (NLP) tasks that utilize pre-trained language models (PLMs) with large numbers of parameters, there are considerable communication costs associated with FL. Recently, prompt tuning, which tunes some soft prompts without modifying PLMs, has achieved excellent performance as a new learning paradigm. Therefore, we want to combine the two methods and explore the effect of prompt tuning under FL. In this paper, we propose "FedPrompt", the first work to study prompt tuning in a model-split learning way using FL, and prove that split learning greatly reduces the communication cost, transmitting only 0.01% of the PLMs' parameters, with little decrease in accuracy on both IID and Non-IID data distributions. This improves the efficiency of the FL method while also protecting data privacy in prompt tuning. In addition, like PLMs, prompts are uploaded and downloaded between public platforms and personal users, so we try to figure out whether there is still a backdoor threat using only soft prompts in FL scenarios. We further conduct backdoor attacks by data poisoning on FedPrompt. Our experiments show that a normal backdoor attack cannot achieve a high attack success rate, proving the robustness of FedPrompt. We hope this work can promote the application of prompts in FL and raise awareness of possible security threats.
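    A sketch of the communication pattern this implies -- only the soft prompt travels between clients and server, while the PLM stays frozen everywhere; the loss-returning `frozen_plm` call is a placeholder assumption:

```python
import torch

def client_update(prompt, frozen_plm, batch_iter, lr=1e-3, steps=10):
    """Tune only the soft prompt locally; the PLM parameters never move."""
    prompt = prompt.clone().requires_grad_(True)
    opt = torch.optim.Adam([prompt], lr=lr)
    for _ in range(steps):
        # assumed: frozen_plm returns the task loss given a batch and prompt
        loss = frozen_plm(next(batch_iter), prompt)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return prompt.detach()

def server_aggregate(client_prompts):
    """FedAvg over prompts only: a tiny fraction of the PLM's parameters
    is ever communicated."""
    return torch.stack(client_prompts).mean(dim=0)
```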
    A Compact Pretraining Approach for Neural Language Models. (arXiv:2208.12367v1 [cs.CL])
    Domain adaptation for large neural language models (NLMs) is coupled with massive amounts of unstructured data in the pretraining phase. In this study, however, we show that pretrained NLMs learn in-domain information more effectively and faster from a compact subset of the data that focuses on the key information in the domain. We construct these compact subsets from the unstructured data using a combination of abstractive summaries and extractive keywords. In particular, we rely on BART to generate abstractive summaries, and KeyBERT to extract keywords from these summaries (or the original unstructured text directly). We evaluate our approach using six different settings: three datasets combined with two distinct NLMs. Our results reveal that the task-specific classifiers trained on top of NLMs pretrained using our method outperform methods based on traditional pretraining, i.e., random masking on the entire data, as well as methods without pretraining. Further, we show that our strategy reduces pretraining time by up to five times compared to vanilla pretraining. The code for all of our experiments is publicly available at https://github.com/shahriargolchin/compact-pretraining.
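    A sketch of how such a compact subset could be assembled with off-the-shelf BART and KeyBERT; the model choice and lengths are illustrative, not the authors' exact pipeline:

```python
from transformers import pipeline
from keybert import KeyBERT

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
kw_model = KeyBERT()

def compact_subset(documents, max_len=128):
    """Replace each raw document with an abstractive summary plus
    extractive keywords, yielding a much smaller pretraining corpus."""
    compact = []
    for doc in documents:
        summary = summarizer(doc, max_length=max_len,
                             truncation=True)[0]["summary_text"]
        keywords = [kw for kw, _ in kw_model.extract_keywords(summary, top_n=5)]
        compact.append(summary + " " + " ".join(keywords))
    return compact
```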
    Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer. (arXiv:2208.12410v1 [cs.SD])
    In this paper, we propose a model to perform style transfer of speech to singing voice. Contrary to previous signal processing-based methods, which require high-quality singing templates or phoneme synchronization, we explore a data-driven approach to the problem of converting natural speech to singing voice. We develop a novel neural network architecture, called SymNet, which models the alignment of the input speech with the target melody while preserving the speaker identity and naturalness. The proposed SymNet model is composed of a symmetrical stack of three types of layers - convolutional, transformer, and self-attention layers. The paper also explores novel data augmentation and generative loss annealing methods to facilitate model training. Experiments are performed on the NUS and NHSS datasets, which consist of parallel data of speech and singing voice. In these experiments, we show that the proposed SymNet model improves the objective reconstruction quality significantly over previously published methods and baseline architectures. Further, a subjective listening test confirms the improved quality of the audio obtained using the proposed approach (an absolute improvement of 0.37 in the mean opinion score measure over the baseline system).
    Fundamentals of Task-Agnostic Data Valuation. (arXiv:2208.12354v1 [cs.LG])
    We study valuing the data of a data owner/seller for a data seeker/buyer. Data valuation is often carried out for a specific task assuming a particular utility metric, such as test accuracy on a validation set, that may not exist in practice. In this work, we focus on task-agnostic data valuation without any validation requirements. The data buyer has access to a limited amount of data (which could be publicly available) and seeks more data samples from a data seller. We formulate the problem as estimating the differences in the statistical properties of the data at the seller with respect to the baseline data available at the buyer. We capture these statistical differences through the second moment by measuring the diversity and relevance of the seller's data for the buyer, and we estimate these measures through queries to the seller without requesting raw data. We design the queries so that the seller is blind to the buyer's raw data and has no knowledge to fabricate responses to queries to obtain a desired outcome of the diversity and relevance trade-off. We show through extensive experiments on real tabular and image datasets that the proposed estimates capture the diversity and relevance of the seller's data for the buyer.
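    One plausible reading of the second-moment-based measures, as an illustrative sketch only (the paper's actual query mechanism avoids exchanging raw data, which this toy version does not model):

```python
import numpy as np

def second_moment(X):
    return (X.T @ X) / len(X)

def diversity_relevance(X_buyer, X_seller):
    """Compare second-moment structure of the two datasets: 'relevance'
    as seller energy inside the buyer's occupied directions, 'diversity'
    as seller energy outside them."""
    M_b, M_s = second_moment(X_buyer), second_moment(X_seller)
    eigvals, eigvecs = np.linalg.eigh(M_b)
    top = eigvecs[:, eigvals > 1e-8]           # buyer's occupied directions
    relevance = np.trace(top.T @ M_s @ top)    # seller mass inside them
    diversity = np.trace(M_s) - relevance      # seller mass outside them
    return relevance, diversity
```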
    GHN-Q: Parameter Prediction for Unseen Quantized Convolutional Architectures via Graph Hypernetworks. (arXiv:2208.12489v1 [cs.LG])
    Deep convolutional neural network (CNN) training via iterative optimization has had incredible success in finding optimal parameters. However, modern CNN architectures often contain millions of parameters, so any given model for a single architecture resides in a massive parameter space. Models with similar loss could have drastically different characteristics such as adversarial robustness, generalizability, and quantization robustness. For deep learning on the edge, quantization robustness is often crucial, and finding a quantization-robust model can require significant effort. Recent works using Graph Hypernetworks (GHNs) have shown remarkable performance at predicting high-performing parameters of varying CNN architectures. Inspired by these successes, we ask whether the graph representations of GHN-2 can be leveraged to predict quantization-robust parameters as well, an approach we call GHN-Q. We conduct the first-ever study exploring the use of graph hypernetworks for predicting parameters of unseen quantized CNN architectures. We focus on a reduced CNN search space and find that GHN-Q can in fact predict quantization-robust parameters for various 8-bit quantized CNNs. Decent quantized accuracies are observed even with 4-bit quantization, despite GHN-Q not being trained on it. Quantized finetuning of GHN-Q at lower bitwidths may bring further improvements and is currently being explored.
    Neuro-Dynamic State Estimation for Networked Microgrids. (arXiv:2208.12288v1 [eess.SY])
    We devise neuro-dynamic state estimation (Neuro-DSE), a learning-based dynamic state estimation (DSE) algorithm for networked microgrids (NMs) under unknown subsystems. Our contributions include: 1) a data-driven Neuro-DSE algorithm for NMs DSE with partially unidentified dynamic models, which incorporates the neural-ordinary-differential-equations (ODE-Net) into Kalman filters; 2) a self-refining Neuro-DSE algorithm (Neuro-DSE+) which enables data-driven DSE under limited and noisy measurements by establishing an automatic filtering, augmenting and correcting framework; 3) a Neuro-KalmanNet-DSE algorithm which further integrates KalmanNet with Neuro-DSE to relieve the model mismatch of both neural- and physics-based dynamic models; and 4) an augmented Neuro-DSE for joint estimation of NMs states and unknown parameters (e.g., inertia). Extensive case studies demonstrate the efficacy of Neuro-DSE and its variants under different noise levels, control modes, power sources, observabilities and model knowledge, respectively.
    Augmenting Reinforcement Learning with Transformer-based Scene Representation Learning for Decision-making of Autonomous Driving. (arXiv:2208.12263v1 [cs.LG])
    Decision-making for urban autonomous driving is challenging due to the stochastic nature of interactive traffic participants and the complexity of road structures. Although reinforcement learning (RL)-based decision-making schemes are promising for handling urban driving scenarios, they suffer from low sample efficiency and poor adaptability. In this paper, we propose Scene-Rep Transformer to improve RL decision-making capabilities with better scene representation encoding and sequential predictive latent distillation. Specifically, a multi-stage Transformer (MST) encoder is constructed to model not only the interaction awareness between the ego vehicle and its neighbors but also the intention awareness between the agents and their candidate routes. A sequential latent Transformer (SLT) with self-supervised learning objectives is employed to distill future predictive information into the latent scene representation, in order to reduce the exploration space and speed up training. The final decision-making module based on soft actor-critic (SAC) takes as input the refined latent scene representation from the Scene-Rep Transformer and outputs driving actions. The framework is validated in five challenging simulated urban scenarios with dense traffic, where it achieves substantial quantitative improvements in data efficiency, success rate, safety, and efficiency. The qualitative results reveal that our framework is able to extract the intentions of neighbor agents to help make decisions and deliver more diversified driving behaviors.
    SNAP: Efficient Extraction of Private Properties with Poisoning. (arXiv:2208.12348v1 [cs.LG])
    Property inference attacks allow an adversary to extract global properties of the training dataset from a machine learning model. Such attacks have privacy implications for data owners who share their datasets to train machine learning models. Several existing approaches for property inference attacks against deep neural networks have been proposed, but they all rely on the attacker training a large number of shadow models, which induces large computational overhead. In this paper, we consider the setting of property inference attacks in which the attacker can poison a subset of the training dataset and query the trained target model. Motivated by our theoretical analysis of model confidences under poisoning, we design an efficient property inference attack, SNAP, which obtains higher attack success and requires lower amounts of poisoning than the state-of-the-art poisoning-based property inference attack by Mahloujifar et al. For example, on the Census dataset, SNAP achieves 34% higher success rate than Mahloujifar et al. while being 56.5x faster. We also extend our attack to determine if a certain property is present at all in training, and estimate the exact proportion of a property of interest efficiently. We evaluate our attack on several properties of varying proportions from four datasets, and demonstrate SNAP's generality and effectiveness.
    OOD-Probe: A Neural Interpretation of Out-of-Domain Generalization. (arXiv:2208.12352v1 [cs.LG])
    The ability to generalize out-of-domain (OOD) is an important goal for deep neural network development, and researchers have proposed many high-performing OOD generalization methods from various foundations. While many OOD algorithms perform well in various scenarios, these systems are evaluated as "black boxes". Instead, we propose a flexible framework that evaluates OOD systems with finer granularity using a probing module that predicts the originating domain from intermediate representations. We find that representations always encode some information about the domain. While the layerwise encoding patterns remain largely stable across different OOD algorithms, they vary across datasets. For example, the information about rotation (on RotatedMNIST) is most visible in the lower layers, while the information about style (on VLCS and PACS) is most visible in the middle layers. In addition, high probing results correlate with domain generalization performance, suggesting further directions for developing OOD generalization systems.
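    A minimal sketch of such a probing module: one linear classifier per layer, scored on held-out data (the linear probe family is an assumption; the paper may use a different probe):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_layers(layer_reps, domain_labels):
    """Fit one linear probe per layer to predict the originating domain
    from intermediate representations; higher held-out accuracy means
    more domain information is encoded at that depth."""
    scores = []
    for reps in layer_reps:                    # reps: (n_samples, d_layer)
        Xtr, Xte, ytr, yte = train_test_split(
            reps, domain_labels, test_size=0.3, random_state=0)
        clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
        scores.append(clf.score(Xte, yte))
    return np.array(scores)
```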
    BITS: Bi-level Imitation for Traffic Simulation. (arXiv:2208.12403v1 [cs.RO])
    Simulation is the key to scaling up validation and verification for robotic systems such as autonomous vehicles. Despite advances in high-fidelity physics and sensor simulation, a critical gap remains in simulating realistic behaviors of road users. This is because, unlike simulating physics and graphics, devising first-principle models for human-like behaviors is generally infeasible. In this work, we take a data-driven approach and propose a method that can learn to generate traffic behaviors from real-world driving logs. The method achieves high sample efficiency and behavior diversity by exploiting the bi-level hierarchy of driving behaviors: it decouples the traffic simulation problem into high-level intent inference and low-level driving behavior imitation. The method also incorporates a planning module to obtain stable long-horizon behaviors. We empirically validate our method, named Bi-level Imitation for Traffic Simulation (BITS), with scenarios from two large-scale driving datasets and show that BITS achieves balanced traffic simulation performance in realism, diversity, and long-horizon stability. We also explore ways to evaluate behavior realism and introduce a suite of evaluation metrics for traffic simulation. Finally, as part of our core contributions, we develop and open source a software tool that unifies data formats across different driving datasets and converts scenes from existing datasets into interactive simulation environments. For additional information and videos, see https://sites.google.com/view/nvr-bits2022/home
    Decoding speech from non-invasive brain recordings. (arXiv:2208.12266v1 [eess.AS])
    Decoding language from brain activity is a long-awaited goal in both healthcare and neuroscience. Major milestones have recently been reached thanks to intracranial devices: subject-specific pipelines trained on invasive brain responses to basic language tasks now start to efficiently decode interpretable features (e.g. letters, words, spectrograms). However, scaling this approach to natural speech and non-invasive brain recordings remains a major challenge. Here, we propose a single end-to-end architecture trained with contrastive learning across a large cohort of individuals to predict self-supervised representations of natural speech. We evaluate our model on four public datasets, encompassing 169 volunteers recorded with magneto- or electro-encephalography (M/EEG), while they listened to natural speech. The results show that our model can identify, from 3s of MEG signals, the corresponding speech segment with up to 72.5% top-10 accuracy out of 1,594 distinct segments (and 44% top-1 accuracy), and up to 19.1% out of 2,604 segments for EEG recordings -- hence allowing the decoding of phrases absent from the training set. Model comparison and ablation analyses show that these performances directly benefit from our original design choices, namely the use of (i) a contrastive objective, (ii) pretrained representations of speech and (iii) a common convolutional architecture simultaneously trained across several participants. Together, these results delineate a promising path to decode natural language processing in real time from non-invasive recordings of brain activity.
    DiVa: An Accelerator for Differentially Private Machine Learning. (arXiv:2208.12392v1 [cs.AR])
    The widespread deployment of machine learning (ML) is raising serious concerns on protecting the privacy of users who contributed to the collection of training data. Differential privacy (DP) is rapidly gaining momentum in the industry as a practical standard for privacy protection. Despite DP's importance, however, little has been explored within the computer systems community regarding the implication of this emerging ML algorithm on system designs. In this work, we conduct a detailed workload characterization on a state-of-the-art differentially private ML training algorithm named DP-SGD. We uncover several unique properties of DP-SGD (e.g., its high memory capacity and computation requirements vs. non-private ML), root-causing its key bottlenecks. Based on our analysis, we propose an accelerator for differentially private ML named DiVa, which provides a significant improvement in compute utilization, leading to 2.6x higher energy-efficiency vs. conventional systolic arrays.
    Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning. (arXiv:2205.05638v2 [cs.LG] UPDATED)
    Few-shot in-context learning (ICL) enables pre-trained language models to perform a previously-unseen task without any gradient-based training by feeding a small number of training examples as part of the input. ICL incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made. Parameter-efficient fine-tuning (PEFT) (e.g. adapter modules, prompt tuning, sparse update methods, etc.) offers an alternative paradigm where a small set of parameters are trained to enable a model to perform the new task. In this paper, we rigorously compare few-shot ICL and PEFT and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs. Along the way, we introduce a new PEFT method called (IA)$^3$ that scales activations by learned vectors, attaining stronger performance while only introducing a relatively tiny amount of new parameters. We also propose a simple recipe based on the T0 model called T-Few that can be applied to new tasks without task-specific tuning or modifications. We validate the effectiveness of T-Few on completely unseen tasks by applying it to the RAFT benchmark, attaining super-human performance for the first time and outperforming the state-of-the-art by 6% absolute. All of the code used in our experiments is publicly available.
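    The core of (IA)$^3$ is small enough to sketch directly: learned vectors elementwise-rescale keys, values, and FFN hidden activations, adding only a few new parameters per block (integration into an actual Transformer block is omitted here):

```python
import torch
import torch.nn as nn

class IA3Scaling(nn.Module):
    """(IA)^3-style adapter: elementwise rescaling of keys, values, and
    FFN hidden activations by learned vectors, initialized at one so the
    frozen base model is unchanged at the start of training."""
    def __init__(self, d_kv, d_ff):
        super().__init__()
        self.l_k = nn.Parameter(torch.ones(d_kv))
        self.l_v = nn.Parameter(torch.ones(d_kv))
        self.l_ff = nn.Parameter(torch.ones(d_ff))

    def scale_attention(self, K, V):
        # K, V: (..., seq_len, d_kv); scaling broadcasts over tokens
        return K * self.l_k, V * self.l_v

    def scale_ffn(self, hidden):
        # hidden: (..., seq_len, d_ff), the FFN's intermediate activation
        return hidden * self.l_ff
```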
    Race and ethnicity data for first, middle, and last names. (arXiv:2208.12443v1 [stat.OT])
    We provide the largest compiled publicly available dictionaries of first, middle, and last names for the purpose of imputing race and ethnicity using, for example, Bayesian Improved Surname Geocoding (BISG). The dictionaries are based on the voter files of six Southern states that collect self-reported racial data upon voter registration. Our data cover a much larger scope of names than any comparable dataset, containing roughly one million first names, 1.1 million middle names, and 1.4 million surnames. Individuals are categorized into five mutually exclusive racial and ethnic groups -- White, Black, Hispanic, Asian, and Other -- and racial/ethnic counts by name are provided for every name in each dictionary. Counts can then be normalized row-wise or column-wise to obtain conditional probabilities of race given name or name given race. These conditional probabilities can then be deployed for imputation in a data analytic task for which ground truth racial and ethnic data is not available.
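    The row-wise and column-wise normalizations map directly onto pandas operations; a tiny sketch with made-up counts (the real dictionaries are far larger):

```python
import pandas as pd

# counts: rows = names, columns = racial/ethnic groups (illustrative numbers)
counts = pd.DataFrame(
    {"White": [80, 5], "Black": [10, 60], "Hispanic": [5, 20],
     "Asian": [3, 10], "Other": [2, 5]},
    index=["smith", "washington"])

# P(race | name): normalize each row to sum to one
p_race_given_name = counts.div(counts.sum(axis=1), axis=0)

# P(name | race): normalize each column to sum to one
p_name_given_race = counts.div(counts.sum(axis=0), axis=1)
```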
    Music Separation Enhancement with Generative Modeling. (arXiv:2208.12387v1 [cs.SD])
    Despite phenomenal progress in recent years, state-of-the-art music separation systems produce source estimates with significant perceptual shortcomings, such as adding extraneous noise or removing harmonics. We propose a post-processing model (the Make it Sound Good (MSG) post-processor) to enhance the output of music source separation systems. We apply our post-processing model to state-of-the-art waveform-based and spectrogram-based music source separators, including a separator unseen by MSG during training. Our analysis of the errors produced by source separators shows that waveform models tend to introduce more high-frequency noise, while spectrogram models tend to lose transients and high frequency content. We introduce objective measures to quantify both kinds of errors and show MSG improves the source reconstruction of both kinds of errors. Crowdsourced subjective evaluations demonstrate that human listeners prefer source estimates of bass and drums that have been post-processed by MSG.
    2nd Place Solutions for UG2+ Challenge 2022 -- D$^{3}$Net for Mitigating Atmospheric Turbulence from Images. (arXiv:2208.12332v1 [cs.CV])
    This technical report briefly introduces the D$^{3}$Net proposed by our team "TUK-IKLAB" for Atmospheric Turbulence Mitigation in the $UG2^{+}$ Challenge at CVPR 2022. Based on test and validation results on textual images (to improve text recognition performance) and hot-air balloon images (for image enhancement), the proposed method achieves state-of-the-art performance. Furthermore, we also provide a visual comparison of the proposed work with publicly available denoising, deblurring, and frame averaging methods. The proposed method ranked 2nd on the final leaderboard of the aforementioned challenge in the testing phase.
    Prerequisite-driven Q-matrix Refinement for Learner Knowledge Assessment: A Case Study in Online Learning Context. (arXiv:2208.12642v1 [cs.CY])
    The ever-growing abundance of learning traces in online learning platforms promises unique insights into learner knowledge assessment (LKA), a fundamental personalized-tutoring technique for enabling various further adaptive tutoring services in these platforms. Precise assessment of learner knowledge requires a fine-grained Q-matrix, which is generally designed by experts to map items to skills in the domain. Because of this subjectivity, misspecifications may degrade the performance of LKA. Some efforts have been made to refine small-scale Q-matrices; however, these methods are difficult to scale to large online learning contexts with numerous items and massive skills. Moreover, existing LKA models employ flexible deep learning models that excel at this task, but the adequacy of LKA is still challenged by the models' limited representation capability on the quite sparse item-skill graph and the learners' exercise data. To overcome these issues, in this paper we propose a prerequisite-driven Q-matrix refinement framework for learner knowledge assessment (PQRLKA) in the online context. We infer prerequisites from learners' response data and use them to refine the expert-defined Q-matrix, which enables interpretability and the scalability to apply it to large-scale online learning contexts. Based on the refined Q-matrix, we propose a Metapath2Vec-enhanced convolutional representation method to obtain comprehensive representations of the items with rich information, and feed them to the PQRLKA model to finally assess the learners' knowledge. Experiments conducted on three real-world datasets demonstrate the capability of our model to infer prerequisites for Q-matrix refinement, and also its superiority for the LKA task.
    Spatial-Temporal Meta-path Guided Explainable Crime Prediction. (arXiv:2205.01901v2 [cs.LG] UPDATED)
    Exposure to crime and violence can harm individuals' quality of life and the economic growth of communities. In light of the rapid development in machine learning, there is a rise in the need to explore automated solutions to prevent crimes. With the increasing availability of both fine-grained urban and public service data, there is a recent surge in fusing such cross-domain information to facilitate crime prediction. By capturing the information about social structure, environment, and crime trends, existing machine learning predictive models have explored the dynamic crime patterns from different views. However, these approaches mostly convert such multi-source knowledge into implicit and latent representations (e.g., learned embeddings of districts), making it still a challenge to investigate the impacts of explicit factors for the occurrences of crimes behind the scenes. In this paper, we present a Spatial-Temporal Metapath guided Explainable Crime prediction (STMEC) framework to capture dynamic patterns of crime behaviours and explicitly characterize how the environmental and social factors mutually interact to produce the forecasts. Extensive experiments show the superiority of STMEC compared with other advanced spatiotemporal models, especially in predicting felonies (e.g., robberies and assaults with dangerous weapons).
    FooDI-ML: a large multi-language dataset of food, drinks and groceries images and descriptions. (arXiv:2110.02035v2 [cs.CV] UPDATED)
    In this paper we introduce the FooDI-ML dataset. This dataset contains over 1.5M unique images and over 9.5M store names, product names, descriptions, and collection sections gathered from the Glovo application. The data made available correspond to food, drinks and groceries products from 37 countries in Europe, the Middle East, Africa and Latin America. The dataset comprises 33 languages, including 870K samples in languages of countries from Eastern Europe and Western Asia such as Ukrainian and Kazakh, which have so far been underrepresented in publicly available visio-linguistic datasets. The dataset also includes widely spoken languages such as Spanish and English. To assist further research, we include benchmarks over two tasks: text-image retrieval and conditional image generation.
    FixEval: Execution-based Evaluation of Program Fixes for Competitive Programming Problems. (arXiv:2206.07796v2 [cs.SE] UPDATED)
    Source code repositories consist of large codebases, often containing error-prone programs. The increasing complexity of software has led to a drastic rise in time and costs for identifying and fixing these defects. Various methods exist to automatically generate fixes for buggy code. However, due to the large combinatorial space of possible solutions for a particular bug, there are not many tools and datasets available to evaluate generated code effectively. In this work, we introduce FixEval, a benchmark comprising buggy code submissions to competitive programming problems and their respective fixes. We introduce a rich test suite to evaluate and assess the correctness of model-generated program fixes. We consider two Transformer language models pretrained on programming languages as our baselines, and compare them using match-based and execution-based evaluation metrics. Our experiments show that match-based metrics do not reflect model-generated program fixes accurately, while execution-based methods evaluate programs through all cases and scenarios specifically designed for that solution. Therefore, we believe FixEval provides a step towards real-world automatic bug fixing and model-generated code evaluation.
    Modelling spatio-temporal trends of air pollution in Africa. (arXiv:2208.12719v1 [physics.ao-ph])
    Atmospheric pollution remains one of the major public health threats worldwide, with an estimated 7 million deaths annually. In Africa, rapid urbanization and poor transport infrastructure are worsening the problem. In this paper, we analyse spatio-temporal variations of PM2.5 across different geographical regions in Africa. The West African region remains the most affected by high levels of pollution, with a daily average of 40.856 $\mu g/m^3$ in cities like Lagos, Abuja and Bamako. In East Africa, Uganda reports the highest pollution level, with a daily average concentration of 56.14 $\mu g/m^3$, against 38.65 $\mu g/m^3$ for Kigali. In countries located in the central region of Africa, the highest daily average concentration of PM2.5, 90.075 $\mu g/m^3$, was recorded in N'Djamena. We compare three data-driven models for predicting future trends in pollution levels; the neural network outperforms the Gaussian process and ARIMA models.
    Adaptive Learning for Service Monitoring Data. (arXiv:2208.12281v1 [cs.LG])
    Service monitoring applications continuously produce data to monitor their availability. Hence, it is critical to classify incoming data in real-time and accurately. For this purpose, our study develops an adaptive classification approach using Learn++ that can handle evolving data distributions. This approach sequentially predicts and updates the monitoring model with new data, gradually forgets past knowledge and identifies sudden concept drift. We employ consecutive data chunks obtained from an industrial application to evaluate the performance of the predictors incrementally.
    Learning with Few Labeled Nodes via Augmented Graph Self-Training. (arXiv:2208.12422v1 [cs.LG])
    It is well known that the success of graph neural networks (GNNs) highly relies on abundant human-annotated data, which is laborious to obtain and not always available in practice. When only few labeled nodes are available, how to develop highly effective GNNs remains understudied. Though self-training has been shown to be powerful for semi-supervised learning, its application on graph-structured data may fail because (1) larger receptive fields are not leveraged to capture long-range node interactions, which exacerbates the difficulty of propagating feature-label patterns from labeled nodes to unlabeled nodes; and (2) limited labeled data makes it challenging to learn well-separated decision boundaries for different node classes without explicitly capturing the underlying semantic structure. To address the challenges of capturing informative structural and semantic knowledge, we propose a new graph data augmentation framework, AGST (Augmented Graph Self-Training), which is built with two new (i.e., structural and semantic) augmentation modules on top of a decoupled GST backbone. In this work, we investigate whether this novel framework can learn an effective graph predictive model with extremely limited labeled nodes. We conduct comprehensive evaluations on semi-supervised node classification under different scenarios of limited labeled-node data. The experimental results demonstrate the unique contributions of the novel data augmentation framework for node classification with few labeled data.
    Comparing Apples to Oranges: Learning Similarity Functions for Data Produced by Different Distributions. (arXiv:2208.12731v1 [cs.LG])
    Similarity functions measure how comparable pairs of elements are, and play a key role in a wide variety of applications, e.g., Clustering problems and considerations of Individual Fairness. However, access to an accurate similarity function should not always be considered guaranteed. Specifically, when the elements to be compared are produced by different distributions, or in other words belong to different ``demographic'' groups, knowledge of their true similarity might be very difficult to obtain. In this work, we present a sampling framework that learns these across-groups similarity functions, using only a limited amount of experts' feedback. We show analytical results with rigorous bounds, and empirically validate our algorithms via a large suite of experiments.
    Variance Reduction based Experience Replay for Policy Optimization. (arXiv:2208.12341v1 [stat.ML])
    For reinforcement learning on complex stochastic systems where many factors dynamically impact the output trajectories, it is desirable to effectively leverage the information from historical samples collected in previous iterations to accelerate policy optimization. Classical experience replay allows agents to remember by reusing historical observations. However, the uniform reuse strategy that treats all observations equally overlooks the relative importance of different samples. To overcome this limitation, we propose a general variance reduction based experience replay (VRER) framework that can selectively reuse the most relevant samples to improve policy gradient estimation. This selective mechanism can adaptively put more weight on past samples that are more likely to be generated by the current target distribution. Our theoretical and empirical studies show that the proposed VRER can accelerate the learning of optimal policy and enhance the performance of state-of-the-art policy optimization approaches.
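    A sketch of the general idea of likelihood-ratio-weighted replay (VRER's actual selection rule and theory are more involved; the clipping constant is an illustrative assumption):

```python
import numpy as np

def replay_weights(logp_target, logp_behavior, clip=10.0):
    """Weight historical transitions by the likelihood ratio between the
    current target policy and the behavior policy that generated them,
    so the most relevant past samples contribute more to the estimate."""
    w = np.exp(np.clip(logp_target - logp_behavior, -clip, clip))
    return w / w.mean()                        # self-normalize the weights

def weighted_policy_gradient(grads, w):
    """grads: (n_samples, n_params) per-sample policy-gradient terms."""
    return (w[:, None] * grads).mean(axis=0)
```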
    Deformation equivariant cross-modality image synthesis with paired non-aligned training data. (arXiv:2208.12491v1 [cs.CV])
    Cross-modality image synthesis is an active research topic with multiple clinically relevant medical applications. Recently, methods allowing training with paired but misaligned data have started to emerge. However, no robust and well-performing methods applicable to a wide range of real-world data sets exist. In this work, we propose a generic solution to the problem of cross-modality image synthesis with paired but non-aligned data by introducing new deformation-equivariance-encouraging loss functions. The method consists of joint training of an image synthesis network together with separate registration networks and allows adversarial training conditioned on the input even with misaligned data. The work lowers the bar for new clinical applications by allowing effortless training of cross-modality image synthesis networks for more difficult data sets and opens up opportunities for the development of new generic learning-based cross-modality registration algorithms.
    Shallow decision trees for explainable $k$-means clustering. (arXiv:2112.14718v2 [cs.LG] UPDATED)
    A number of recent works have employed decision trees for the construction of explainable partitions that aim to minimize the $k$-means cost function. These works, however, largely ignore metrics related to the depths of the leaves in the resulting tree, which is perhaps surprising considering how the explainability of a decision tree depends on these depths. To fill this gap in the literature, we propose an efficient algorithm that takes into account these metrics. In experiments on 16 datasets, our algorithm yields better results than decision-tree clustering algorithms such as the ones presented in \cite{dasgupta2020explainable}, \cite{frost2020exkmc}, \cite{laber2021price} and \cite{DBLP:conf/icml/MakarychevS21}, typically achieving lower or equivalent costs with considerably shallower trees. We also show, through a simple adaptation of existing techniques, that the problem of building explainable partitions induced by binary trees for the $k$-means cost function does not admit an $(1+\epsilon)$-approximation in polynomial time unless $P=NP$, which justifies the quest for approximation algorithms and/or heuristics.  ( 3 min )
    Regime-based Implied Stochastic Volatility Model for Crypto Option Pricing. (arXiv:2208.12614v1 [q-fin.CP])
    The increasing adoption of Digital Assets (DAs), such as Bitcoin (BTC), raises the need for accurate option pricing models. Yet, existing methodologies fail to cope with the volatile nature of the emerging DAs. Many models have been proposed to address the unorthodox market dynamics and frequent disruptions in the microstructure caused by the non-stationarity and peculiar statistics of DA markets. However, they are either prone to the curse of dimensionality, as additional complexity is required to employ traditional theories, or they overfit historical patterns that may never repeat. Instead, we leverage recent advances in market regime (MR) clustering with the Implied Stochastic Volatility Model (ISVM). Time-regime clustering is a temporal clustering method that clusters the historic evolution of a market into different volatility periods, accounting for non-stationarity. ISVM can incorporate investor expectations in each of the sentiment-driven periods by using implied volatility (IV) data. In this paper, we applied this integrated time-regime clustering and ISVM method (termed MR-ISVM) to high-frequency data on BTC options at the popular trading platform Deribit. We demonstrate that MR-ISVM helps overcome the burden of complex adaptation to jumps in higher-order characteristics of option pricing models. This allows us to price the market based on the expectations of its participants in an adaptive fashion.  ( 2 min )
    Image augmentation improves few-shot classification performance in plant disease recognition. (arXiv:2208.12613v1 [cs.CV])
    With the world population projected to near 10 billion by 2050, minimizing crop damage and guaranteeing food security has never been more important. Machine learning has been proposed as a solution to quickly and efficiently identify diseases in crops. Convolutional Neural Networks typically require large datasets of annotated data which are not available on demand. Collecting this data is a long and arduous process which involves manually picking, imaging, and annotating each individual leaf. I tackle the problem of plant image data scarcity by exploring the efficacy of various data augmentation techniques when used in conjunction with transfer learning. I evaluate the impact of various data augmentation techniques both individually and combined on the performance of a ResNet. I propose an augmentation scheme utilizing a sequence of different augmentations which consistently improves accuracy through many trials. Using only 10 total seed images, I demonstrate that my augmentation framework can increase model accuracy by upwards of 25%.  ( 2 min )
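    A sketch of what a chained augmentation sequence of this kind could look like with torchvision; the specific transforms and magnitudes are illustrative assumptions, not the paper's scheme:

```python
import torchvision.transforms as T

# A hypothetical augmentation sequence: several augmentations chained so
# each seed image yields many distinct training variants.
augment = T.Compose([
    T.RandomResizedCrop(224, scale=(0.7, 1.0)),   # vary framing and scale
    T.RandomHorizontalFlip(),                     # leaves have no fixed side
    T.RandomRotation(20),                         # vary orientation
    T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    T.ToTensor(),
])
```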
    TMIC: App Inventor Extension for the Deployment of Image Classification Models Exported from Teachable Machine. (arXiv:2208.12637v1 [cs.CY])
    TMIC is an App Inventor extension for the deployment of ML models for image classification developed with Google Teachable Machine in educational settings. Google Teachable Machine is an intuitive visual tool that provides workflow-oriented support for the development of ML models for image classification. Aiming at the usage of models developed with Google Teachable Machine, the extension TMIC enables the deployment of the trained models, exported as TensorFlow.js to Google Cloud, as part of App Inventor, one of the most popular block-based programming environments for teaching computing in K-12. The extension was created with the App Inventor extension framework based on the extension PIC and is available under the BSD 3 license. It can be used for teaching ML in K-12, in introductory courses in higher education, or by anyone interested in creating intelligent apps with image classification. The extension TMIC is being developed by the initiative Computação na Escola of the Department of Informatics and Statistics at the Federal University of Santa Catarina, Brazil, as part of a research effort aiming at introducing AI education in K-12.  ( 3 min )
    Confusion Matrices and Accuracy Statistics for Binary Classifiers Using Unlabeled Data: The Diagnostic Test Approach. (arXiv:2208.12664v1 [stat.ML])
    Medical researchers have solved the problem of estimating the sensitivity and specificity of binary medical diagnostic tests without gold standard tests for comparison. That problem is the same as estimating confusion matrices for classifiers on unlabeled data. This article describes how to modify the diagnostic test solutions to estimate confusion matrices and accuracy statistics for supervised or unsupervised binary classifiers on unlabeled data.  ( 2 min )
    Elastic Product Quantization for Time Series. (arXiv:2201.01856v2 [cs.LG] UPDATED)
    Analyzing numerous or long time series is difficult in practice due to the high storage costs and computational requirements. Therefore, techniques have been proposed to generate compact similarity-preserving representations of time series, enabling real-time similarity search on large in-memory data collections. However, the existing techniques are not ideally suited for assessing similarity when sequences are locally out of phase. In this paper, we propose the use of product quantization for efficient similarity-based comparison of time series under time warping. The idea is to first compress the data by partitioning the time series into equal length sub-sequences which are represented by a short code. The distance between two time series can then be efficiently approximated by pre-computed elastic distances between their codes. The partitioning into sub-sequences forces unwanted alignments, which we address with a pre-alignment step using the maximal overlap discrete wavelet transform (MODWT). To demonstrate the efficiency and accuracy of our method, we perform an extensive experimental evaluation on benchmark datasets in nearest neighbors classification and clustering applications. Overall, the proposed solution emerges as a highly efficient (both in terms of memory usage and computation time) replacement for elastic measures in time series applications.  ( 2 min )
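    A minimal sketch of the core idea, assuming equal-length partitions and a k-means codebook per partition (the MODWT pre-alignment step is omitted): the elastic distance between two series is approximated by summing pre-computed DTW distances between their codewords.

        import numpy as np
        from sklearn.cluster import KMeans

        def dtw(a, b):
            # Plain O(n*m) dynamic time warping between two 1-D sequences.
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = abs(a[i - 1] - b[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]

        def fit_codebooks(X, n_parts=4, k=16):
            # One codebook per equal-length partition of the series.
            return [KMeans(n_clusters=k, n_init=4).fit(p) for p in np.split(X, n_parts, axis=1)]

        def encode(X, codebooks, n_parts=4):
            parts = np.split(X, n_parts, axis=1)
            return np.stack([km.predict(p) for km, p in zip(codebooks, parts)], axis=1)

        def code_tables(codebooks):
            # Pre-compute DTW between all codeword pairs of each partition.
            tables = []
            for km in codebooks:
                C = km.cluster_centers_
                tables.append(np.array([[dtw(ci, cj) for cj in C] for ci in C]))
            return tables

        def approx_distance(code_a, code_b, tables):
            # Sum of pre-computed elastic distances between short codes.
            return sum(T[i, j] for T, i, j in zip(tables, code_a, code_b))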
    Automatic detection of faults in race walking from a smartphone camera: a comparison of an Olympic medalist and university athletes. (arXiv:2208.12646v1 [cs.CV])
    Automatic fault detection is a major challenge in many sports. In race walking, referees visually judge faults according to the rules. Hence, ensuring objectivity and fairness while judging is important. To address this issue, some studies have attempted to use sensors and machine learning to automatically detect faults. However, there are problems associated with sensor attachments and equipment such as high-speed cameras, which conflict with the visual judgement of referees, as well as with the interpretability of the fault detection models. In this study, we propose a fault detection system for non-contact measurement. We use pose estimation and machine learning models trained on the judgements of multiple qualified referees to realize fair fault judgement. We verified the system using smartphone videos of normal race walking and of walking with intentional faults by several athletes, including a medalist of the Tokyo Olympics. The validation results show that the proposed system detects faults with an average accuracy of over 90%. We also revealed that the machine learning model detects faults according to the rules of race walking. In addition, the intentional faulty walking movement of the medalist was different from that of the university walkers. This finding informs the development of a more general fault detection model. The code and data are available at https://github.com/SZucchini/racewalk-aijudge.  ( 3 min )
    Task Selection for AutoML System Evaluation. (arXiv:2208.12754v1 [cs.LG])
    Our goal is to assess whether AutoML system changes - i.e., to the search space or hyperparameter optimization - will improve the final model's performance on production tasks. However, we cannot test the changes on production tasks. Instead, we only have access to limited descriptors about tasks that our AutoML system previously executed, like the number of data points or features. We also have a set of development tasks to test changes, e.g., sampled from OpenML with no usage constraints. However, the development and production task distributions are different, leading us to pursue changes that only improve development and not production. This paper proposes a method to leverage descriptor information about AutoML production tasks to select a filtered subset of the most relevant development tasks. Empirical studies show that our filtering strategy improves the ability to assess AutoML system changes on holdout tasks with distributions different from those of the development tasks.  ( 2 min )
    Laplacian Pyramid-like Autoencoder. (arXiv:2208.12484v1 [cs.CV])
    In this paper, we develop the Laplacian pyramid-like autoencoder (LPAE) by incorporating the Laplacian pyramid (LP) concept widely used to analyze images in signal processing. LPAE decomposes an image into an approximation image and a detail image in the encoder part and then tries to reconstruct the original image in the decoder part using the two components. We use LPAE for experiments on classification and super-resolution tasks. Using the detail image and the smaller-sized approximation image as inputs of a classification network, our LPAE makes the model lighter. Moreover, we show that the performance of the connected classification networks remains substantially high. In super-resolution, we show that the decoder part obtains a high-quality reconstruction image when set to resemble the structure of LP. Consequently, LPAE improves on the original results by combining the decoder part of the autoencoder and the super-resolution network.  ( 2 min )
    Physics-Aware Neural Networks for Boundary Layer Linear Problems. (arXiv:2208.12559v1 [math.NA])
    Physics-Informed Neural Networks (PINNs) are machine learning tools that approximate the solution of general partial differential equations (PDEs) by adding them in some form as terms of the loss/cost function of a Neural Network. Most existing work on PINNs tackles non-linear PDEs. Nevertheless, many interesting problems involving linear PDEs may benefit from PINNs; these include parametric studies, multi-query problems, and parabolic (transient) PDEs. The purpose of this paper is to explore PINNs for linear PDEs whose solutions may present one or more boundary layers. More specifically, we analyze the steady-state reaction-advection-diffusion equation in regimes in which the diffusive coefficient is small in comparison with the reactive or advective coefficients. We show that adding information about these coefficients as predictor variables in a PINN results in better prediction models than a PINN that only uses spatial information as predictor variables. This finding may be instrumental in multiscale problems where the coefficients of the PDEs present high variability in small spatiotemporal regions of the domain, and therefore PINNs may be employed together with domain decomposition techniques to efficiently approximate the PDEs locally at each partition of the spatiotemporal domain, without resorting to different learned PINN models at each of these partitions.  ( 2 min )
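    To make the idea concrete, here is a hedged PyTorch sketch for the 1D steady reaction-advection-diffusion equation -eps*u'' + b*u' + c*u = f, where the network takes (x, eps) as input so a single model covers a family of diffusion coefficients; the architecture, sampling ranges, and coefficient values are illustrative assumptions.

        import torch

        net = torch.nn.Sequential(
            torch.nn.Linear(2, 64), torch.nn.Tanh(),
            torch.nn.Linear(64, 64), torch.nn.Tanh(),
            torch.nn.Linear(64, 1),
        )

        def pde_residual(x, eps, b=1.0, c=1.0, f=1.0):
            # The diffusive coefficient eps is a predictor variable, not a constant.
            u = net(torch.cat([x, eps], dim=1))
            du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
            d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
            return -eps * d2u + b * du + c * u - f

        x = torch.rand(256, 1, requires_grad=True)
        eps = 10 ** (-3 * torch.rand(256, 1))      # log-uniform in [1e-3, 1]
        loss = pde_residual(x, eps).pow(2).mean()  # boundary-condition terms omitted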
    Automatic Testing and Validation of Level of Detail Reductions Through Supervised Learning. (arXiv:2208.12674v1 [cs.GR])
    Modern video games are rapidly growing in size and scale, and to create rich and interesting environments, a large amount of content is needed. As a consequence, several thousand detailed 3D assets are often used to create a single scene. As each asset's polygon mesh can contain millions of polygons, the number of polygons that need to be drawn every frame may exceed several billion. Therefore, the available computational resources often limit how many detailed objects can be displayed in a scene. To push this limit and to optimize performance, one can reduce the polygon count of the assets when possible. The idea is that for an object farther from the capturing camera, and consequently with a relatively smaller screen size, the polygon count may be reduced without affecting the perceived quality. Level of Detail (LOD) refers to the complexity level of a 3D model representation. The process of removing complexity is often called LOD reduction and can be done automatically with an algorithm or by hand by artists. However, this process may lead to deterioration of the visual quality if the different LODs differ significantly, or if the LOD reduction transition is not seamless. Today the validation of these results is mainly done manually, requiring an expert to visually inspect the results. However, this process is slow, mundane, and therefore prone to error. Herein we propose a method to automate this process based on the use of deep convolutional networks. We report promising results and envision that this method can be used to automate the process of LOD reduction testing and validation.  ( 3 min )
    Learning energy-efficient driving behaviors by imitating experts. (arXiv:2208.12534v1 [cs.RO])
    The rise of vehicle automation has generated significant interest in the potential role of future automated vehicles (AVs). In particular, in highly dense traffic settings, AVs are expected to serve as congestion-dampeners, mitigating the presence of instabilities that arise from various sources. However, in many applications, such maneuvers rely heavily on non-local sensing or coordination by interacting AVs, thereby rendering their adaptation to real-world settings a particularly difficult challenge. To address this challenge, this paper examines the role of imitation learning in bridging the gap between such control strategies and realistic limitations in communication and sensing. Treating one such controller as an "expert", we demonstrate that imitation learning can succeed in deriving policies that, if adopted by 5% of vehicles, may boost the energy-efficiency of networks with varying traffic conditions by 15% using only local observations. Results and code are available online at https://sites.google.com/view/il-traffic/home.  ( 2 min )
    Four Algorithms for Correlation Clustering: A Survey. (arXiv:2208.12636v1 [cs.DS])
    In the Correlation Clustering problem, we are given a set of objects with pairwise similarity information. Our aim is to partition these objects into clusters that match this information as closely as possible. More specifically, the pairwise information is given as a weighted graph $G$ with its edges labelled as ``similar" or ``dissimilar" by a binary classifier. The goal is to produce a clustering that minimizes the weight of ``disagreements": the sum of the weights of similar edges across clusters and dissimilar edges within clusters. In this exposition we focus on the case when $G$ is complete and unweighted. We explore four approximation algorithms for the Correlation Clustering problem under this assumption. In particular, we describe the following algorithms: (i) the $17429$-approximation algorithm by Bansal, Blum, and Chawla, (ii) the $4$-approximation algorithm by Charikar, Guruswami, and Wirth, (iii) the $3$-approximation algorithm by Ailon, Charikar, and Newman, and (iv) the $2.06$-approximation algorithm by Chawla, Makarychev, Schramm, and Yaroslavtsev.  ( 2 min )
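    Of the four, the $3$-approximation of Ailon, Charikar, and Newman is short enough to sketch in full: pick a uniformly random pivot, cluster it with every remaining node labelled similar to it, and recurse on the rest. The guarantee holds in expectation over the random pivots for complete unweighted instances.

        import random

        def kwik_cluster(nodes, similar):
            # nodes: iterable of hashable ids; similar(u, v) -> bool from the classifier.
            remaining = set(nodes)
            clusters = []
            while remaining:
                pivot = random.choice(tuple(remaining))
                cluster = {pivot} | {v for v in remaining if v != pivot and similar(pivot, v)}
                clusters.append(cluster)   # the pivot's cluster is fixed once and for all
                remaining -= cluster
            return clusters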
    Ab-initio quantum chemistry with neural-network wavefunctions. (arXiv:2208.12590v1 [physics.chem-ph])
    Machine learning and specifically deep-learning methods have outperformed human capabilities in many pattern recognition and data processing problems, in game playing, and now also play an increasingly important role in scientific discovery. A key application of machine learning in the molecular sciences is to learn potential energy surfaces or force fields from ab-initio solutions of the electronic Schr\"odinger equation using datasets obtained with density functional theory, coupled cluster, or other quantum chemistry methods. Here we review a recent and complementary approach: using machine learning to aid the direct solution of quantum chemistry problems from first principles. Specifically, we focus on quantum Monte Carlo (QMC) methods that use neural network ansatz functions in order to solve the electronic Schr\"odinger equation, both in first and second quantization, computing ground and excited states, and generalizing over multiple nuclear configurations. Compared to existing quantum chemistry methods, these new deep QMC methods have the potential to generate highly accurate solutions of the Schr\"odinger equation at relatively modest computational cost.  ( 2 min )
    Another Use of SMOTE for Interpretable Data Collaboration Analysis. (arXiv:2208.12458v1 [cs.LG])
    Recently, data collaboration (DC) analysis has been developed for privacy-preserving integrated analysis across multiple institutions. DC analysis centralizes individually constructed dimensionality-reduced intermediate representations and realizes integrated analysis via collaboration representations without sharing the original data. To construct the collaboration representations, each institution generates and shares a shareable anchor dataset and centralizes its intermediate representation. Although a random anchor dataset functions well for DC analysis in general, using an anchor dataset whose distribution is close to that of the raw dataset is expected to improve the recognition performance, particularly for interpretable DC analysis. Based on an extension of the synthetic minority over-sampling technique (SMOTE), this study proposes an anchor data construction technique to improve the recognition performance without increasing the risk of data leakage. Numerical results demonstrate the efficiency of the proposed SMOTE-based method over the existing anchor data constructions for artificial and real-world datasets. Specifically, the proposed method achieves 9 and 38 percentage point improvements in accuracy and essential feature selection, respectively, over existing methods on an income dataset. The proposed method provides another use of SMOTE, not for imbalanced data classification but as a key technology of privacy-preserving integrated analysis.  ( 2 min )
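    The SMOTE primitive being extended is simple enough to sketch: new points are drawn on the segments between a sample and one of its nearest neighbours, so the anchor data tracks the raw distribution. The neighbour count and sampling choices below are illustrative assumptions, not the paper's exact construction.

        import numpy as np
        from sklearn.neighbors import NearestNeighbors

        def smote_anchor(X, n_anchor, k=5, rng=np.random.default_rng(0)):
            nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
            _, idx = nn.kneighbors(X)                 # idx[:, 0] is the point itself
            anchors = []
            for _ in range(n_anchor):
                i = rng.integers(len(X))
                j = idx[i, rng.integers(1, k + 1)]    # one of the k nearest neighbours
                lam = rng.random()
                anchors.append(X[i] + lam * (X[j] - X[i]))  # interpolate on the segment
            return np.array(anchors)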
    2060: Civilization, Energy, and Progression of Mankind on the Kardashev Scale. (arXiv:2208.12617v1 [cs.CY])
    Energy has been propelling the development of human civilization for millennia, and technologies acquiring energy beyond human and animal power have been continuously advanced and transformed. In 1964, the Kardashev Scale was proposed to quantify the relationship between energy consumption and the development of civilizations. Human civilization presently stands at Type 0.7276 on this scale. Projecting the future energy consumption, estimating the change of its constituting structure, and evaluating the influence of possible technological revolutions are critical in the context of civilization development. In this study, we use two machine learning models, random forest (RF) and autoregressive integrated moving average (ARIMA), to simulate and predict energy consumption on a global scale. We further project the position of human civilization on the Kardashev Scale in 2060. The result shows that the global energy consumption is expected to reach 928-940 EJ in 2060, with a total growth of over 50% in the coming 40 years, and our civilization is expected to achieve Type 0.7474 on the Kardashev Scale, still far away from a Type 1 civilization. Additionally, we discuss the potential energy segmentation change before 2060 and present the influence of the advent of nuclear fusion in this context.  ( 2 min )
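    The reported types can be sanity-checked with Sagan's interpolation formula K = (log10(P) - 6) / 10, with P the power in watts; the only assumption below is the conversion of annual energy consumption to average power.

        import math

        SECONDS_PER_YEAR = 365.25 * 24 * 3600

        def kardashev(ej_per_year):
            watts = ej_per_year * 1e18 / SECONDS_PER_YEAR  # average power in W
            return (math.log10(watts) - 6) / 10

        print(kardashev(940))  # ~0.747, matching the projected Type 0.7474 for 2060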
    Pushing the limits of fairness impossibility: Who's the fairest of them all?. (arXiv:2208.12606v1 [cs.CY])
    The impossibility theorem of fairness is a foundational result in the algorithmic fairness literature. It states that outside of special cases, one cannot exactly and simultaneously satisfy all three common and intuitive definitions of fairness - demographic parity, equalized odds, and predictive rate parity. This result has driven most works to focus on solutions for one or two of the metrics. Rather than follow suit, in this paper we present a framework that pushes the limits of the impossibility theorem in order to satisfy all three metrics to the best extent possible. We develop an integer-programming based approach that can yield a certifiably optimal post-processing method for simultaneously satisfying multiple fairness criteria under small violations. We show experiments demonstrating that our post-processor can improve fairness across the different definitions simultaneously with minimal model performance reduction. We also discuss applications of our framework for model selection and fairness explainability, thereby attempting to answer the question: who's the fairest of them all?  ( 2 min )
    Take One Gram of Neural Features, Get Enhanced Group Robustness. (arXiv:2208.12625v1 [cs.LG])
    Predictive performance of machine learning models trained with empirical risk minimization (ERM) can degrade considerably under distribution shifts. The presence of spurious correlations in training datasets leads ERM-trained models to display high loss when evaluated on minority groups not presenting such correlations. Extensive attempts have been made to develop methods improving worst-group robustness. However, they require group information for each training input or, at least, a validation set with group labels to tune their hyperparameters, which may be expensive to get or unknown a priori. In this paper, we address the challenge of improving group robustness without group annotation during training or validation. To this end, we propose to partition the training dataset into groups based on Gram matrices of features extracted by an ``identification'' model and to apply robust optimization based on these pseudo-groups. In the realistic context where no group labels are available, our experiments show that our approach not only improves group robustness over ERM but also outperforms all recent baselines.  ( 2 min )
    Lower Difficulty and Better Robustness: A Bregman Divergence Perspective for Adversarial Training. (arXiv:2208.12511v1 [cs.LG])
    In this paper, we investigate improving the adversarial robustness obtained by adversarial training (AT) via reducing the difficulty of optimization. To better study this problem, we build a novel Bregman divergence perspective on AT, in which AT can be viewed as the sliding process of the training data points on the negative entropy curve. Based on this perspective, we analyze the learning objectives of two typical AT methods, i.e., PGD-AT and TRADES, and we find that the optimization process of TRADES is easier than that of PGD-AT because TRADES separates the PGD-AT objective. In addition, we discuss the function of entropy in TRADES, and we find that models with high entropy can be better robustness learners. Inspired by the above findings, we propose two methods, i.e., FAIT and MER, which not only reduce the difficulty of optimization under 10-step PGD adversaries, but also provide better robustness. Our work suggests that reducing the difficulty of optimization under 10-step PGD adversaries is a promising approach for enhancing adversarial robustness in AT.  ( 2 min )
    Enabling Weakly-Supervised Temporal Action Localization from On-Device Learning of the Video Stream. (arXiv:2208.12673v1 [cs.CV])
    Detecting actions in videos has been widely applied in on-device applications. Practical on-device videos are always untrimmed, containing both action and background. It is desirable for a model to both recognize the class of an action and localize the temporal position where the action happens. Such a task is called temporal action localization (TAL), and models for it are usually trained on the cloud, where multiple untrimmed videos are collected and labeled. It is desirable for a TAL model to continuously and locally learn from new data, which can directly improve the action detection precision while protecting customers' privacy. However, it is non-trivial to train a TAL model, since a tremendous number of video samples with temporal annotations is required, and annotating videos frame by frame is exorbitantly time-consuming and expensive. Although weakly-supervised TAL (W-TAL) has been proposed to learn from untrimmed videos with only video-level labels, such an approach is also not suitable for on-device learning scenarios. In practical on-device learning applications, data are collected as a stream. Dividing such a long video stream into multiple video segments requires lots of human effort, which hinders the exploration of applying TAL tasks to realistic on-device learning applications. To enable W-TAL models to learn from a long, untrimmed streaming video, we propose an efficient video learning approach that can directly adapt to new environments. We first propose a self-adaptive video dividing approach with a contrast score-based segment merging approach to convert the video stream into multiple segments. Then, we explore different sampling strategies on TAL tasks to request as few labels as possible. To the best of our knowledge, this is the first attempt to directly learn from an on-device, long video stream.  ( 3 min )
    Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning. (arXiv:2208.12433v1 [cs.LG])
    Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class. Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class. While numerous over-sampling algorithms have been proposed, they heavily rely on heuristics, which could be sub-optimal since we may need different sampling strategies for different datasets and base classifiers, and they cannot directly optimize the performance metric. Motivated by this, we investigate developing a learning-based over-sampling algorithm to optimize the classification performance, which is a challenging task because of the huge and hierarchical decision space. At the high level, we need to decide how many synthetic samples to generate. At the low level, we need to determine where the synthetic samples should be located, which depends on the high-level decision since the optimal locations of the samples may differ for different numbers of samples. To address the challenges, we propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions. Motivated by the success of SMOTE~\cite{chawla2002smote} and its extensions, we formulate the generation process as a Markov decision process (MDP) consisting of three levels of policies to generate synthetic samples within the SMOTE search space. Then we leverage deep hierarchical reinforcement learning to optimize the performance metric on the validation data. Extensive experiments on six real-world datasets demonstrate that AutoSMOTE significantly outperforms the state-of-the-art resampling algorithms. The code is at https://github.com/daochenzha/autosmote  ( 3 min )
    Meta Objective Guided Disambiguation for Partial Label Learning. (arXiv:2208.12459v1 [cs.LG])
    Partial label learning (PLL) is a typical weakly supervised learning framework, where each training instance is associated with a candidate label set, among which only one label is valid. To solve PLL problems, typical methods try to perform disambiguation for candidate sets by either using prior knowledge, such as structure information of the training data, or refining model outputs in a self-training manner. Unfortunately, these methods often fail to obtain a favorable performance due to the lack of prior information or unreliable predictions in the early stage of model training. In this paper, we propose a novel framework for partial label learning with meta objective guided disambiguation (MoGD), which aims to recover the ground-truth label from the candidate label set by solving a meta objective on a small validation set. Specifically, to alleviate the negative impact of false positive labels, we re-weight each candidate label based on the meta loss on the validation set. Then, the classifier is trained by minimizing the weighted cross entropy loss. The proposed method can be easily implemented using various deep networks with the ordinary SGD optimizer. Theoretically, we prove the convergence property of the meta objective and derive the estimation error bounds of the proposed method. Extensive experiments on various benchmark datasets and real-world PLL datasets demonstrate that the proposed method achieves competitive performance compared with state-of-the-art methods.  ( 3 min )
    Extreme Gradient Boosting for Yield Estimation compared with Deep Learning Approaches. (arXiv:2208.12633v1 [cs.LG])
    Accurate prediction of crop yield before harvest is of great importance for crop logistics, market planning, and food distribution around the world. Yield prediction requires monitoring phenological and climatic characteristics over extended time periods to model the complex relations involved in crop development. Remote sensing satellite images provided by various satellites circumnavigating the world are a cheap and reliable way to obtain data for yield prediction. The field of yield prediction is currently dominated by Deep Learning approaches. While the accuracies reached with those approaches are promising, the amounts of data needed and the ``black-box'' nature can restrict the application of Deep Learning methods. These limitations can be overcome by a pipeline that processes remote sensing images into feature-based representations, which allow the employment of Extreme Gradient Boosting (XGBoost) for yield prediction. A comparative evaluation of soybean yield prediction within the United States shows promising prediction accuracies compared to state-of-the-art yield prediction systems based on Deep Learning. Feature importances expose the near-infrared spectrum of light as an important feature within our models. The reported results hint at the capabilities of XGBoost for yield prediction and encourage future experiments with XGBoost for yield prediction on other crops in regions all around the world.  ( 3 min )
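    A minimal sketch of the modelling stage, with hypothetical file names and illustrative hyper-parameters (the actual pipeline and feature set are described in the paper):

        import numpy as np
        import xgboost as xgb

        X = np.load("county_features.npy")  # hypothetical feature-based representations
        y = np.load("soybean_yield.npy")    # hypothetical per-county yields

        model = xgb.XGBRegressor(n_estimators=500, max_depth=6, learning_rate=0.05)
        model.fit(X, y)
        print(model.feature_importances_)   # e.g. inspect the near-infrared features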
    Can a Hebbian-like learning rule be avoiding the curse of dimensionality in sparse distributed data?. (arXiv:2208.12564v1 [cs.NE])
    It is generally assumed that the brain uses something akin to sparse distributed representations. These representations, however, are high-dimensional and consequently they affect classification performance of traditional Machine Learning models due to "the curse of dimensionality". In tasks for which there is a vast amount of labeled data, Deep Networks seem to solve this issue with many layers and a non-Hebbian backpropagation algorithm. The brain, however, seems to be able to solve the problem with few layers. In this work, we hypothesize that this happens by using Hebbian learning. Actually, the Hebbian-like learning rule of Restricted Boltzmann Machines learns the input patterns asymmetrically. It exclusively learns the correlation between non-zero values and ignores the zeros, which represent the vast majority of the input dimensionality. By ignoring the zeros "the curse of dimensionality" problem can be avoided. To test our hypothesis, we generated several sparse datasets and compared the performance of a Restricted Boltzmann Machine classifier with some Backprop-trained networks. The experiments using these codes confirm our initial intuition as the Restricted Boltzmann Machine shows a good generalization performance, while the Neural Networks trained with the backpropagation algorithm overfit the training data.  ( 3 min )
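    The asymmetry the authors exploit is visible directly in the Hebbian-like contrastive divergence update for a Bernoulli RBM, sketched below (biases omitted for brevity): the data-driven outer product v h^T is zero wherever the sparse input is zero, so the vast majority of dimensions contribute nothing to the weight update.

        import numpy as np

        rng = np.random.default_rng(0)

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def cd1_step(W, v0, lr=0.01):
            # Positive phase: correlations with the data vector v0.
            h0 = sigmoid(v0 @ W)
            h0_sample = (rng.random(h0.shape) < h0).astype(float)
            # Negative phase: correlations with a one-step reconstruction.
            v1 = sigmoid(h0_sample @ W.T)
            h1 = sigmoid(v1 @ W)
            # Rows of np.outer(v0, h0) vanish where v0 is zero, so the
            # zeros of a sparse code are effectively ignored by learning.
            return W + lr * (np.outer(v0, h0) - np.outer(v1, h1))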
    Deep Hypergraph Structure Learning. (arXiv:2208.12547v1 [cs.LG])
    Learning on high-order correlations has shown superiority in data representation learning, where hypergraphs have been widely used in recent decades. The performance of hypergraph-based representation learning methods, such as hypergraph neural networks, highly depends on the quality of the hypergraph structure. How to generate the hypergraph structure among data is still a challenging task. Missing and noisy data may lead to "bad connections" in the hypergraph structure and destroy the hypergraph-based representation learning process. Therefore, revealing the high-order structure, i.e., the hypergraph behind the observed data, becomes an urgent but important task. To address this issue, we design a general paradigm of deep hypergraph structure learning, namely DeepHGSL, to optimize the hypergraph structure for hypergraph-based representation learning. Concretely, inspired by the information bottleneck principle for the robustness issue, we first extend it to the hypergraph case, named the hypergraph information bottleneck (HIB) principle. Then, we apply this principle to guide the hypergraph structure learning, where the HIB is introduced to construct the loss function to minimize the noisy information in the hypergraph structure. The hypergraph structure can thereby be optimized, and this process can be regarded as enhancing the correct connections and weakening the wrong connections in the training phase. Therefore, the proposed method helps extract more robust representations even on a heavily noisy structure. Finally, we evaluate the model on four benchmark datasets for representation learning. The experimental results on both graph- and hypergraph-structured data demonstrate the effectiveness and robustness of our method compared with other state-of-the-art methods.  ( 3 min )
    Understanding Deep Learning using Topological Dynamical Systems, Index Theory, and Homology. (arXiv:2208.12562v1 [cs.LG])
    In this paper we investigate Deep Learning models using topological dynamical systems, index theory, and computational homology. This mathematical machinery was invented initially by Henri Poincare around 1900 and developed over time to understand shapes and dynamical systems whose structure and behavior are too complicated to solve for analytically but can be understood via global relationships. In particular, we show how individual neurons in a neural network can correspond to simplexes in a simplicial complex approximation to the decision surface learned by the NN, and how these simplexes can be used to compute topological invariants from algebraic topology for the decision manifold, with an explicit computation of homology groups by hand in a simple case. We also show how the gradient of the probability density function learned by the NN creates a dynamical system, which can be analyzed by a myriad of topological tools such as Conley Index Theory, Morse Theory, and stable manifolds. We solve analytically for the associated differential equation of a trained NN with a single hidden layer of 256 neurons applied to the MNIST digit dataset, and show numerically that it has a sink and basin of attraction for each of the 10 classes, but the sinks and strongly attracting manifolds lie in regions not corresponding to images of actual digits. Index theory implies the existence of saddles. Level sets of the probability functions are 783-dimensional manifolds which can only change topology at critical points of the dynamical system, and these changes in topology can be investigated with Morse Theory.  ( 3 min )
    Towards Higher-order Topological Consistency for Unsupervised Network Alignment. (arXiv:2208.12463v1 [cs.LG])
    Network alignment task, which aims to identify corresponding nodes in different networks, is of great significance for many subsequent applications. Without the need for labeled anchor links, unsupervised alignment methods have been attracting more and more attention. However, the topological consistency assumptions defined by existing methods are generally low-order and less accurate because only the edge-indiscriminative topological pattern is considered, which is especially risky in an unsupervised setting. To reposition the focus of the alignment process from low-order to higher-order topological consistency, in this paper, we propose a fully unsupervised network alignment framework named HTC. The proposed higher-order topological consistency is formulated based on edge orbits, which is merged into the information aggregation process of a graph convolutional network so that the alignment consistencies are transformed into the similarity of node embeddings. Furthermore, the encoder is trained to be multi-orbit-aware and then be refined to identify more trusted anchor links. Node correspondence is comprehensively evaluated by integrating all different orders of consistency. In addition to sound theoretical analysis, the superiority of the proposed method is also empirically demonstrated through extensive experimental evaluation. On three pairs of real-world datasets and two pairs of synthetic datasets, our HTC consistently outperforms a wide variety of unsupervised and supervised methods with the least or comparable time consumption. It also exhibits robustness to structural noise as a result of our multi-orbit-aware training mechanism.  ( 3 min )
    Identifying and Overcoming Transformation Bias in Forecasting Models. (arXiv:2208.12264v1 [cs.LG])
    Log and square root transformations of target variable are routinely used in forecasting models to predict future sales. These transformations often lead to better performing models. However, they also introduce a systematic negative bias (under-forecasting). In this paper, we demonstrate the existence of this bias, dive deep into its root cause and introduce two methods to correct for the bias. We conclude that the proposed bias correction methods improve model performance (by up to 50%) and make a case for incorporating bias correction in modeling workflow. We also experiment with `Tweedie' family of cost functions which circumvents the transformation bias issue by modeling directly on sales. We conclude that Tweedie regression gives the best performance so far when modeling on sales making it a strong alternative to working with a transformed target variable.  ( 2 min )
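    The bias is easy to demonstrate: if log(y) is normal with mean mu and variance sigma^2, then exp(mu) under-estimates E[y] = exp(mu + sigma^2/2), so the naive back-transform systematically under-forecasts. The sketch below shows this numerically along with a standard lognormal-style correction (one of several possible corrections, not necessarily the two proposed in the paper).

        import numpy as np

        rng = np.random.default_rng(0)
        y = rng.lognormal(mean=3.0, sigma=0.8, size=100_000)

        log_y = np.log(y)
        naive = np.exp(log_y.mean())                       # biased back-transform
        corrected = np.exp(log_y.mean() + log_y.var() / 2)

        print(y.mean(), naive, corrected)  # naive < true mean ~= corrected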
  • Open

    Information Theory with Kernel Methods. (arXiv:2202.08545v2 [cs.IT] UPDATED)
    We consider the analysis of probability distributions through their associated covariance operators from reproducing kernel Hilbert spaces. We show that the von Neumann entropy and relative entropy of these operators are intimately related to the usual notions of Shannon entropy and relative entropy, and share many of their properties. They come together with efficient estimation algorithms from various oracles on the probability distributions. We also consider product spaces and show that for tensor product kernels, we can define notions of mutual information and joint entropies, which can characterize independence perfectly, but conditional independence only partially. We finally show how these new notions of relative entropy lead to new upper-bounds on log partition functions, that can be used together with convex optimization within variational inference methods, providing a new family of probabilistic inference methods.
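    A plug-in estimator suggested by this view is short to sketch: form the kernel Gram matrix of a sample, normalize it to unit trace as a surrogate covariance operator, and take the von Neumann entropy of its spectrum. The Gaussian kernel and bandwidth below are illustrative assumptions.

        import numpy as np
        from scipy.spatial.distance import cdist

        def von_neumann_entropy(X, bandwidth=1.0):
            K = np.exp(-cdist(X, X, "sqeuclidean") / (2 * bandwidth**2))
            K /= np.trace(K)                   # unit-trace, positive semi-definite
            lam = np.linalg.eigvalsh(K)
            lam = lam[lam > 1e-12]             # drop numerically zero eigenvalues
            return float(-(lam * np.log(lam)).sum())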
    Universal Mini-Batch Consistency for Set Encoding Functions. (arXiv:2208.12401v1 [cs.LG])
    Previous works have established solid foundations for neural set functions, as well as effective architectures which preserve the necessary properties for operating on sets, such as being invariant to permutations of the set elements. Subsequently, Mini-Batch Consistency (MBC), the ability to sequentially process any permutation of any random set partition scheme while maintaining consistency guarantees on the output, has been established but with limited options for network architectures. We further study the MBC property in neural set encoding functions, establishing a method for converting arbitrary non-MBC models to satisfy MBC. In doing so, we provide a framework for a universally-MBC (UMBC) class of set functions. Additionally, we explore an interesting dropout strategy made possible by our framework, and investigate its effects on probabilistic calibration under test-time distributional shifts. We validate UMBC with proofs backed by unit tests, also providing qualitative/quantitative experiments on toy data, clean and corrupted point cloud classification, and amortized clustering on ImageNet. The results demonstrate the utility of UMBC, and we further discover that our dropout strategy improves uncertainty calibration.
    Constructing unbiased gradient estimators with finite variance for conditional stochastic optimization. (arXiv:2206.01991v2 [math.NA] UPDATED)
    We study stochastic gradient descent for solving conditional stochastic optimization problems, in which an objective to be minimized is given by a parametric nested expectation with an outer expectation taken with respect to one random variable and an inner conditional expectation with respect to the other random variable. The gradient of such a parametric nested expectation is again expressed as a nested expectation, which makes it hard for the standard nested Monte Carlo estimator to be unbiased. In this paper, we show under some conditions that a multilevel Monte Carlo gradient estimator is unbiased and has finite variance and finite expected computational cost, so that the standard theory from stochastic optimization for a parametric (non-nested) expectation directly applies. We also discuss a special case for which yet another unbiased gradient estimator with finite variance and cost can be constructed.
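    One standard way to turn a multilevel construction into a single unbiased estimate is a randomized single-term estimator in the style of Rhee and Glynn, sketched below: draw a random level with probability p_l and reweight the level-l correction by 1/p_l. The geometric level distribution, and its practical truncation, are assumptions for illustration rather than the paper's exact construction.

        import numpy as np

        rng = np.random.default_rng(0)

        def unbiased_single_term(correction, max_level=20, r=0.5):
            # correction(l) must return an independent estimate of the level-l
            # difference, e.g. an inner-expectation estimate with 2**l samples
            # minus one with 2**(l-1) samples.
            p = r ** np.arange(max_level + 1)
            p /= p.sum()
            level = rng.choice(max_level + 1, p=p)
            return correction(level) / p[level]  # E[output] = sum_l E[correction(l)]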
    Replication Robust Payoff Allocation in Submodular Cooperative Games. (arXiv:2006.14583v5 [cs.LG] UPDATED)
    Submodular functions have been a powerful mathematical model for a wide range of real-world applications. Recently, submodular functions have become increasingly important in machine learning (ML) for modelling notions such as information and redundancy among entities such as data and features. Among these applications, a key question is payoff allocation, i.e., how to evaluate the importance of each entity towards the collective objective? To this end, classic solution concepts from cooperative game theory offer principled approaches to payoff allocation. However, despite the extensive body of game-theoretic literature, payoff allocation in submodular games is relatively under-researched. In particular, an important notion that arises in the emerging submodular applications is redundancy, which may occur from various sources such as abundant data or malicious manipulations where a player replicates its resource and acts under multiple identities. Though many game-theoretic solution concepts can be directly used in submodular games, naively applying them for payoff allocation in these settings may incur robustness issues against replication. In this paper, we systematically study replication manipulation in submodular games and investigate replication robustness, a metric that quantitatively measures the robustness of solution concepts against replication. Using this metric, we present conditions which theoretically characterise the robustness of semivalues, a wide family of solution concepts including the Shapley and Banzhaf values. Moreover, we empirically validate our theoretical results on an emerging submodular ML application, i.e., the ML data market.
    Variance Reduction based Experience Replay for Policy Optimization. (arXiv:2208.12341v1 [stat.ML])
    For reinforcement learning on complex stochastic systems where many factors dynamically impact the output trajectories, it is desirable to effectively leverage the information from historical samples collected in previous iterations to accelerate policy optimization. Classical experience replay allows agents to remember by reusing historical observations. However, the uniform reuse strategy that treats all observations equally overlooks the relative importance of different samples. To overcome this limitation, we propose a general variance reduction based experience replay (VRER) framework that can selectively reuse the most relevant samples to improve policy gradient estimation. This selective mechanism can adaptively put more weight on past samples that are more likely to be generated by the current target distribution. Our theoretical and empirical studies show that the proposed VRER can accelerate the learning of optimal policy and enhance the performance of state-of-the-art policy optimization approaches.
    Fundamentals of Task-Agnostic Data Valuation. (arXiv:2208.12354v1 [cs.LG])
    We study valuing the data of a data owner/seller for a data seeker/buyer. Data valuation is often carried out for a specific task, assuming a particular utility metric, such as test accuracy on a validation set, that may not exist in practice. In this work, we focus on task-agnostic data valuation without any validation requirements. The data buyer has access to a limited amount of data (which could be publicly available) and seeks more data samples from a data seller. We formulate the problem as estimating the differences in the statistical properties of the data at the seller with respect to the baseline data available at the buyer. We capture these statistical differences through second moments by measuring the diversity and relevance of the seller's data for the buyer; we estimate these measures through queries to the seller without requesting raw data. We design the queries with the proposed approach so that the seller is blind to the buyer's raw data and has no knowledge to fabricate responses to queries to obtain a desired outcome of the diversity and relevance trade-off. We show through extensive experiments on real tabular and image datasets that the proposed estimates capture the diversity and relevance of the seller's data for the buyer.
    Dynamic Regret of Online Markov Decision Processes. (arXiv:2208.12483v1 [cs.LG])
    We investigate online Markov Decision Processes (MDPs) with adversarially changing loss functions and known transitions. We choose dynamic regret as the performance measure, defined as the performance difference between the learner and any sequence of feasible changing policies. The measure is strictly stronger than the standard static regret that benchmarks the learner's performance with a fixed compared policy. We consider three foundational models of online MDPs, including episodic loop-free Stochastic Shortest Path (SSP), episodic SSP, and infinite-horizon MDPs. For these three models, we propose novel online ensemble algorithms and establish their dynamic regret guarantees respectively, in which the results for episodic (loop-free) SSP are provably minimax optimal in terms of time horizon and certain non-stationarity measure. Furthermore, when the online environments encountered by the learner are predictable, we design improved algorithms and achieve better dynamic regret bounds for the episodic (loop-free) SSP; and moreover, we demonstrate impossibility results for the infinite-horizon MDPs.
    Electronic excited states in deep variational Monte Carlo. (arXiv:2203.09472v2 [physics.chem-ph] UPDATED)
    Obtaining accurate ground and low-lying excited states of electronic systems is crucial in a multitude of important applications. One ab initio method for solving the Schr\"odinger equation that scales favorably for large systems is variational quantum Monte Carlo (QMC). The recently introduced deep QMC approach uses ansatzes represented by deep neural networks and generates nearly exact ground-state solutions for molecules containing up to a few dozen electrons, with the potential to scale to much larger systems where other highly accurate methods are not feasible. In this paper, we extend one such ansatz (PauliNet) to compute electronic excited states. We demonstrate our method on various small atoms and molecules and consistently achieve high accuracy for low-lying states. To highlight the method's potential, we compute the first excited state of the much larger benzene molecule, as well as the conical intersection of ethylene, with PauliNet matching results of more expensive high-level methods.
    Causal Bandits for Linear Structural Equation Models. (arXiv:2208.12764v1 [stat.ML])
    This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model to minimize the cumulative regret with respect to the best intervention in hindsight. This is, naturally, posed as a causal bandit problem. The focus is on causal bandits for linear structural equation models (SEMs) and soft interventions. It is assumed that the graph's structure is known, and it has $N$ nodes. Two linear mechanisms, one soft intervention and one observational, are assumed for each node, giving rise to $2^N$ possible interventions. The existing causal bandit algorithms assume that at least the interventional distributions of the reward node's parents are fully specified. However, there are $2^N$ such distributions (one corresponding to each intervention), acquiring which becomes prohibitive even in moderate-sized graphs. This paper dispenses with the assumption of knowing these distributions. Two algorithms are proposed for the frequentist (UCB-based) and Bayesian (Thompson Sampling-based) settings. The key idea of these algorithms is to avoid directly estimating the $2^N$ reward distributions and instead estimate the parameters that fully specify the SEMs (linear in $N$) and use them to compute the rewards. In both algorithms, under boundedness assumptions on noise and the parameter space, the cumulative regrets scale as $\tilde{\cal O} ((2d)^L L \sqrt{T})$, where $d$ is the graph's maximum degree, and $L$ is the length of its longest causal path.
    Large-N dynamics of the spiked tensor model with random initial conditions. (arXiv:2208.12586v1 [hep-th])
    In these notes, we develop a path integral approach for the partial differential equations with random initial conditions. Then, we apply it to the dynamics of the spiked tensor model and show that the large-$N$ saddle point equations are dominated by the melonic type diagrams.
    Constraining Gaussian Processes to Systems of Linear Ordinary Differential Equations. (arXiv:2208.12515v1 [cs.LG])
    Data in many applications follows systems of Ordinary Differential Equations (ODEs). This paper presents a novel algorithmic and symbolic construction for covariance functions of Gaussian Processes (GPs) with realizations strictly following a system of linear homogeneous ODEs with constant coefficients, which we call LODE-GPs. Introducing this strong inductive bias into a GP improves modelling of such data. Using Smith normal form algorithms, a symbolic technique, we overcome two current restrictions in the state of the art: (1) the need for certain uniqueness conditions in the set of solutions, typically assumed in classical ODE solvers and their probabilistic counterparts, and (2) the restriction to controllable systems, typically assumed when encoding differential equations in covariance functions. We show the effectiveness of LODE-GPs in a number of experiments, for example learning physically interpretable parameters by maximizing the likelihood.
    Zero-Shot Learning with Common Sense Knowledge Graphs. (arXiv:2006.10713v4 [cs.LG] UPDATED)
    Zero-shot learning relies on semantic class representations such as hand-engineered attributes or learned embeddings to predict classes without any labeled examples. We propose to learn class representations by embedding nodes from common sense knowledge graphs in a vector space. Common sense knowledge graphs are an untapped source of explicit high-level knowledge that requires little human effort to apply to a range of tasks. To capture the knowledge in the graph, we introduce ZSL-KG, a general-purpose framework with a novel transformer graph convolutional network (TrGCN) for generating class representations. Our proposed TrGCN architecture computes non-linear combinations of node neighbourhoods. Our results show that ZSL-KG improves over existing WordNet-based methods on five out of six zero-shot benchmark datasets in language and vision.
    Estimating means of bounded random variables by betting. (arXiv:2010.09686v7 [math.ST] UPDATED)
    This paper derives confidence intervals (CI) and time-uniform confidence sequences (CS) for the classical problem of estimating an unknown mean from bounded observations. We present a general approach for deriving concentration bounds, that can be seen as a generalization and improvement of the celebrated Chernoff method. At its heart, it is based on a class of composite nonnegative martingales, with strong connections to testing by betting and the method of mixtures. We show how to extend these ideas to sampling without replacement, another heavily studied problem. In all cases, our bounds are adaptive to the unknown variance, and empirically vastly outperform existing approaches based on Hoeffding or empirical Bernstein inequalities and their recent supermartingale generalizations. In short, we establish a new state-of-the-art for four fundamental problems: CSs and CIs for bounded means, when sampling with and without replacement.
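    A stripped-down version of the betting idea for a [0, 1]-bounded mean fits in a few lines: for each candidate mean m, run the capital process K_t(m) = prod_s (1 + lam*(x_s - m)) and exclude m once the capital ever reaches 1/alpha, by Ville's inequality. The fixed bet lam below is a deliberate simplification; the paper's bets are adaptive and variance-aware.

        import numpy as np

        def betting_cs(x, alpha=0.05, lam=0.5, grid=np.linspace(0, 1, 201)):
            capital = np.ones_like(grid)
            crossed = np.zeros_like(grid, dtype=bool)
            for xt in x:
                capital *= 1.0 + lam * (xt - grid)  # positive since |lam*(xt - m)| < 1
                crossed |= capital >= 1.0 / alpha   # Ville: P(sup K >= 1/alpha) <= alpha
            keep = ~crossed
            return grid[keep].min(), grid[keep].max()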
    MuLan: A Joint Embedding of Music Audio and Natural Language. (arXiv:2208.12415v1 [eess.AS])
    Music tagging and content-based retrieval systems have traditionally been constructed using pre-defined ontologies covering a rigid set of music attributes or text queries. This paper presents MuLan: a first attempt at a new generation of acoustic models that link music audio directly to unconstrained natural language music descriptions. MuLan takes the form of a two-tower, joint audio-text embedding model trained using 44 million music recordings (370K hours) and weakly-associated, free-form text annotations. Through its compatibility with a wide range of music genres and text styles (including conventional music tags), the resulting audio-text representation subsumes existing ontologies while graduating to true zero-shot functionalities. We demonstrate the versatility of the MuLan embeddings with a range of experiments including transfer learning, zero-shot music tagging, language understanding in the music domain, and cross-modal retrieval applications.
    Coefficient-based Regularized Distribution Regression. (arXiv:2208.12427v1 [stat.ML])
    In this paper, we consider the coefficient-based regularized distribution regression which aims to regress from probability measures to real-valued responses over a reproducing kernel Hilbert space (RKHS), where the regularization is put on the coefficients and kernels are assumed to be indefinite. The algorithm involves two stages of sampling, the first stage sample consists of distributions and the second stage sample is obtained from these distributions. Asymptotic behaviors of the algorithm in different regularity ranges of the regression function are comprehensively studied and learning rates are derived via integral operator techniques. We get the optimal rates under some mild conditions, which matches the one-stage sampled minimax optimal rate. Compared with the kernel methods for distribution regression in the literature, the algorithm under consideration does not require the kernel to be symmetric and positive semi-definite and hence provides a simple paradigm for designing indefinite kernel methods, which enriches the theme of the distribution regression. To the best of our knowledge, this is the first result for distribution regression with indefinite kernels, and our algorithm can improve the saturation effect.
    Randomly pivoted Cholesky: Practical approximation of a kernel matrix with few entry evaluations. (arXiv:2207.06503v3 [math.NA] UPDATED)
    Randomly pivoted Cholesky (RPCholesky) is a natural algorithm for computing a rank-k approximation of an N x N positive semidefinite (psd) matrix. RPCholesky can be implemented with just a few lines of code. It requires only (k+1)N entry evaluations and O(k^2 N) additional arithmetic operations. This paper offers the first serious investigation of its experimental and theoretical behavior. Empirically, RPCholesky matches or improves on the performance of alternative algorithms for low-rank psd approximation. Furthermore, RPCholesky provably achieves near-optimal approximation guarantees. The simplicity, effectiveness, and robustness of this algorithm strongly support its use in scientific computing and machine learning applications.
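    The algorithm really is a few lines; the sketch below follows the description above: sample each pivot with probability proportional to the current residual diagonal, then eliminate it, touching only k columns of the matrix plus its diagonal.

        import numpy as np

        def rp_cholesky(column, diag, n, k, rng=np.random.default_rng(0)):
            # column(i) -> i-th column of the psd matrix A; diag: its diagonal.
            F = np.zeros((n, k))
            d = diag.astype(float).copy()
            for t in range(k):
                i = rng.choice(n, p=d / d.sum())       # pivot ~ residual diagonal
                res = column(i) - F[:, :t] @ F[i, :t]  # residual column at the pivot
                F[:, t] = res / np.sqrt(res[i])
                d = np.clip(d - F[:, t] ** 2, 0.0, None)  # update residual diagonal
            return F                                   # A is approximated by F @ F.T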
    On the Implicit Bias in Deep-Learning Algorithms. (arXiv:2208.12591v1 [cs.LG])
    Gradient-based deep-learning algorithms exhibit remarkable performance in practice, but it is not well-understood why they are able to generalize despite having more parameters than training examples. It is believed that implicit bias is a key factor in their ability to generalize, and hence it has been widely studied in recent years. In this short survey, we explain the notion of implicit bias, review main results and discuss their implications.

  • Open

    CleanRL now includes a handy hyperparameter tuner 🎉
    submitted by /u/vwxyzjn [link] [comments]  ( 87 min )
    Has Hierarchical Reinforcement Learning been abandoned?
    I haven't seen much recent research in the field of HRL (Hierarchical Reinforcement Learning). Is there a specific reason? submitted by /u/andrewspano [link] [comments]  ( 87 min )
    Solving 'Continuous Blackjack'
    submitted by /u/gwern [link] [comments]  ( 87 min )
    Agent trains great with PPO but terrible with SAC --> Advice for Hyperparameters
    Hey all, so in my (custom) environment the agent shall learn to control a 6-axis robot and place 5 segments in their designated positions to form a ring. Collisions are possible but not penalized for now (similar to the peg-in-hole problem). Using PPO works fantastically, but my SAC agent doesn't really improve after 2M steps. Does anyone see an obvious error in my hyperparameters? Any advice more than welcomed :) The MDP is: 100 steps = 1 second. Observation space: end effector (position, orientation, velocity); target (position, orientation); target_id (which of the 5 possible target segments am I trying to fit at the moment?). Action space: torque on each of the 6 axes. Reward = -(distance_to_target + distance_orientation)*const. if distance_to_target 2000: done = True. My PPO settings are: n_steps = 140k (equals > 70 episodes), batch_size = 140k, learning_rate = .003 (descending to .00003), ent_coef = .005, clip_range = .2. My SAC settings are: buffer_size = 100k (equals > 50 episodes), batch_size = 40k, learning_rate = .003 (descending to .00003), ent_coef = "auto", learning_starts = 20k, train_freq = (40,"episode"). submitted by /u/disdisinform [link] [comments]  ( 91 min )
    Have you ever considered changing the way you learn
    submitted by /u/MindsetTransformat [link] [comments]  ( 103 min )
  • Open

    such great artists
    submitted by /u/echo_zoo [link] [comments]  ( 91 min )
    A Radiohead video made with the latest AI tech (Stable Diffusion)
    submitted by /u/glenniszen [link] [comments]  ( 87 min )
    How to create AI Videos with Stable Diffusion Part 1: 2D Animation mode
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
    Learn TensorFlow for Data Science, ML and AI
    submitted by /u/skj8 [link] [comments]  ( 86 min )
    What are the most promising text-to-image models and what are the differences between them?
    I am trying to put some structure to my knowledge, so correct me if I'm wrong. There is: DALL-E, the most famous now, a paid service; Imagen by Google, not launched publicly yet but doing better at text interpretation than DALL-E; Parti by Google, better than Imagen at understanding complex descriptions; Stable Diffusion by Stability AI, ???; and Craiyon (DALL-E mini), lightweight, free and publicly available, but not very powerful. Could you help me understand what the main differences are, other than the ones I wrote above? submitted by /u/queen_of_suburbia [link] [comments]  ( 88 min )
    One of my favorite choreographies. It's just a brain explosion! 🤯
    submitted by /u/nalr00n [link] [comments]  ( 87 min )
    Are there even any other subsets of AI other than ML and DL?
    Anytime I search for 'how to learn AI', all I get is learning about Machine/Deep Learning. Don't get me wrong: I know that they are the main fields in AI right now, but they are supposed to be subsets of AI, not the entirety of AI itself. So I searched 'subsets of AI' and this is what I got: 1) Machine Learning 2) Deep Learning 3) Computer Vision 4) Neural Processing 5) Robotics. Can anyone tell me how in the world Robotics is a subset of AI lol? Anyway, I am interested in Robotics. I might be completely wrong, but I think Machine Learning and Deep Learning might be used in Robotics, rather than there being a separate AI field for robotics. Also, I'm interested in AGI as well. That's not really an established field right now, but my question is: are the researchers in this field using Machine Learning and Deep Learning to develop AGI? Will this field likely be developed using these two? So should I learn ML or DL to get into it? Also, can anyone tell me about the researchers who are working on AGI currently? submitted by /u/Soldiac [link] [comments]  ( 91 min )
    Bioluminescence terrarium
    submitted by /u/widgia [link] [comments]  ( 87 min )
    LaMDA 2 available now
    submitted by /u/roblox22y [link] [comments]  ( 87 min )
    OpenAI Jukebox + CogVideo, completely generated music + video
    https://www.youtube.com/watch?v=KJOZEwJ3w0o&ab_channel=JohannezzMusic submitted by /u/Dreamaster015 [link] [comments]  ( 92 min )
    A.I. Websites You Probably Didn't Know Existed
    submitted by /u/Marketing_introverts [link] [comments]  ( 86 min )
    Saw a post on this subreddit a while ago, can anyone help me find it?
    I remember seeing this on reddit a few years ago, this post had a link to a website. On this site, you would press one of two buttons. The site uses some algorithm to predict which one of the two buttons you will press next. For every correct prediction from the algorithm, you lose points, for an incorrect prediction, you gain points. You start gaining points but at some point, the algorithm latches on to your patterns, and you quickly lose all your points. Does anyone know such a website? submitted by /u/llllIlllIlllllll [link] [comments]  ( 87 min )
    Stable Diffusion Grisk GUI 0.1
    submitted by /u/StantheBrain [link] [comments]  ( 87 min )
  • Open

    [D] Quantity Recommendation Using ML / Deep Learning
    Hello everyone, I am trying to find an algorithm for quantity recommendation with the output having this form: (X = product, Y = quantity (1, 2, 3, ...)). I tried using XGBoost; the problem with this approach is that I couldn't refer to the product (e.g., 2 of which product?). Can anyone help me find a solution? submitted by /u/SpiritedBee9303 [link] [comments]  ( 89 min )
    [D] text similarity/ text classification
    Hi folks, I hope I find you all well. I want to ask for your advice: I need to classify ~13M company names into 80k "clean names". I have ~6M entries that have already been matched using dictionaries and labeling functions. My goal is to build an estimator to predict the clean name from the dirty name (not normalized). What I tried so far: using text distance (i.e., Levenshtein) to pick the closest clean name for each dirty name; it took too much time, so I couldn't test it for real and had to abandon this option. Then text vectorization with TF-IDF (e.g., TfidfVectorizer from scikit-learn), calculating the probability of the category based on the frequency of the terms. This gave me some good outputs, but clearly I need to go further to have something for production. Besides the company name, I might have address and country. What would you recommend doing? Which is the best approach for my problem? Can you recommend some topics to read? I am really not sure which kind of problem this is: text similarity, text classification, or something else? Tl;dr: I need to classify 13M text entries into 80k categories. What would you do? submitted by /u/AndroidePsicokiller [link] [comments]  ( 91 min )
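    A minimal sketch of scaling up the TF-IDF idea from the post: character n-gram TF-IDF vectors over the clean names, plus a cosine nearest-neighbor lookup, which is far cheaper than pairwise Levenshtein over 13M x 80k comparisons (the toy name lists are placeholders for the real data).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neighbors import NearestNeighbors

    clean_names = ["acme corp", "globex corporation"]      # 80k entries in practice
    dirty_names = ["ACME Corp.", "Globex Corp (UK)"]       # 13M entries in practice

    # Character n-grams are robust to typos and varying normalization.
    vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
    clean_vecs = vectorizer.fit_transform(clean_names)

    nn = NearestNeighbors(n_neighbors=1, metric="cosine").fit(clean_vecs)
    dist, idx = nn.kneighbors(vectorizer.transform(dirty_names))
    predictions = [clean_names[i[0]] for i in idx]
    print(list(zip(dirty_names, predictions)))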
    [D] What are some dead ideas in machine learning or machine learning textbooks?
    Every now and then I flip through one of those ML books from the 80s and see a bunch of algorithms or models such as Adaline, Helmholtz and Boltzmann machines, and wonder why virtually nobody talks about them anymore. Can someone in this field point out some algorithms/ideas that are basically dead or abandoned at this point? submitted by /u/fromnighttilldawn [link] [comments]  ( 91 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 91 min )
    [D] What happened to Projection Pursuit?
    Projection Pursuit and NNs are pretty much the same in the fashion they escape the curse of dimensionality, yet from what it looks like, research on Projection Pursuit is pretty much dead. Did Projection Pursuit fail to bring anything interesting to the table over (even basic) NN architectures? submitted by /u/NoFreeCrunchTheorem [link] [comments]  ( 88 min )
    [D] Can neural networks perceive time ?
    The recent debate about neural networks being conscious made me curious about different attributes of consciousness and to what extent neural networks exhibit them. I came across this interesting paper (https://www.nature.com/articles/s41467-018-08194-7) which seems to claim that activity in visual classification networks (like AlexNet) can be used as a proxy for subjective time! The machine-learning-relevant section from the paper is the following: At each time-step, a video frame was fed into the input layer of the network and the subsequent higher layers were activated. For each frame, we extracted the activations of all neurons from layers conv2, pool5, fc7 and the output probabilities. For each layer, we calculated the Euclidean distance between successive states. If the acti…  ( 116 min )
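    A small sketch of the measurement the excerpt describes: feed successive frames through a vision network and take the Euclidean distance between successive activation states. Here a single AlexNet layer stands in for the paper's conv2/pool5/fc7 read-outs, and random weights keep the demo self-contained.
    import torch
    import torchvision

    model = torchvision.models.alexnet(weights=None).eval()  # random weights, demo only
    acts = []
    # Record one intermediate representation per frame via a forward hook.
    model.features.register_forward_hook(lambda mod, inp, out: acts.append(out.flatten()))

    frames = torch.randn(5, 3, 224, 224)       # stand-in for successive video frames
    with torch.no_grad():
        for frame in frames:
            model(frame.unsqueeze(0))

    # Per-frame "amount of network change": the paper's proxy for subjective time.
    dists = [torch.dist(a, b).item() for a, b in zip(acts, acts[1:])]
    print(dists)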
    [P] CatalyzeX Update - Now see code for papers as you're browsing IEEE and Google Scholar
    IEEE and Google Scholar are used by many here, so I'm sharing an update! With the CatalyzeX browser extension: you can now see code buttons on IEEE pages by popular demand 🧑‍💻 (make sure you enable permissions as shown in the snapshot); all code implementations are now linked on Google Scholar 🔗; plus misc. fixes and improvements 🔨. All this thanks to the latest release of the CatalyzeX browser extension: https://chrome.google.com/webstore/detail/aiml-papers-with-code-eve/aikkeehnlfpamidigaffhfmgbkdeheil Do check it out live; your feedback and constructive criticism are highly welcome anytime! 🙏 (Disclaimer: I am one of the creators of CatalyzeX) submitted by /u/MLtinkerer [link] [comments]  ( 89 min )
    [P] Pseudo Label Generation for Unsupervised Video Anomaly Detection
    I am trying to implement this paper (https://arxiv.org/pdf/2203.03962.pdf) for unsupervised video anomaly detection. The gist of the paper seems to be: create a dataset for an unsupervised setting by mixing up the train and anomalous videos (section 4); divide each video into $p(=16)$ segments, so for video $i$ we have the segments $s_{ij}$, $j = 1, 2, \dots, p$; using a feature extractor (ResNeXt), compute a $d$-dimensional feature vector $f_{ij}$ for each $s_{ij}$ (section 3.1); use this to (pre)train the generator G (a simple autoencoder); once G is (pre)trained, it is used to generate pseudo labels, which are then used to (pre)train the discriminator D (section 3.3); pretrained G and D are then put into a collaborative loop, where pseudo labels from D are used to improve G and likewise. I am having trouble with generating the pseudo labels using G. The paper clearly mentions in section 3.3: "Once G is pretrained, it is used to generate pseudo labels". So does the pseudo label generation come into play once training is done? Like after the model is ready, we pass the dataset again, keep note of the reconstruction losses, and generate labels accordingly? Or are they generated during training? (Section 3.2.2 mentions the pseudo label generation and section 4 mentions the threshold selection.) submitted by /u/esem29 [link] [comments]  ( 110 min )
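    One hedged reading of "once G is pretrained, it is used to generate pseudo labels" is a second pass over the data after pretraining: record each segment's reconstruction error under G and threshold it (the threshold choice being what section 4 discusses). A sketch of that interpretation, with G as any fitted autoencoder callable:
    import numpy as np

    def pseudo_labels(G, features, quantile=0.9):
        # G: fitted autoencoder mapping a feature vector to its reconstruction.
        # features: iterable of d-dimensional segment feature vectors f_ij.
        errors = np.array([np.mean((G(f) - f) ** 2) for f in features])
        threshold = np.quantile(errors, quantile)   # threshold selection, cf. section 4
        return (errors > threshold).astype(int)     # 1 = pseudo-anomalous segment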
    [D] Advice for teaching ML in undergrad
    I will be teaching an introductory ML course to undergrad students. I am an undergrad student myself, but I performed very well on the exam for that course, so I got the position. I typically enjoy teaching math/CS-related things to others. Came here to ask if anyone has advice/tips etc. for teaching ML in a context like this. Edit: I don't have any formal teaching experience; I helped some high-school students with math and CS when they were nearing graduation, that's it. I'll be running the tutorials/labs of the course, which will mainly consist of answering questions, going over the material again, and explaining homework tasks that include questions/programming/calculations. The professor does the lecture and I do the tutorials/labs by myself. submitted by /u/MaikRequim [link] [comments]  ( 98 min )
    [P] Einstein Instant NeRF
    submitted by /u/KaleidoscopeBest1569 [link] [comments]  ( 89 min )
    [D] Off-topic: is there any good SSH management tool?
    I recently acquired some servers to SSH into. However, I have to save the credentials somewhere and then input them into cmd or Terminal every time I want to log in to the servers. Is there any good SSH management tool that you guys are using? Something that can save the credentials while having a good UI (like Windows Terminal). submitted by /u/AerysSk [link] [comments]  ( 107 min )
    [D] ML Ideas
    Any creative ideas for using machine learning to change an aspect of your life or area? Let’s discuss it! submitted by /u/SuspectWorking8722 [link] [comments]  ( 88 min )
  • Open

    Learn TensorFlow for Data Science, ML and AI
    submitted by /u/skj8 [link] [comments]  ( 93 min )
    Are neural networks basically just applying random* maths to data, then tuning the results until the results are accurate enough?
    Obviously this is a simplification, but as a beginner in the ML space, this is what it feels like. It seems like we're just hoping that a random* set of mathematical functions, when applied to input data, brings us closer to our desired results. Is there no place for manually tuning the analytical functions used within a neural network (or similar structure/methodology) to actually look for specific data points? For example: if I am analyzing code to predict security vulnerabilities, can I not add some sort of key indicators (in the form of functions) with relevant weights manually anywhere in the network? What I'm looking for is essentially some use of signature-based algorithms in a neural network, in cases where I can provide some hints and certainty in the analysis of the input data. Again, I'm still fairly new to ML in general, so there may very well be research papers on this very subject that I have yet to discover. * "random" is not entirely accurate, but the mathematical functions used in neural networks often appear to me as just applying new ways to mix, slice, or alter the input data among the hidden layers without clear reasoning for how these functions relate to the problem/data at hand. submitted by /u/battingagainstavg [link] [comments]  ( 94 min )
    Why is Everyone Training Very Deep Neural Networks with Skip Connections? [Article]
    submitted by /u/JoshuaDaD [link] [comments]  ( 87 min )
    Why should you learn Few-Shot Learning with Reptile?
    Reptile was developed by OpenAI and it'll open yo eye🗿 submitted by /u/JoshuaDaD [link] [comments]  ( 87 min )
  • Open

    Radius, circumference, and area in non-Euclidean geometry
    How does the circumference of a circle vary with its radius? What about the area? The answers are simple and familiar in Euclidean geometry, but not as simple or as familiar in non-Euclidean geometry. This post will look at how circumference and area vary as a function of radius in three spaces: a surface with […] Radius, circumference, and area in non-Euclidean geometry first appeared on John D. Cook.  ( 5 min )

  • Open

    Civilian AI Tech Is Already Being Misused by the Bad Guys
    submitted by /u/Ok-Prior-8856 [link] [comments]  ( 87 min )
    looking for a team for a project
    I am looking for people to help me make a stock prediction program based on our combined knowledge of the market. This is a collaborative project that does not offer direct pay, but instead access to the final product when we are done, as this project is not at this point intended to be a publicly available product. I have 5 years in the market and have managed to consistently hold over 1000% gains based on base investment amounts and profits taken. I have somehow managed to never go into the red despite plenty of absolutely horrific calls that cost me dearly over the years. I want to streamline the elements of my personal algorithm that have always led to my big wins and remove the (usually) emotionally fueled elements that eat up at least 25% of my overall gains. I am posting this in multiple relevant forums in the hope of pulling together as many perspectives, as much experience, and as much expertise on this project as possible, to make a truly powerful tool for us all to use. If this interests you, let's talk and get the ball rolling. submitted by /u/throwawayc117 [link] [comments]  ( 97 min )
    Best Book on the History of AI
    Hey, since I wanted to get a better overview of the field of AI and its roots, I was searching for good books that cover this topic. After reading four very different books, I wrote a small blog post evaluating which book does what; maybe you'll find it useful if you were wondering where to start yourself! Have a look! I succinctly compare: 1. “A Brief History of Artificial Intelligence: What It Is, Where We Are, and Where We Are Going” by Michael Wooldridge 2. “Darwin Among The Machines: The Evolution Of Global Intelligence” by George Dyson 3. “The Quest for Artificial Intelligence” by Nils J. Nilsson 4. “Artificial Intelligence: A Modern Approach” by Stuart Russell and Peter Norvig. Do you have any other books you would recommend, or do you disagree with my thoughts on those books? Let me know! submitted by /u/werzum [link] [comments]  ( 87 min )
    Best AI image generator/differences?
    Have been using DALL-E 2 for a couple of weeks now, but I'm curious about the other AI image generators, mainly Midjourney and Stable Diffusion. How do these compare and which one do you prefer? submitted by /u/Han-Sharvey [link] [comments]  ( 87 min )
    AI Dream 72 - The EPIC search for Space Broccoli
    submitted by /u/LordPewPew777 [link] [comments]  ( 87 min )
    Future of Industrial Design and/or UI Design
    Anyone familiar with Industrial Design and/or UI Design? In your opinion, what is the future of these industries? I got a degree related to these fields and I can't help but think it's not worth making a career out of if technology is going to reduce the jobs to a certain degree. People in these industries have this optimistic view that these are just tools, but I think that's naive. AI technology can only reduce jobs and lessen the labor of the jobs remaining, which means only a smaller percentile of designers will remain in these fields, right? submitted by /u/BernardorAbsurd [link] [comments]  ( 88 min )
    Causal Analysis of Generic Time Series Data Applied for Market Prediction
    submitted by /u/akolonin [link] [comments]  ( 87 min )
    What is Stable Diffusion? (Latent Diffusion Models Explained)
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 91 min )
    Stable Diffusion selected results
    submitted by /u/fmurph22 [link] [comments]  ( 87 min )
    GLM-130B: The most powerful AI language model currently available comes from China
    submitted by /u/much_successes [link] [comments]  ( 87 min )
    Stable Diffusion Video is Here! Cyber Electro Mecha Zod A short AI anima...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 92 min )
    Looking for a free program on the level of Copy.ai to write my short stories with
    Any suggestions? submitted by /u/Realistic_Monk_4703 [link] [comments]  ( 87 min )
    Make a picture a child
    Is there software that will take a picture of an adult and try to reverse-age it to a child? submitted by /u/jeffsmith202 [link] [comments]  ( 87 min )
  • Open

    [P] Run Transformers models directly in JavaScript, Java and Rust with ONNX
    The ONNX runtime provides a popular serialization format for machine learning models. ONNX enables direct inference in a number of different platforms/languages; for example, a model could be run directly on Android to limit data sent to a third-party service. ONNX is an exciting development with a lot of promise. The referenced notebook below covers how to export models to ONNX using txtai. These models will then be run directly in Python, JavaScript, Java and Rust. Currently, txtai supports all these languages through its API, but this notebook runs everything directly within each language! The following is a non-exhaustive list of use cases: build locally executed models for mobile/edge devices; run models with Java/JavaScript/Rust development stacks when teams prefer not to add Python to the mix; export models to ONNX for Python inference to improve CPU performance and/or reduce the number of software dependencies. Notebook: https://colab.research.google.com/github/neuml/txtai/blob/master/examples/18_Export_and_run_models_with_ONNX.ipynb GitHub: https://github.com/neuml/txtai submitted by /u/RacconOG [link] [comments]  ( 89 min )
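    For readers who want the Python side without txtai, a minimal onnxruntime sketch of loading and running an exported model; "model.onnx" and the dummy input shape/dtype are placeholders for whatever your export produced.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession("model.onnx")       # exported model path (placeholder)
    input_name = session.get_inputs()[0].name          # discover the graph's input name
    dummy = np.zeros((1, 128), dtype=np.int64)         # shape/dtype depend on the export
    # Note: transformer exports often also expect attention_mask and similar inputs.
    outputs = session.run(None, {input_name: dummy})   # None means "return all outputs"
    print(outputs[0].shape)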
    [R] A Flaw in a SIGKDD Paper renders it Meaningless (time series anomaly detection)
    Dear Colleagues. I spotted a major flaw in a 2022 SIGKDD Paper. I believe that it renders the entire paper meaningless. I have no ill will against these authors or SIGKDD, but I think it is worth pointing this problem out to help prevent others making the same mistake [a]. (I don’t want to embarrass anyone, so I am going to slightly change their text below, so it is not trivially googleable) The authors are doing time series anomaly detection. They have a real dataset, and they create ground truth for it by “we assign the top 10% values as ground truth”. The reviewers should have seen this and immediately marked reject. Do you see why? Suppose I have a paper “Classifying Flemish Paintings as Beautiful or Ugly”. I have a complex algorithm for this (deep learning, what else). But I me…  ( 91 min )
    [D] What is the path to learning machine learning, and the lingo used on this subreddit?
    When scrolling through this subreddit, all I see is a foreign language. I have some experience with programming and a little bit of machine learning, but I still can't understand anything here. Some people on this subreddit seem to know the most cutting-edge research NASA has been working on for 3 years. What is the path to understanding all this? Are there books that cover everything, or is the knowledge locked behind 10 more years of schooling? submitted by /u/Assar2 [link] [comments]  ( 90 min )
    [D] ML Applications
    If you could design a project to “machine learning-fy” an aspect of your life or surroundings, what would it be and why? submitted by /u/According-Relief544 [link] [comments]  ( 109 min )
    [N] OpenAI cuts prices for GPT-3 by two thirds
    submitted by /u/brthornbury [link] [comments]  ( 88 min )
    [D] The Future of Art is Artificial: AI will enable creative people to do amazing things
    submitted by /u/owlthatissuperb [link] [comments]  ( 118 min )
    [P] Run Stable Diffusion locally with a web UI + artist workflow video
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 89 min )
    [P] Demo and Tutorial on Multiagent Reinforcement Learning using RLlib
    https://deumbra.com/2022/08/rllib-for-deep-hierarchical-multiagent-reinforcement-learning/ submitted by /u/jmugan [link] [comments]  ( 88 min )
    [R] PeRFception — a pipeline of generating large scale 3D datasets with radiance fields [POSTECH, NVIDIA, CALTECH]
    submitted by /u/SpatialComputing [link] [comments]  ( 109 min )
    [D] Why is ONNX Runtime so fast?
    I ran a gemm kernel using OpenBLAS and other well-known linear algebra libraries, but it is always slower than running the GEMM op in ONNX Runtime (on CPU). What kind of magic does ONNX Runtime use to improve gemm performance so much? submitted by /u/maekawatoshiki [link] [comments]  ( 90 min )
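    A small benchmark sketch for the comparison described in the post: the same single-precision GEMM run through NumPy's BLAS and through a hand-built ONNX Runtime MatMul graph. (As I understand it, ORT's CPU speed comes largely from its tuned MLAS kernels plus graph-level optimizations rather than anything exotic; the timings below are single-run and unwarmed, so treat them as rough.)
    import time
    import numpy as np
    from onnx import TensorProto, helper
    import onnxruntime as ort

    N = 1024
    a = helper.make_tensor_value_info("A", TensorProto.FLOAT, [N, N])
    b = helper.make_tensor_value_info("B", TensorProto.FLOAT, [N, N])
    c = helper.make_tensor_value_info("C", TensorProto.FLOAT, [N, N])
    graph = helper.make_graph([helper.make_node("MatMul", ["A", "B"], ["C"])],
                              "gemm_bench", [a, b], [c])
    model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
    sess = ort.InferenceSession(model.SerializeToString())

    A = np.random.rand(N, N).astype(np.float32)
    B = np.random.rand(N, N).astype(np.float32)

    t0 = time.perf_counter(); _ = A @ B; t1 = time.perf_counter()
    t2 = time.perf_counter(); _ = sess.run(None, {"A": A, "B": B})[0]; t3 = time.perf_counter()
    print(f"numpy/BLAS: {t1 - t0:.4f}s   onnxruntime: {t3 - t2:.4f}s")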
    [D] How To Find the Distance between two objects with opencv ?
    Hello everyone, I hope you are all doing well. I have a question related to finding the distance between two objects. I know how to find the distance between two objects from a fixed camera, but in my case the camera is not always in the same position: it turns in different directions, and the video zooms in and out. I need to find the distance between two objects in that video footage. For example, see the video below; in that video we have to find the distance between the referee and the soccer ball. https://reddit.com/link/wyyjv9/video/pceimd7a68k91/player submitted by /u/GrouchyAd4055 [link] [comments]  ( 89 min )
    [P] How to use human knowledge in an NLP model
    Hi, for my use case I need to create a text classification model from a 20k-record dataset with 100 classes. As an additional resource, I have a set of keywords which have been manually picked to create a regex-based algorithm (the legacy model) to do the prediction. My task is to improve on the legacy model. As a first try, I used the keyword frequencies in each document as features to train a traditional ML model (a random forest), which already achieves good results. I would like to improve on that, for example by leveraging word/sentence/document embeddings in a neural network. Given the relatively small size of the dataset, I would also like to merge the embeddings with the manually extracted features (the keyword frequencies) in some way. Do you know any way to accomplish this? Should I just concatenate the two vectors (the embeddings and the keyword frequencies)? Is there a more elegant solution? submitted by /u/Calcifer777 [link] [comments]  ( 111 min )
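    Regarding the concatenation question: that is indeed the simplest form of feature fusion, and a reasonable baseline. A minimal sketch below, using sentence-transformers as one possible embedding source; the model name, keyword list, and keyword_frequencies helper are illustrative assumptions, not from the post.
    import numpy as np
    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression

    encoder = SentenceTransformer("all-MiniLM-L6-v2")   # assumed embedding model

    def keyword_frequencies(text, keywords):
        # Hypothetical helper: raw count of each curated keyword in the document.
        return np.array([text.lower().count(k) for k in keywords], dtype=np.float32)

    keywords = ["refund", "invoice"]                    # stand-in for the curated keyword set
    docs = ["Please refund my invoice", "Invoice overdue, second notice"]
    labels = [0, 1]

    emb = encoder.encode(docs)                          # (n_docs, 384) sentence embeddings
    kw = np.stack([keyword_frequencies(d, keywords) for d in docs])
    X = np.concatenate([emb, kw], axis=1)               # early fusion by concatenation
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    A more elegant variant is a small network with two input branches whose hidden representations are concatenated before the classification head, but plain concatenation plus a strong classifier is a solid starting point.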
    [D] Common Pitfalls in Machine Learning Projects
    Listing some of the most common pitfalls (both technical and managerial) based on learnings from executing machine learning projects. Sharing the blog post that goes into some of the examples and what to do about them. 1. Data leakage 2. Weak variables 3. Spurious variables 4. Making your stakeholders wait forever 5. Promising the same performance when your models go live 6. Not including sufficient time for user testing in your project plan 7. Not factoring in data drift 8. Not retraining your models often enough 9. Choice of algorithm 10. Not providing model interpretation tools 11. Failing to do an RCA for incorrect predictions; your models could be missing key business condition changes that may need to be factored in 12. Doing everything your stakeholder asks you to do without rationale 13. Doing everything yourself. Would be great to hear about your personal experiences of project failures and possibly how they could have been avoided. submitted by /u/selva86 [link] [comments]  ( 108 min )
    [D] A thought I had on Yann LeCun's recent paper "A Path Towards Autonomous Machine Intelligence"
    In the paper, Yann seems to theoretically engineer an architecture for machine intelligence that can learn to "perceive action plans at multiple levels of abstraction, enabling them to reason, predict, and plan at multiple time horizons". In another paper, called Reward is Enough (2021), David Silver makes the claim that "intelligence, and its associated abilities, can be understood as subserving the maximisation of reward", and that "accordingly, reward is enough to drive behaviour that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, and imitation". These ideas seem in conflict. Yann, on one hand, attempts to engineer these abilities into the architecture of the agent. Silver, on the other hand, implies that with enough compute, data, and the correct reward function, concepts such as world models and complex planning arise from sheer optimization. It is my belief that fully end-to-end approaches are the most suitable path forward. Yann's proposal of an architecture which constrains the model's learning abilities to a certain format (sort of) pushes against this, along with Silver's hypothesis. Anyone else here have some thoughts on this divergence between Yann and Silver? Can these two ideas coincide in some manner? P.S. I looked for another thread to post this in, but not a lot of people seem to be talking about this paper, which is a little disappointing. With that being said, feel free to bring up new areas of discussion here. submitted by /u/__dostoevsky__ [link] [comments]  ( 97 min )
    [D] Cannot Understand Output of Bounding Box Regressor
    Hello, I am learning to implement the Single Shot Detector (SSD) from scratch. However, I cannot understand the output of SSD. SSD tries to predict the prior/anchor box position offset and scale. [Figure: output of SSD; source: sgrvinod/a-PyTorch-Tutorial-to-Object-Detection: SSD: Single Shot MultiBox Detector | a PyTorch Tutorial to Object Detection (github.com)] My question is why the position offset needs to be divided by the width/height of the prior; why can't the model directly output the "pure" offset (ground truth's center_x - prior's center_x)? Another question is why we take the log of the scale relative to the prior; I think the model could learn to predict the "pure" scale (ground truth's width / prior's width). I am really confused about this, thank you. submitted by /u/No_Championship142 [link] [comments]  ( 89 min )
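    A minimal NumPy sketch of the standard SSD "center-size" encoding the post asks about. Dividing the center offset by the prior's width/height makes the regression target scale-invariant (the same pixel offset matters less for a large prior than for a small one), and taking the log turns the multiplicative size ratio into an additive, unbounded target that is easier to regress with a linear output layer. Note this follows the common convention; many implementations additionally divide by fixed "variance" constants, omitted here.
    import numpy as np

    def encode(gt, prior):
        # gt and prior are (cx, cy, w, h) boxes.
        g_cx, g_cy, g_w, g_h = gt
        p_cx, p_cy, p_w, p_h = prior
        return np.array([
            (g_cx - p_cx) / p_w,   # center offset, normalized by prior width
            (g_cy - p_cy) / p_h,   # center offset, normalized by prior height
            np.log(g_w / p_w),     # log of the width ratio
            np.log(g_h / p_h),     # log of the height ratio
        ])

    print(encode((0.55, 0.5, 0.2, 0.4), (0.5, 0.5, 0.1, 0.2)))  # [0.5, 0.0, log 2, log 2]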
  • Open

    Data cleansing for reliable analytics and business intelligence
    According to Forbes, data scientists spend about 80% of their time on data collection, cleansing, and preparation, while only 20% of it is left for actual data analysis. Organizations that don’t utilize master data management systems or data warehouses to keep their data clean and accurate end up basing crucial business decisions on bad data. The post Data cleansing for reliable analytics and business intelligence appeared first on Data Science Central.  ( 20 min )
  • Open

    Unified Pythagorean Theorem
    A few days ago I wrote that the spherical counterpart of the Pythagorean theorem is cos(c) = cos(a) cos(b) where sides a and b are measured in radians. If we’re on a sphere of radius R and we measure the sides in terms of arc length rather than in radians, the formula becomes cos(c/R) = […] Unified Pythagorean Theorem first appeared on John D. Cook.  ( 5 min )
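    The excerpt truncates before the right-hand side; on the post's framing it is presumably cos(c/R) = cos(a/R) cos(b/R), the arc-length form of the radian identity quoted above. Assuming that form, a short Taylor expansion shows how the Euclidean theorem emerges in the large-R limit:
    \cos\frac{c}{R} = \cos\frac{a}{R}\,\cos\frac{b}{R}
    % expand each cosine to second order in 1/R:
    1 - \frac{c^2}{2R^2} + O(R^{-4}) = 1 - \frac{a^2}{2R^2} - \frac{b^2}{2R^2} + O(R^{-4})
    % cancel the 1s and multiply through by -2R^2:
    c^2 = a^2 + b^2 + O(R^{-2}), \quad\text{so } c^2 \to a^2 + b^2 \text{ as } R \to \infty.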
  • Open

    On the topic of generating humans with DNN's - why don't we just have individual models for breaking down the human body, with each model responsible for a single part? (1 model per finger, 1 model per arm, 1 model for accurately drawing 2 eyes)
    Asking this in a few different places: You wouldn't see a human artist drawing a bunch of random colourful noise, only to slowly turn it into something more meaningful later, one layer at a time? You'd have the human start with the head, or the body, and then they would do the arms and the legs, and each finger individually. So why wouldn't we just have a model responsible for drawing specific anatomical parts? I.e. if we wanted an AI to draw a complete human skeleton, we have a model trained to draw each individual bone - if the bone isn't somewhere visible (based on the main image) then it's simply not drawn. This guarantees 5 fingers on each hand, 2 arms, 2 legs, 1 head, 2 eyes, etc? You could break it down further, as well - one model is trained to appropriately draw belly buttons,…  ( 91 min )

  • Open

    Mobile photo editing app creator Lightricks launches text-to-image generator
    submitted by /u/frongwmwlwi [link] [comments]  ( 87 min )
    What is the best free AI for making artwork of fictional characters?
    I wanted to use DALL-E, but it's been about 3 months since I requested access and it has gone nowhere. I used Craiyon, but the faces it makes are kinda weird. So I want to know: is there a free AI that creates that kind of art? submitted by /u/mhjhacker1 [link] [comments]  ( 88 min )
    Paintings of Cappadocia, Turkiye by AI (DALL-E)
    submitted by /u/know-nonsense [link] [comments]  ( 87 min )
    Stable Diffusion AI Art Quick Setup with Free Google Colab and Deforum N...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
    AI Translates Brain Waves To Photos | Quantum Computing AI Breakthrough | Deep Learning Robotic Arm
    submitted by /u/belindahooper [link] [comments]  ( 87 min )
    Death Plays Rock n' Roll
    I keep playing with AI (https://beta.dreamstudio.ai/dream). Cfg Scale: 20. Prompt: Imagine death playing heavy metal on an electric guitar, attention to detail, focus on the skull, cartoon style, realistic, 8K. submitted by /u/Viacheslav_Varenia [link] [comments]  ( 87 min )
    Mark Zuckerberg talks about holograms in JRE
    submitted by /u/cyberboii22 [link] [comments]  ( 91 min )
    A .EXE to run the Stable Diffusion model !
    It was created by #GRisk; the name of the exe is "Stable Diffusion GRisk GUI 0.1". The creator's page: https://grisk.itch.io/stable-diffusion-gui The download: https://drive.google.com/file/d/1ic9uCZkE6N3c1164VprB1d3qRljcxPgO/view submitted by /u/StantheBrain [link] [comments]  ( 88 min )
    AI Talks. Tek Raj Chettri: How To Start Your AI Career
    Join us for weekly AI talks with some of the brightest minds in AI as we explore the amazing potential of this technology. About this event: it will take place September 1st at 5:15 PM CEST at www.twitch.tv/deeplearninglabs. Our guest this week is Tek Raj Chhetri. He is a Ph.D. student and researcher at STI Innsbruck, in the Department of Computer Science at the University of Innsbruck. His focus is on Neural Symbolic AI and enhancing the process of automated decision-making. He has a Bachelor's in Computer Science & Engineering from Jawaharlal Nehru Technological University Kakinada, India, and a Master's in Computer Science with a specialization in Distributed Systems from the University of Tartu, Estonia. submitted by /u/Viseden [link] [comments]  ( 88 min )
    Create your own image generator!
    Have you tried using DALL-E to create your own art pieces? Or maybe Midjourney? Of course you have. If you haven't, maybe you would like to create your own model! For this reason we wrote a short and simple tutorial about building image generators using Google Colab. Check out our newest image-generator tutorial on lablab.ai! Have fun! submitted by /u/lablabai [link] [comments]  ( 87 min )
    What was the leaked email database that helped researchers analyse company behaviour and train AI models? (Are there other similar open databases like this?)
    submitted by /u/abittooambitious [link] [comments]  ( 93 min )
    Making a video with Midjourney + GPT-3 and Azure Neural Voice
    submitted by /u/kbf_ [link] [comments]  ( 87 min )
    selling 4 x Dall E accounts with emails
    Just received 4x dall e accounts. Selling with the linked email account as well. All fresh Gmail accounts. dm me submitted by /u/theniwaslost [link] [comments]  ( 91 min )
    Looking for a Chatbot With affiliate program
    Is there any chatbot for adults with an affiliate program? I'm talking about a Replika kind of bot. If there are multilingual options, that would be great, since my site is in Swedish, but English also works. Thanks in advance. submitted by /u/Hjarndoping [link] [comments]  ( 87 min )
  • Open

    What happened to OpenAI's "Gym" documentation?
    I have been working on a project for school that uses Gym's reinforcement learning environments, and sometime between last week and yesterday the website with all the documentation for Gym seems to have disappeared from the internet. Is anyone else experiencing this? Does anyone know what is happening to Gym? Edit: I was able to find an older thread on this exact topic just moments after posting this. The answer is that the documentation has moved to a new domain, which I'll add here in case anyone else is out there googling this issue: https://www.gymlibrary.dev/ submitted by /u/MediumBillHaywood [link] [comments]  ( 87 min )
    HELM: what's your take?
    I was looking for models with long past horizons and I came upon https://arxiv.org/pdf/2205.12258.pdf#page11, with open source code. It seems quite legit; however, I wonder if it can distill information over longer horizons than an LSTM. The paper showed an improvement over LSTM, but no clear evidence that it can handle longer past horizons. What's your take on the subject? Do you know any competing methods? I'm not at all convinced by GTrXL. submitted by /u/Jogima-cyber [link] [comments]  ( 87 min )
    "Neural Payoff Machines: Predicting Fair and Stable Payoff Allocations Among Team Members", Cornelisse et al 2022 {DM} (NN approximation of Shapley values)
    submitted by /u/gwern [link] [comments]  ( 87 min )
    "TAP: Efficient Planning in a Compact Latent Action Space", Jiang et al 2022 (VQ-VAE + GPT-2 planning)
    submitted by /u/gwern [link] [comments]  ( 87 min )
    "A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning", Dann et al 2022
    submitted by /u/gwern [link] [comments]  ( 87 min )
    "Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training", You et al 2022 (Thompson sampling hyperparameter optimization)
    submitted by /u/gwern [link] [comments]  ( 103 min )
    "Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned", Ganguli et al 2022 (scaling helps RL preference learning)
    submitted by /u/gwern [link] [comments]  ( 88 min )
  • Open

    [R] Interactive Code Generation via Test-Driven User-Intent Formalization - TICODER - Microsoft Research 2022 - Improves the pass@1 code generation accuracy metric from 48.39% to 70.49% with a single user query, and up to 85.48% with up to 5 user queries!
    Paper: https://arxiv.org/abs/2208.05950 Abstract: Pre-trained large language models (LLMs) such as OpenAI Codex have shown immense potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, the code produced does not have any correctness guarantees around satisfying the user's intent. In fact, it is hard to define a notion of correctness since natural language can be ambiguous and lacks formal semantics. In this paper, we take a first step towards addressing the problem above by proposing the workflow of test-driven user-intent formalization (TDUIF), which leverages lightweight user feedback to jointly (a) formalize the user intent as tests (a partial specification), and (b) generate code that meets the formal use…  ( 90 min )
    [R] Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization - Microsoft Research 2022 - Outperforms the 600x larger PaLM-540B on XSum!
    Paper: https://arxiv.org/abs/2208.09770v1 Abstract: This paper presents Z-Code++, a new pre-trained language model optimized for abstractive text summarization. The model extends the state of the art encoder-decoder model using three techniques. First, we use a two-phase pre-training process to improve model's performance on low-resource summarization tasks. The model is first pre-trained using text corpora for language understanding, and then is continually pre-trained on summarization corpora for grounded text generation. Second, we replace self-attention layers in the encoder with disentangled attention layers, where each word is represented using two vectors that encode its content and position, respectively. Third, we use fusion-in-encoder, a simple yet effective method of encoding long sequences in a hierarchical manner. Z-Code++ creates new state of the art on 9 out of 13 text summarization tasks across 5 languages. Our model is parameter-efficient in that it outperforms the 600x larger PaLM-540B on XSum, and the finetuned 200x larger GPT3-175B on SAMSum. In zero-shot and few-shot settings, our model substantially outperforms the competing models. submitted by /u/Singularian2501 [link] [comments]  ( 89 min )
    [R] Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks - Microsoft 2022 - Many SOTA results spanning vision and vision-language tasks / Multimodal foundation model.
    Paper: https://arxiv.org/abs/2208.10442 Github: https://github.com/microsoft/unilm/tree/master/beit Code will be released here; I only found the link on paperswithcode.com! Abstract: A big convergence of language, vision, and multimodal pretraining is emerging. In this work, we introduce a general-purpose multimodal foundation model BEiT-3, which achieves state-of-the-art transfer performance on both vision and vision-language tasks. Specifically, we advance the big convergence from three aspects: backbone architecture, pretraining task, and model scaling up. We introduce Multiway Transformers for general-purpose modeling, where the modular architecture enables both deep fusion and modality-specific encoding. Based on the shared backbone, we perform masked "language" modeling on images (Imglish), texts (English), and image-text pairs ("parallel sentences") in a unified manner. Experimental results show that BEiT-3 obtains state-of-the-art performance on object detection (COCO), semantic segmentation (ADE20K), image classification (ImageNet), visual reasoning (NLVR2), visual question answering (VQAv2), image captioning (COCO), and cross-modal retrieval (Flickr30K, COCO). submitted by /u/Singularian2501 [link] [comments]  ( 89 min )
    [R] Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned - Anthropic - Ganguli et al 2022
    Paper: https://www.anthropic.com/red_teaming.pdf Github: https://github.com/anthropics/hh-rlhf Abstract: We describe our early efforts to red team language models in order to simultaneously discover, measure, and attempt to reduce their potentially harmful outputs. We make three main contributions. First, we investigate scaling behaviors for red teaming across 3 model sizes (2.7B, 13B, and 52B parameters) and 4 model types: a plain language model (LM); an LM prompted to be helpful, honest, and harmless; an LM with rejection sampling; and a model trained to be helpful and harmless using reinforcement learning from human feedback (RLHF). We find that the RLHF models are increasingly difficult to red team as they scale, and we find a flat trend with scale for the other model types. Second, we release our dataset of 38,961 red team attacks for others to analyze and learn from. We provide our own analysis of the data and find a variety of harmful outputs, which range from offensive language to more subtly harmful non-violent unethical outputs. Third, we exhaustively describe our instructions, processes, statistical methodologies, and uncertainty about red teaming. We hope that this transparency accelerates our ability to work together as a community in order to develop shared norms, practices, and technical standards for how to red team language models. Warning: this paper contains examples that may be offensive or upsetting! submitted by /u/Singularian2501 [link] [comments]  ( 90 min )
    [D] Text to Image startup is using Stable Diffusion without giving credit.
    Photosonic is a popular image generation tool by a YC-backed startup, Writesonic. Photosonic is getting popular on Product Hunt; in fact, it got more upvotes than stability.ai's Stable Diffusion Dream Studio. [link] It looks like it is simply using the open source Stable Diffusion model without giving any credit. Photosonic was launched just 3-4 days after the open source release of Stable Diffusion. Compared on random text inputs, it generates very similar images to Stable Diffusion. This is not definite proof, but it's highly likely it's the same model, because they produce similar-style images on gibberish input prompts. Examples attached. Source of original post: https://twitter.com/divamgupta/status/1563220590517186562 submitted by /u/After-Plastic-4777 [link] [comments]  ( 105 min )
    [R] Understanding Diffusion Models: A Unified Perspective
    submitted by /u/hardmaru [link] [comments]  ( 88 min )
    [D] Generalizable ways to encode entire graphs?
    I've been learning a bit about graph neural networks and from what I've seen, most focus on node-level or edge-level embeddings. Is there any way to encode entire graphs as fixed-size vectors? One that I've seen is graph2vec, but that seems to work similarly to doc2vec and is not suitable for encoding unseen graphs. submitted by /u/FlyingAmigos [link] [comments]  ( 104 min )
    [D] All regression techniques bundled under a single method
    Generalized linear regression, time series, curve fitting, logistic regression, you name it. And more, like unsupervised clustering, under the same umbrella. What if there was one method to handle them all? Why learn 50 types of regression when you can solve your problems with one simple generic version that covers all of them and more? I tried to achieve this goal and am looking for feedback on my new article on this topic, published here. [Figure: fitting an ellipse based on a training set distributed around an arc; 250 experiments, one per video frame] submitted by /u/MLRecipes [link] [comments]  ( 88 min )
    [D] Variable-length input to a neural network (not text data)
    I have data that does not represent text but does have variable length. It is structured like so: [[(100, 5), (200, 5)], [(300, 10)], [(500, 3), (200, 4), (600, 3)]]. It is a 2D array; each row is an array of tuples. I want to feed each row into a neural network, but I am not sure of the best way to do this. From my initial research, it seems that simply zero-padding would be an option, or zero-padding followed by an embedding layer. However, each row is not that long; I think the maximum length of a row in my dataset is 6. All suggestions appreciated. submitted by /u/NameError-undefined [link] [comments]  ( 89 min )
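    A minimal sketch of the zero-padding idea for the ragged rows above: flatten each row of (value, count) tuples and pad to the maximum length (6 tuples, so 12 numbers) so every sample has a fixed-size input.
    import numpy as np

    rows = [[(100, 5), (200, 5)], [(300, 10)], [(500, 3), (200, 4), (600, 3)]]
    MAX_TUPLES = 6                      # maximum row length from the post

    def pad_row(row, max_tuples=MAX_TUPLES):
        flat = [v for pair in row for v in pair]         # flatten the tuples
        flat += [0] * (2 * max_tuples - len(flat))       # zero-pad to fixed width
        return np.array(flat, dtype=np.float32)

    X = np.stack([pad_row(r) for r in rows])             # shape (3, 12), network-ready
    print(X.shape)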
    [R] DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
    Paper: https://arxiv.org/abs/2208.12242 Website: https://dreambooth.github.io/ Abstract: Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts. In this work, we present a new approach for "personalization" of text-to-image diffusion models (specializing them to users' needs). Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model (Imagen, although our method is not limited to a specific model) such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can then be used to synthesize fully-novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views, and lighting conditions that do not appear in the reference images. We apply our technique to several previously-unassailable tasks, including subject recontextualization, text-guided view synthesis, appearance modification, and artistic rendering (all while preserving the subject's key features). submitted by /u/FlavoredQuark [link] [comments]  ( 93 min )
    [D] Real-World Applications of GAN
    I am working through milestone papers in the field of GANs by implementing them in PyTorch. This got me wondering: what are some real-world applications of GANs? submitted by /u/ghost_agni [link] [comments]  ( 90 min )
    % of papers published by industry vs. academia [D]
    I'm writing a policy whitepaper about AI/ML research, and I'm trying to track down figures on the percentage of papers published by industry vs. academia. Georgetown's PARAT (Private-sector AI-Related Activity Tracker) publishes good data on AI publications, conference papers, and patents from the private sector, but does anyone have stats on how this compares to the total volume of papers coming out of academia? submitted by /u/ejmajor [link] [comments]  ( 91 min )
    [D] Does gradient accumulation achieve anything different than just using a smaller batch with a lower learning rate?
    I'm trying to understand the practical justification for gradient accumulation (i.e., running with an effectively larger batch size by summing gradients from smaller batches). Can't you achieve practically the same effect by lowering the learning rate and just running with smaller batches? Is there a theoretical reason why this is better than just small-batch training? submitted by /u/WigglyHypersurface [link] [comments]  ( 93 min )
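    For reference, a minimal PyTorch sketch of the technique being asked about: gradients from accum_steps small batches are summed before a single optimizer step, which (up to BatchNorm statistics and optimizer-state effects) mimics one update on a batch accum_steps times larger.
    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()
    accum_steps = 4

    opt.zero_grad()
    for step in range(100):
        x, y = torch.randn(8, 10), torch.randn(8, 1)  # small batch of 8
        loss = loss_fn(model(x), y) / accum_steps     # scale so the sum equals the big-batch mean
        loss.backward()                               # gradients accumulate in .grad buffers
        if (step + 1) % accum_steps == 0:
            opt.step()                                # one update per 4 small batches
            opt.zero_grad()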
  • Open

    Run image segmentation with Amazon SageMaker JumpStart
    In December 2020, AWS announced the general availability of Amazon SageMaker JumpStart, a capability of Amazon SageMaker that helps you quickly and easily get started with machine learning (ML). JumpStart provides one-click fine-tuning and deployment of a wide variety of pre-trained models across popular ML tasks, as well as a selection of end-to-end solutions that […]  ( 10 min )
    Process mortgage documents with intelligent document processing using Amazon Textract and Amazon Comprehend
    Organizations in the lending and mortgage industry process thousands of documents on a daily basis. From a new mortgage application to mortgage refinance, these business processes involve hundreds of documents per application. There is limited automation available today to process and extract information from all the documents, especially due to varying formats and layouts. Due […]  ( 12 min )
  • Open

    AI Translates Brain Waves To Photos | Quantum Computing AI Breakthrough | Deep Learning Robotic Arm
    submitted by /u/belindahooper [link] [comments]  ( 87 min )
    Quick overview of methods used to handle overfitting in Shallow and Deep Neural Networks
    submitted by /u/JoshuaDaD [link] [comments]  ( 87 min )
    Covid Detection and Diagnosis model
    Data Science, ML, DL: Covid-MANet: Multi-task attention network for explainable diagnosis and severity assessment of COVID-19 from CXR images. Tasks: lung segmentation, pneumonia classification, severity quantification and COVID-19 pneumonia region segmentation, deep learning models, multi-scale feature fusion. Paper link (free download): https://www.sciencedirect.com/science/article/pii/S0031320322003077 Image enhancement techniques on deep learning approaches for automated diagnosis of COVID-19 features using CXR images | SpringerLink. Useful ideas: medical image analysis, convolutional neural networks, transfer learning, image preprocessing, class imbalance. https://link.springer.com/article/10.1007/s11042-022-13486-8 submitted by /u/ajaynice199 [link] [comments]  ( 87 min )
  • Open

    Top 10 Benefits of Digital Transformation Adoption
    Every business wants to be competitive, but most organizations ignore digital transformation. Who can blame them? Digital transformation (DX) is not simple; in fact, it is a challenging process. It generally involves a number of online steps, and you may also have to deal with continuous employee pushback. But that does not mean digital transformation should be ignored. The post Top 10 Benefits of Digital Transformation Adoption appeared first on Data Science Central.  ( 20 min )
  • Open

    Can Brownian motion do work?
    According to the latest episode of Eclectic Tech, Richard Feynman argued that Brownian motion cannot do work, but researchers at the University of Arkansas have demonstrated that it can by generating an electric current from Brownian motion in a sheet of graphene. You can read more in the physics journal article by the researchers. Unfortunately […] Can Brownian motion do work? first appeared on John D. Cook.  ( 4 min )
  • Open

    BIP: Boost Invariant Polynomials for Efficient Jet Tagging. (arXiv:2207.08272v2 [physics.comp-ph] UPDATED)
    Deep Learning approaches are becoming the go-to methods for data analysis in High Energy Physics (HEP). Nonetheless, most physics-inspired modern architectures are computationally inefficient and lack interpretability. This is especially the case with jet tagging algorithms, where computational efficiency is crucial considering the large amounts of data produced by modern particle detectors. In this work, we present a novel, versatile and transparent framework for jet representation, invariant to Lorentz group boosts, which achieves high accuracy on jet tagging benchmarks while being orders of magnitude faster to train and evaluate than other modern approaches, for both supervised and unsupervised schemes.
    Efficient Truncated Linear Regression with Unknown Noise Variance. (arXiv:2208.12042v1 [stat.ME])
    Truncated linear regression is a classical challenge in Statistics, wherein a label, $y = w^T x + \varepsilon$, and its corresponding feature vector, $x \in \mathbb{R}^k$, are only observed if the label falls in some subset $S \subseteq \mathbb{R}$; otherwise the existence of the pair $(x, y)$ is hidden from observation. Linear regression with truncated observations has remained a challenge, in its general form, since the early works of~\citet{tobin1958estimation,amemiya1973regression}. When the distribution of the error is normal with known variance, recent work of~\citet{daskalakis2019truncatedregression} provides computationally and statistically efficient estimators of the linear model, $w$. In this paper, we provide the first computationally and statistically efficient estimators for truncated linear regression when the noise variance is unknown, estimating both the linear model and the variance of the noise. Our estimator is based on an efficient implementation of Projected Stochastic Gradient Descent on the negative log-likelihood of the truncated sample. Importantly, we show that the error of our estimates is asymptotically normal, and we use this to provide explicit confidence regions for our estimates.
    Graph Contrastive Learning for Anomaly Detection. (arXiv:2108.07516v2 [cs.LG] UPDATED)
    Graph-based anomaly detection has been widely used for detecting malicious activities in real-world applications. Existing attempts to address this problem have thus far focused on structural feature engineering or learning in the binary classification regime. In this work, we propose to leverage graph contrastive coding and present the supervised GCCAD model for contrasting abnormal nodes with normal ones in terms of their distances to the global context (e.g., the average of all nodes). To handle scenarios with scarce labels, we further enable GCCAD as a self-supervised framework by designing a graph corrupting strategy for generating synthetic node labels. To achieve the contrastive objective, we design a graph neural network encoder that can infer and further remove suspicious links during message passing, as well as learn the global context of the input graph. We conduct extensive experiments on four public datasets, demonstrating that 1) GCCAD significantly and consistently outperforms various advanced baselines and 2) its self-supervised version without fine-tuning can achieve comparable performance with its fully supervised version.
    Learning Lattice Quantum Field Theories with Equivariant Continuous Flows. (arXiv:2207.00283v2 [hep-lat] UPDATED)
    We propose a novel machine learning method for sampling from the high-dimensional probability distributions of Lattice Quantum Field Theories. Instead of the deep architectures used so far for this task, our proposal is based on a single neural ODE layer and incorporates the full symmetries of the problem. We test our model on the $\phi^4$ theory, showing that it systematically outperforms previously proposed flow-based methods in sampling efficiency, and the improvement is especially pronounced for larger lattices. Compared to the previous baseline model, we improve a key metric, the effective sample size, from 1% to 91% on a lattice of size $32\times 32$. We also demonstrate that our model can successfully learn a continuous family of theories at once, and the results of learning can be transferred to larger lattices. Such generalization capacities further accentuate the potential advantages of machine learning methods compared to traditional MCMC-based methods.
    Online Learning via Offline Greedy Algorithms: Applications in Market Design and Optimization. (arXiv:2102.11050v2 [cs.LG] UPDATED)
    Motivated by online decision-making in time-varying combinatorial environments, we study the problem of transforming offline algorithms to their online counterparts. We focus on offline combinatorial problems that are amenable to a constant factor approximation using a greedy algorithm that is robust to local errors. For such problems, we provide a general framework that efficiently transforms offline robust greedy algorithms to online ones using Blackwell approachability. We show that the resulting online algorithms have $O(\sqrt{T})$ (approximate) regret under the full information setting. We further introduce a bandit extension of Blackwell approachability that we call Bandit Blackwell approachability. We leverage this notion to transform greedy robust offline algorithms into online ones with $O(T^{2/3})$ (approximate) regret in the bandit setting. Demonstrating the flexibility of our framework, we apply our offline-to-online transformation to several problems at the intersection of revenue management, market design, and online optimization, including product ranking optimization in online platforms, reserve price optimization in auctions, and submodular maximization. We also extend our reduction to greedy-like first order methods used in continuous optimization, such as those used for maximizing continuous strong DR monotone submodular functions subject to convex constraints. We show that our transformation, when applied to these applications, leads to new regret bounds or improves the current known bounds. We complement our theoretical studies by conducting numerical simulations for two of our applications, in both of which we observe that the numerical performance of our transformations exceeds the theoretical guarantees in practical instances.
    Assessment of material layers in building walls using GeoRadar. (arXiv:2208.12064v1 [cs.LG])
    Assessing the structure of a building with non-invasive methods is an important problem. One of the possible approaches is to use GeoRadar to examine wall structures by analyzing the data obtained from the scans. We propose a data-driven approach to evaluate the material composition of a wall from its GPR radargrams. To generate training data, we model the scanning process with gprMax. Using the simulated data, we train a convolutional neural network to predict the thicknesses and dielectric properties of walls per layer. We evaluate the generalization abilities of the trained model on data collected from real buildings.
    Calibrated Selective Classification. (arXiv:2208.12084v1 [cs.LG])
    Selective classification allows models to abstain from making predictions (e.g., say "I don't know") when in doubt in order to obtain better effective accuracy. While typical selective models can be effective at producing more accurate predictions on average, they may still allow for wrong predictions that have high confidence, or skip correct predictions that have low confidence. Providing calibrated uncertainty estimates alongside predictions -- probabilities that correspond to true frequencies -- can be as important as having predictions that are simply accurate on average. However, uncertainty estimates can be unreliable for certain inputs. In this paper, we develop a new approach to selective classification in which we propose a method for rejecting examples with "uncertain" uncertainties. By doing so, we aim to make predictions with {well-calibrated} uncertainty estimates over the distribution of accepted examples, a property we call selective calibration. We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model. In particular, our work focuses on achieving robust calibration, where the model is intentionally designed to be tested on out-of-domain data. We achieve this through a training strategy inspired by distributionally robust optimization, in which we apply simulated input perturbations to the known, in-domain training data. We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.
    A derivation of variational message passing (VMP) for latent Dirichlet allocation (LDA). (arXiv:2111.01480v2 [cs.LG] UPDATED)
    Latent Dirichlet Allocation (LDA) is a probabilistic model used to uncover latent topics in a corpus of documents. Inference is often performed using variational Bayes (VB) algorithms, which calculate a lower bound to the posterior distribution over the parameters. Deriving the variational update equations for new models requires considerable manual effort; variational message passing (VMP) has emerged as a "black-box" tool to expedite the process of variational inference. But applying VMP in practice still presents subtle challenges, and the existing literature does not contain the steps that are necessary to implement VMP for the standard smoothed LDA model, nor is available black-box probabilistic graphical modelling software able to perform the word-topic updates necessary to implement LDA. In this paper, we therefore present a detailed derivation of the VMP update equations for LDA. We see this as a first step to enabling other researchers to calculate the VMP updates for similar graphical models.
    ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions. (arXiv:2201.00382v3 [cs.LG] UPDATED)
    Outlier detection refers to the identification of data points that deviate from a general data distribution. Existing unsupervised approaches often suffer from high computational cost, complex hyperparameter tuning, and limited interpretability, especially when working with large, high-dimensional datasets. To address these issues, we present a simple yet effective algorithm called ECOD (Empirical-Cumulative-distribution-based Outlier Detection), which is inspired by the fact that outliers are often the "rare events" that appear in the tails of a distribution. In a nutshell, ECOD first estimates the underlying distribution of the input data in a nonparametric fashion by computing the empirical cumulative distribution per dimension of the data. ECOD then uses these empirical distributions to estimate tail probabilities per dimension for each data point. Finally, ECOD computes an outlier score of each data point by aggregating estimated tail probabilities across dimensions. Our contributions are as follows: (1) we propose a novel outlier detection method called ECOD, which is both parameter-free and easy to interpret; (2) we perform extensive experiments on 30 benchmark datasets, where we find that ECOD outperforms 11 state-of-the-art baselines in terms of accuracy, efficiency, and scalability; and (3) we release an easy-to-use and scalable (with distributed support) Python implementation for accessibility and reproducibility.
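    The per-dimension ECDF construction is simple enough to sketch; the following simplified version (sum-aggregation only, without the paper's skewness-corrected variants) conveys the idea and is not the authors' reference code:
```python
import numpy as np

def ecod_scores(X, eps=1e-12):
    """Simplified ECOD-style outlier scores.

    Per dimension: estimate left/right tail probabilities with the empirical
    CDF, convert to negative log tail probabilities, and aggregate by summing
    across dimensions. Higher score = more outlying.
    """
    n, d = X.shape
    left = np.empty_like(X, dtype=float)
    for j in range(d):
        order = np.argsort(X[:, j], kind="mergesort")
        ranks = np.empty(n)
        ranks[order] = np.arange(1, n + 1)
        left[:, j] = ranks / n          # empirical P(X_j <= x)
    right = 1.0 - (left - 1.0 / n)      # empirical P(X_j >= x)
    tail = np.minimum(left, right)      # probability of the rarer tail
    return -np.log(tail + eps).sum(axis=1)

# Usage: scores = ecod_scores(X); flagged = scores > np.quantile(scores, 0.99)
```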
    A deep learning framework for geodesics under spherical Wasserstein-Fisher-Rao metric and its application for weighted sample generation. (arXiv:2208.12145v1 [cs.LG])
    Wasserstein-Fisher-Rao (WFR) distance is a family of metrics to gauge the discrepancy of two Radon measures, which takes into account both transportation and weight change. Spherical WFR distance is a projected version of WFR distance for probability measures so that the space of Radon measures equipped with WFR can be viewed as metric cone over the space of probability measures with spherical WFR. Compared to the case for Wasserstein distance, the understanding of geodesics under the spherical WFR is less clear and still an ongoing research focus. In this paper, we develop a deep learning framework to compute the geodesics under the spherical WFR metric, and the learned geodesics can be adopted to generate weighted samples. Our approach is based on a Benamou-Brenier type dynamic formulation for spherical WFR. To overcome the difficulty in enforcing the boundary constraint brought by the weight change, a Kullback-Leibler (KL) divergence term based on the inverse map is introduced into the cost function. Moreover, a new regularization term using the particle velocity is introduced as a substitute for the Hamilton-Jacobi equation for the potential in the dynamic formulation. When used for sample generation, our framework can be beneficial for applications with given weighted samples, especially in Bayesian inference, compared to sample generation with previous flow models.
    Turning Mathematics Problems into Games: Reinforcement Learning and Gr\"obner bases together solve Integer Feasibility Problems. (arXiv:2208.12191v1 [cs.LG])
    Can agents be trained to answer difficult mathematical questions by playing a game? We consider the integer feasibility problem, a challenge of deciding whether a system of linear equations and inequalities has a solution with integer values. This is a famous NP-complete problem with applications in many areas of Mathematics and Computer Science. Our paper describes a novel algebraic reinforcement learning framework that allows an agent to play a game equivalent to the integer feasibility problem. We explain how to transform the integer feasibility problem into a game over a set of arrays with fixed margin sums. The game starts with an initial state (an array), and by applying a legal move that leaves the margins unchanged, we aim to eventually reach a winning state with zeros in specific positions. To win the game the player must find a path between the initial state and a final terminal winning state if one exists. Finding such a winning state is equivalent to solving the integer feasibility problem. The key algebraic ingredient is a Gr\"obner basis of the toric ideal for the underlying axial transportation polyhedron. The Gr\"obner basis can be seen as a set of connecting moves (actions) of the game. We then propose a novel RL approach that trains an agent to predict moves in continuous space to cope with the large size of the action space. The continuous move is then projected onto the set of legal moves so that the path always leads to valid states. As a proof of concept we demonstrate in experiments that our agent can play the simplest version of our game, for 2-way tables, well. Our work highlights the potential to train agents to solve non-trivial mathematical queries through contemporary machine learning methods used to train agents to play games.
    Skin Lesion Analysis: A State-of-the-Art Survey, Systematic Review, and Future Trends. (arXiv:2208.12232v1 [eess.IV])
    The Computer-aided Diagnosis (CAD) system for skin lesion analysis is an emerging field of research that has the potential to relieve the burden and cost of skin cancer screening. Researchers have recently indicated increasing interest in developing such CAD systems, with the intention of providing a user-friendly tool to dermatologists in order to reduce the challenges that are raised by manual inspection. The purpose of this article is to provide a complete literature review of cutting-edge CAD techniques published between 2011 and 2020. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) method was used to identify a total of 365 publications, 221 for skin lesion segmentation and 144 for skin lesion classification. These articles are analyzed and summarized in a number of different ways so that we can contribute vital information about the methods for the evolution of CAD systems. These ways include: relevant and essential definitions and theories, input data (datasets utilization, preprocessing, augmentations, and fixing imbalance problems), method configuration (techniques, architectures, module frameworks, and losses), training tactics (hyperparameter settings), and evaluation criteria (metrics). We also intend to investigate a variety of performance-enhancing methods, including ensemble and post-processing. In addition, in this survey, we highlight the primary problems associated with evaluating skin lesion segmentation and classification systems using minimal datasets, as well as the potential solutions to these challenges. In conclusion, enlightening findings, recommendations, and trends are discussed for the purpose of future research surveillance in related fields of interest. It is foreseen that it will guide researchers of all levels, from beginners to experts, in the process of developing an automated and robust CAD system for skin lesion analysis.
    Rail break and derailment prediction using Probabilistic Graphical Modelling. (arXiv:2208.11940v1 [cs.LG])
    Rail breaks are one of the most common causes of derailments internationally. This is no different for the South African Iron Ore line. Many rail breaks occur as a heavy-haul train passes over a crack, large defect or defective weld. In such cases, it is usually too late for the train to slow down in time to prevent a derailment. Knowing the rail break risk associated with a train passing over a section of rail allows for better implementation of maintenance initiatives and mitigating measures. In this paper, the Ore Line's specific challenges are discussed and the currently available data that can be used to create a rail break risk prediction model is reviewed. The development of a basic rail break risk prediction model for the Ore Line is then presented. Finally, the insight gained from the model is demonstrated by discussing scenarios with differing rail break risk. In future work, we plan to extend this basic model to allow input from live monitoring systems such as the ultrasonic broken rail detection system.
    Sustaining Fairness via Incremental Learning. (arXiv:2208.12212v1 [cs.LG])
    Machine learning systems are often deployed for making critical decisions like credit lending, hiring, etc. While making decisions, such systems often encode the user's demographic information (like gender, age) in their intermediate representations. This can lead to decisions that are biased towards specific demographics. Prior work has focused on debiasing intermediate representations to ensure fair decisions. However, these approaches fail to remain fair with changes in the task or demographic distribution. To ensure fairness in the wild, it is important for a system to adapt to such changes as it accesses new data in an incremental fashion. In this work, we propose to address this issue by introducing the problem of learning fair representations in an incremental learning setting. To this end, we present Fairness-aware Incremental Representation Learning (FaIRL), a representation learning system that can sustain fairness while incrementally learning new tasks. FaIRL is able to achieve fairness and learn new tasks by controlling the rate-distortion function of the learned representations. Our empirical evaluations show that FaIRL is able to make fair decisions while achieving high performance on the target task, outperforming several baselines.
    Lifelong Learning for Neural powered Mixed Integer Programming. (arXiv:2208.12226v1 [math.OC])
    Mixed Integer Programs (MIPs) are typically solved by the Branch-and-Bound algorithm. Recently, learning to imitate fast approximations of the expert strong branching heuristic has gained attention due to its success in reducing the running time for solving MIPs. However, existing learning-to-branch methods assume that the entire training data is available in a single session of training. This assumption is often not true, and if the training data is supplied in a continual fashion over time, existing techniques suffer from catastrophic forgetting. In this work, we study the hitherto unexplored paradigm of Lifelong Learning to Branch on Mixed Integer Programs. To mitigate catastrophic forgetting, we propose LIMIP, which is powered by the idea of modeling an MIP instance as a bipartite graph, which we map to an embedding space using a bipartite Graph Attention Network. This rich embedding space avoids catastrophic forgetting through the application of knowledge distillation and elastic weight consolidation, wherein we learn the parameters that are key to retaining efficacy and are therefore protected from significant drift. We evaluate LIMIP on a series of NP-hard problems and establish that, in comparison to existing baselines, LIMIP is up to 50% better when confronted with lifelong learning.
    Learning to Prune Instances of Steiner Tree Problem in Graphs. (arXiv:2208.11985v1 [cs.DS])
    We consider the Steiner tree problem on graphs where we are given a set of nodes and the goal is to find a tree sub-graph of minimum weight that contains all nodes in the given set, potentially including additional nodes. This is a classical NP-hard combinatorial optimisation problem. In recent years, a machine learning framework called learning-to-prune has been successfully used for solving a diverse range of combinatorial optimisation problems. In this paper, we use this learning framework on the Steiner tree problem and show that even on this problem, the learning-to-prune framework results in computing near-optimal solutions at a fraction of the time required by commercial ILP solvers. Our results underscore the potential of the learning-to-prune framework in solving various combinatorial optimisation problems.
    Scaleformer: Iterative Multi-scale Refining Transformers for Time Series Forecasting. (arXiv:2206.04038v2 [cs.LG] UPDATED)
    The performance of time series forecasting has recently been greatly improved by the introduction of transformers. In this paper, we propose a general multi-scale framework that can be applied to state-of-the-art transformer-based time series forecasting models, including Autoformer and Informer. By iteratively refining a forecasted time series at multiple scales with shared weights, architecture adaptations and a specially-designed normalization scheme, we are able to achieve significant performance improvements with minimal additional computational overhead. Via detailed ablation studies, we demonstrate the effectiveness of our proposed architectural and methodological innovations. Furthermore, our experiments on four public datasets show that the proposed multi-scale framework outperforms the corresponding baselines with an average improvement of 13% and 38% over Autoformer and Informer, respectively.
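    The coarse-to-fine refinement loop can be sketched as follows, with a generic shared-weight forecaster `model(series, steps)` standing in for the transformer; the pooling factors, concatenation-based conditioning, and divisibility of the horizon by each scale are illustrative assumptions:
```python
import torch
import torch.nn.functional as F

def multiscale_forecast(model, history, horizon, scales=(16, 4, 1)):
    """Iteratively refine a forecast from coarse to fine resolution.

    history: (batch, 1, length) series; model(series, steps) is any
    shared-weight forecaster returning (batch, 1, steps). A sketch of the
    framework, not the Autoformer/Informer-specific variants.
    """
    prev = None
    for s in scales:                      # assumes horizon % s == 0
        hist_s = F.avg_pool1d(history, kernel_size=s) if s > 1 else history
        steps = horizon // s
        if prev is None:
            inp = hist_s
        else:
            # upsample the coarser forecast and condition the next pass on it
            up = F.interpolate(prev, size=steps, mode="linear",
                               align_corners=False)
            inp = torch.cat([hist_s, up], dim=-1)
        prev = model(inp, steps)
    return prev                           # finest-scale forecast
```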
    Dual Diffusion Implicit Bridges for Image-to-Image Translation. (arXiv:2203.08382v2 [cs.CV] UPDATED)
    Common image-to-image translation methods rely on joint training over data from both source and target domains. This prevents the training process from preserving privacy of domain data (e.g., in a federated setting), and often means that a new model has to be trained for a new pair of domains. We present Dual Diffusion Implicit Bridges (DDIBs), an image translation method based on diffusion models, that circumvents training on domain pairs. Image translation with DDIBs relies on two diffusion models trained independently on each domain, and is a two-step process: DDIBs first obtain latent encodings for source images with the source diffusion model, and then decode such encodings using the target model to construct target images. Both steps are defined via an ODE, thus the process is cycle consistent only up to discretization errors of the ODE solvers. Theoretically, we interpret DDIBs as concatenation of source to latent, and latent to target Schr\"odinger Bridges, a form of entropy-regularized optimal transport, to explain the efficacy of the method. Experimentally, we apply DDIBs on both synthetic and high-resolution image datasets, to demonstrate their utility in a wide variety of translation tasks and their connections to existing optimal transport methods.
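    A bare-bones sketch of the two-step procedure, assuming pretrained score functions for each domain and a toy Euler integrator for a VE-type probability-flow ODE (a real implementation would use DDIM or a higher-order solver):
```python
import torch

def euler_probability_flow(score, x, t_grid):
    # Probability-flow ODE for a toy VE process with sigma(t) = t:
    # dx/dt = -t * score(x, t), integrated with plain Euler steps.
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        x = x + (t1 - t0) * (-t0 * score(x, t0))
    return x

def ddib_translate(x_src, score_src, score_tgt, steps=100):
    """DDIB-style translation: encode with the source model's ODE (t: 0 -> 1),
    then decode the shared latent with the target model's ODE (t: 1 -> 0)."""
    t_up = torch.linspace(1e-3, 1.0, steps)
    latent = euler_probability_flow(score_src, x_src, t_up)          # source -> latent
    return euler_probability_flow(score_tgt, latent, t_up.flip(0))   # latent -> target
```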
    The Informativeness of K-Means for Learning Mixture Models. (arXiv:1703.10534v4 [stat.ML] UPDATED)
    The learning of mixture models can be viewed as a clustering problem. Indeed, given data samples independently generated from a mixture of distributions, we often would like to find the {\it correct target clustering} of the samples according to which component distribution they were generated from. For a clustering problem, practitioners often choose to use the simple $k$-means algorithm. $k$-means attempts to find an {\it optimal clustering} that minimizes the sum-of-squares distance between each point and its cluster center. In this paper, we consider fundamental (i.e., information-theoretic) limits of the solutions (clusterings) obtained by optimizing the sum-of-squares distance. In particular, we provide sufficient conditions for the closeness of any optimal clustering and the correct target clustering assuming that the data samples are generated from a mixture of spherical Gaussian distributions. We also generalize our results to log-concave distributions. Moreover, we show that under similar or even weaker conditions on the mixture model, any optimal clustering for the samples with reduced dimensionality is also close to the correct target clustering. These results provide intuition for the informativeness of $k$-means (with and without dimensionality reduction) as an algorithm for learning mixture models.
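    The setting is easy to reproduce; a small illustration, assuming spherical Gaussian components and using scikit-learn's k-means:
```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
k, n, d, sep = 3, 1500, 10, 6.0
means = rng.normal(scale=sep, size=(k, d))    # well-separated spherical Gaussians
labels_true = rng.integers(k, size=n)
X = means[labels_true] + rng.normal(size=(n, d))

labels_kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
print("agreement with target clustering:",
      adjusted_rand_score(labels_true, labels_kmeans))
# Shrinking `sep` degrades the agreement, illustrating the separation regimes
# the paper's sufficient conditions characterize.
```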
    A Survey on Temporal Graph Representation Learning and Generative Modeling. (arXiv:2208.12126v1 [cs.LG])
    Temporal graphs represent the dynamic relationships among entities and occur in many real-life applications like social networks, e-commerce, communication, road networks, biological systems, and many more. They necessitate research beyond the work related to static graphs in terms of their generative modeling and representation learning. In this survey, we comprehensively review the neural time-dependent graph representation learning and generative modeling approaches proposed in recent times for handling temporal graphs. Finally, we identify the weaknesses of existing approaches and discuss the research proposal of our recently published paper TIGGER [24].
    Self-Adaptive Forecasting for Improved Deep Learning on Non-Stationary Time-Series. (arXiv:2202.02403v2 [cs.LG] UPDATED)
    Real-world time-series datasets often violate the assumptions of standard supervised learning for forecasting -- their distributions evolve over time, rendering the conventional training and model selection procedures suboptimal. In this paper, we propose a novel method, Self-Adaptive Forecasting (SAF), to modify the training of time-series forecasting models to improve their performance on forecasting tasks with such non-stationary time-series data. SAF integrates a self-adaptation stage prior to forecasting based on `backcasting', i.e. predicting masked inputs backward in time. This is a form of test-time training that creates a self-supervised learning problem on test samples before performing the prediction task. In this way, our method enables efficient adaptation of encoded representations to evolving distributions, leading to superior generalization. SAF can be integrated with any canonical encoder-decoder based time-series architecture such as recurrent neural networks or attention-based architectures. On synthetic and real-world datasets in domains where time-series data are known to be notoriously non-stationary, such as healthcare and finance, we demonstrate a significant benefit of SAF in improving forecasting accuracy.
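    A hedged sketch of the backcasting-based self-adaptation step, assuming a forecaster with the illustrative signature `model(series, steps)`; the masking scheme and number of adaptation steps are assumptions, not the paper's exact recipe:
```python
import copy
import torch

def saf_predict(model, x, horizon, mask_frac=0.5, steps=3, lr=1e-4):
    """Adapt a copy of the model on a self-supervised backcasting task for the
    given test series x of shape (batch, 1, length), then forecast."""
    adapted = copy.deepcopy(model)                 # adapt per test batch
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    n_mask = int(mask_frac * x.shape[-1])
    masked, visible = x[..., :n_mask], x[..., n_mask:]
    for _ in range(steps):
        opt.zero_grad()
        # predict the masked early segment by running the model backward in time
        backcast = adapted(visible.flip(-1), n_mask)
        loss = torch.mean((backcast.flip(-1) - masked) ** 2)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return adapted(x, horizon)
```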
    How to Learn and Represent Abstractions: An Investigation using Symbolic Alchemy. (arXiv:2112.08360v2 [cs.LG] UPDATED)
    Alchemy is a new meta-learning environment rich enough to contain interesting abstractions, yet simple enough to make fine-grained analysis tractable. Further, Alchemy provides an optional symbolic interface that enables meta-RL research without a large compute budget. In this work, we take the first steps toward using Symbolic Alchemy to identify design choices that enable deep-RL agents to learn various types of abstraction. Then, using a variety of behavioral and introspective analyses we investigate how our trained agents use and represent abstract task variables, and find intriguing connections to the neuroscience of abstraction. We conclude by discussing the next steps for using meta-RL and Alchemy to better understand the representation of abstract variables in the brain.
    OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems. (arXiv:2110.10659v2 [cs.DC] UPDATED)
    Python has become a dominant programming language for emerging areas like Machine Learning (ML), Deep Learning (DL), and Data Science (DS). An attractive feature of Python is that it provides an easy-to-use programming interface while allowing library developers to enhance the performance of their applications by harnessing the computing power offered by High Performance Computing (HPC) platforms. Efficient communication is key to scaling applications on parallel systems, which is typically enabled by the Message Passing Interface (MPI) standard and compliant libraries on HPC hardware. mpi4py is a Python-based communication library that provides an MPI-like interface for Python applications allowing application developers to utilize parallel processing elements including GPUs. However, there is currently no benchmark suite to evaluate communication performance of mpi4py -- and Python MPI codes in general -- on modern HPC systems. In order to bridge this gap, we propose OMB-Py -- Python extensions to the open-source OSU Micro-Benchmark (OMB) suite -- aimed at evaluating the communication performance of MPI-based parallel applications in Python. To the best of our knowledge, OMB-Py is the first communication benchmark suite for parallel Python applications. OMB-Py consists of a variety of point-to-point and collective communication benchmark tests that are implemented for a range of popular Python libraries including NumPy, CuPy, Numba, and PyCUDA. Our evaluation reveals that mpi4py introduces a small overhead when compared to native MPI libraries. We plan to publicly release OMB-Py to benefit the Python HPC community.
    Image Based Food Energy Estimation With Depth Domain Adaptation. (arXiv:2208.12153v1 [cs.CV])
    Assessment of dietary intake has primarily relied on self-report instruments, which are prone to measurement errors. Dietary assessment methods have increasingly incorporated technological advances, particularly mobile, image-based approaches, to address some of these limitations and further automation. Mobile, image-based methods can reduce user burden and bias by automatically estimating dietary intake from eating occasion images that are captured by mobile devices. In this paper, we propose an "Energy Density Map", a pixel-to-pixel mapping from the RGB image to the energy density of the food. We then combine the "Energy Density Map" with an associated depth map that is captured by a depth sensor to estimate the food energy. The proposed method is evaluated on the Nutrition5k dataset. Experimental results show improved results compared to baseline methods, with an average error of 13.29 kCal and an average percentage error of 13.57% between the ground-truth and the estimated energy of the food.
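    The final estimate reduces to integrating the predicted density map against pixel areas derived from depth. A toy version under assumed units and pinhole-camera geometry (the paper learns the density map with a network rather than taking it as given):
```python
import numpy as np

def estimate_energy(energy_density, depth, fx, fy):
    """Toy integration of an 'Energy Density Map' against depth-derived area.

    energy_density: (H, W) predicted kcal per cm^2 on food pixels, 0 elsewhere.
    depth: (H, W) metric depth in cm; fx, fy: focal lengths in pixels.
    Units and geometry are illustrative assumptions.
    """
    # A pixel at depth z subtends roughly (z/fx) x (z/fy) cm^2 of surface.
    pixel_area_cm2 = (depth / fx) * (depth / fy)
    return float(np.sum(energy_density * pixel_area_cm2))
```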
    Online Influence Maximization under the Independent Cascade Model with Node-Level Feedback. (arXiv:2109.06077v3 [cs.SI] UPDATED)
    We study the online influence maximization (OIM) problem in social networks, where the learner repeatedly chooses seed nodes to generate cascades, observes the cascade feedback, and gradually learns the best seeds that generate the largest cascade in multiple rounds. Motivated by the demands of the real world, we work with node-level feedback instead of the common edge-level feedback in the literature. The edge-level feedback reveals all edges that pass through information in a cascade, whereas the node-level feedback only reveals the activated nodes with timestamps. The node-level feedback is arguably more realistic since in practice it is relatively easy to observe who is influenced but very difficult to observe from which relationship (edge) the influence comes. Previously, there was a nearly optimal $\tilde{O}(\sqrt{T})$-regret algorithm for the OIM problem under the linear threshold (LT) diffusion model with node-level feedback. It remained unknown whether the same result exists for the independent cascade (IC) diffusion model. In this paper, we resolve this open problem by presenting an $\tilde{O}(\sqrt{T})$-regret algorithm for the OIM problem under the IC model with node-level feedback.
    SONAR: Joint Architecture and System Optimization Search. (arXiv:2208.12218v1 [cs.LG])
    There is a growing need to deploy machine learning for different tasks on a wide array of new hardware platforms. Such deployment scenarios require tackling multiple challenges, including identifying a model architecture that can achieve a suitable predictive accuracy (architecture search), and finding an efficient implementation of the model to satisfy underlying hardware-specific systems constraints such as latency (system optimization search). Existing works treat architecture search and system optimization search as separate problems and solve them sequentially. In this paper, we instead propose to solve these problems jointly, and introduce a simple but effective baseline method called SONAR that interleaves these two search problems. SONAR aims to efficiently optimize for predictive accuracy and inference latency by applying early stopping to both search processes. Our experiments on multiple different hardware back-ends show that SONAR identifies nearly optimal architectures 30 times faster than a brute force approach.
    Semantic Preserving Adversarial Attack Generation with Autoencoder and Genetic Algorithm. (arXiv:2208.12230v1 [cs.LG])
    Widely used deep learning models are found to have poor robustness: small perturbations can fool state-of-the-art models into making incorrect predictions. While there are many high-performance attack-generation methods, most of them directly add perturbations to the original data and measure them using L_p norms; this can break the major structure of the data, thus creating invalid attacks. In this paper, we propose a black-box attack which, instead of modifying the original data, modifies latent features of the data extracted by an autoencoder; noise is then measured in semantic space to protect the semantics of the data. We trained autoencoders on the MNIST and CIFAR-10 datasets and found optimal adversarial perturbations using a genetic algorithm. Our approach achieved a 100% attack success rate on the first 100 images of the MNIST and CIFAR-10 datasets with less perturbation than FGSM.
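    A bare-bones version of the latent-space genetic search might look as follows, with `encode`, `decode`, and `predict` as user-supplied callables; the selection and mutation operators are simplifications, not the paper's exact configuration:
```python
import numpy as np

def latent_ga_attack(encode, decode, predict, x, y_true,
                     pop=50, gens=100, sigma=0.05, lam=0.5, seed=0):
    """Genetic search for an adversarial example in an autoencoder's latent space.

    encode/decode: pretrained autoencoder; predict(img) -> class probabilities
    (black-box). Fitness trades off lowering the true-class probability against
    staying semantically close to the original latent code.
    """
    rng = np.random.default_rng(seed)
    z0 = encode(x)
    population = z0 + sigma * rng.normal(size=(pop,) + z0.shape)
    best_z, best_fit = z0, -np.inf
    for _ in range(gens):
        probs = np.stack([predict(decode(z)) for z in population])
        semantic = np.linalg.norm((population - z0).reshape(pop, -1), axis=1)
        fitness = -probs[:, y_true] - lam * semantic
        if fitness.max() > best_fit:
            best_fit, best_z = fitness.max(), population[fitness.argmax()].copy()
        elite = population[np.argsort(fitness)[pop // 2:]]        # keep the top half
        children = elite + sigma * rng.normal(size=elite.shape)   # mutate
        population = np.concatenate([elite, children])
    return decode(best_z)
```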
    Supervised Dimensionality Reduction and Classification with Convolutional Autoencoders. (arXiv:2208.12152v1 [cs.LG])
    The joint optimization of the reconstruction and classification error is a hard non-convex problem, especially when a non-linear mapping is utilized. To overcome this obstacle, a novel optimization strategy is proposed, in which a Convolutional Autoencoder for dimensionality reduction and a classifier composed of a Fully Connected Network are combined to simultaneously produce supervised dimensionality reduction and predictions. It turns out that this methodology can also be greatly beneficial in enforcing the explainability of deep learning architectures. Additionally, the resulting latent space, optimized for the classification task, can be utilized to improve traditional, interpretable classification algorithms. The experimental results show that the proposed methodology achieves competitive results against state-of-the-art deep learning methods, while being much more efficient in terms of parameter count. Finally, it is empirically justified that the proposed methodology introduces advanced explainability regarding not only the data structure, through the produced latent space, but also the classification behaviour.
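    The joint objective is straightforward to express; a minimal PyTorch sketch with illustrative layer sizes (assuming 28x28 single-channel inputs), not the paper's exact architecture:
```python
import torch
import torch.nn as nn

class SupervisedCAE(nn.Module):
    """Convolutional autoencoder with a classification head on the latent code.

    Trained with reconstruction + lambda * cross-entropy so the latent space
    is shaped by the labels as well as by reconstruction.
    """
    def __init__(self, latent_dim=32, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 28 -> 14
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 14 -> 7
            nn.Flatten(), nn.Linear(32 * 7 * 7, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 7 * 7), nn.ReLU(),
            nn.Unflatten(1, (32, 7, 7)),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1,
                               output_padding=1), nn.ReLU(),        # 7 -> 14
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1,
                               output_padding=1), nn.Sigmoid())     # 14 -> 28
        self.classifier = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z), z

def joint_loss(x, y, model, lam=1.0):
    recon, logits, _ = model(x)
    return (nn.functional.mse_loss(recon, x)
            + lam * nn.functional.cross_entropy(logits, y))
```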
    Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy. (arXiv:2107.02780v3 [econ.EM] UPDATED)
    The 2020 US Census will be published with differential privacy, implemented by injecting synthetic noise into the data. Controversy has ensued, with debates that center on the painful trade-off between the privacy of respondents and the precision of economic analysis. Is this trade-off inevitable? To answer this question, we formulate a semiparametric model of causal inference with high dimensional data that may be noisy, missing, discretized, or privatized. We propose a new end-to-end procedure for data cleaning, estimation, and inference with data cleaning-adjusted confidence intervals. We prove consistency, Gaussian approximation, and semiparametric efficiency by finite sample arguments. The rate of Gaussian approximation is $n^{-1/2}$ for semiparametric estimands such as average treatment effect, and it degrades gracefully for nonparametric estimands such as heterogeneous treatment effect. Our key assumption is that the true covariates are approximately low rank, which we interpret as approximate repeated measurements and validate in the Census. In our analysis, we provide nonasymptotic theoretical contributions to matrix completion, statistical learning, and semiparametric statistics. We verify the coverage of the data cleaning-adjusted confidence intervals in simulations. Finally, we conduct a semi-synthetic exercise calibrated to privacy levels mandated for the 2020 US Census.
    DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. (arXiv:2208.12242v1 [cs.CV])
    Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts. In this work, we present a new approach for "personalization" of text-to-image diffusion models (specializing them to users' needs). Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model (Imagen, although our method is not limited to a specific model) such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can then be used to synthesize fully-novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views, and lighting conditions that do not appear in the reference images. We apply our technique to several previously-unassailable tasks, including subject recontextualization, text-guided view synthesis, appearance modification, and artistic rendering (all while preserving the subject's key features). Project page: https://dreambooth.github.io/
    Seamless Tracking of Group Targets and Ungrouped Targets Using Belief Propagation. (arXiv:2208.12035v1 [cs.LG])
    This paper considers the problem of tracking a large number of group targets. Usually, multiple targets in tracking scenarios are assumed to move independently and be well separated. However, for group target tracking (GTT), the targets within groups are closely spaced and move in a coordinated manner, the groups can split or merge, and the numbers of targets in groups may be large, which leads to more challenging data association, filtering and computation problems. Within the belief propagation (BP) framework, we propose a scalable group target belief propagation (GTBP) method by jointly inferring target existence variables, group structure, data association and target states. The method can efficiently calculate the approximations of the marginal posterior distributions of these variables by performing belief propagation on the devised factor graph. As a consequence, GTBP is capable of capturing the changes in group structure, e.g., group splitting and merging. Furthermore, we model the evolution of targets as the co-action of the group or single-target motions specified by the possible group structures and corresponding probabilities. This flexible modeling enables seamless and simultaneous tracking of multiple group targets and ungrouped targets. Particularly, GTBP has excellent scalability and low computational complexity. It not only maintains the same scalability as BP, i.e., scaling linearly in the number of sensor measurements and quadratically in the number of targets, but also only scales linearly in the number of preserved group partitions. Finally, numerical experiments are presented to demonstrate the effectiveness and scalability of the proposed GTBP method.
    Approximation of Images via Generalized Higher Order Singular Value Decomposition over Finite-dimensional Commutative Semisimple Algebra. (arXiv:2202.00450v8 [cs.LG] UPDATED)
    Low-rank approximation of images via singular value decomposition is well-received in the era of big data. However, singular value decomposition (SVD) is only for order-two data, i.e., matrices. It is necessary to flatten a higher order input into a matrix or break it into a series of order-two slices to tackle higher order data such as multispectral images and videos with the SVD. Higher order singular value decomposition (HOSVD) extends the SVD and can approximate higher order data using sums of a few rank-one components. We consider the problem of generalizing HOSVD over a finite dimensional commutative algebra. This algebra, referred to as a t-algebra, generalizes the field of complex numbers. The elements of the algebra, called t-scalars, are fixed-size arrays of complex numbers. One can generalize matrices and tensors over t-scalars and then extend many canonical matrix and tensor algorithms, including HOSVD, to obtain higher-performance versions. The generalization of HOSVD is called THOSVD. Its performance in approximating multi-way data can be further improved by an alternating algorithm. THOSVD also unifies a wide range of principal component analysis algorithms. To exploit the potential of generalized algorithms using t-scalars for approximating images, we use a pixel neighborhood strategy to convert each pixel to a "deeper-order" t-scalar. Experiments on publicly available images show that the generalized algorithm over t-scalars, namely THOSVD, compares favorably with its canonical counterparts.
    Nonparametric Gaussian Mixture Models for the Multi-Armed Bandit. (arXiv:1808.02932v4 [stat.ML] UPDATED)
    We here adopt Bayesian nonparametric mixture models to extend multi-armed bandits in general, and Thompson sampling in particular, to scenarios where there is reward model uncertainty. In the stochastic multi-armed bandit, the reward for the played arm is generated from an unknown distribution. Reward uncertainty, i.e., the lack of knowledge about the reward-generating distribution, induces the exploration-exploitation trade-off: a bandit agent needs to simultaneously learn the properties of the reward distribution and sequentially decide which action to take next. In this work, we extend Thompson sampling to scenarios where there is reward model uncertainty by adopting Bayesian nonparametric Gaussian mixture models for flexible reward density estimation. The proposed Bayesian nonparametric mixture model Thompson sampling sequentially learns the reward model that best approximates the true, yet unknown, per-arm reward distribution, achieving successful regret performance. We derive, based on a novel analysis of posterior convergence, an asymptotic regret bound for the proposed method. In addition, we empirically evaluate its performance in diverse and previously elusive bandit environments, e.g., with rewards not in the exponential family, subject to outliers, and with different per-arm reward distributions. We show that the proposed Bayesian nonparametric Thompson sampling outperforms, both in averaged cumulative regret and in regret volatility, state-of-the-art alternatives. The proposed method is valuable in the presence of bandit reward model uncertainty, as it avoids stringent case-by-case model design choices, yet provides important regret savings.
    Partial Matrix Completion. (arXiv:2208.12063v1 [cs.LG])
    In the matrix completion problem, one wishes to reconstruct a low-rank matrix based on a revealed set of (possibly noisy) entries. Prior work considers completing the entire matrix, which may be highly inaccurate in the common case where the distribution over entries is non-uniform. We formalize the problem of Partial Matrix Completion where the goal is to complete a large subset of the entries, or equivalently to complete the entire matrix and specify an accurate subset of the entries. Interestingly, even though the distribution is unknown and arbitrarily complex, our efficient algorithm is able to guarantee: (a) high accuracy over all completed entries, and (b) high coverage, meaning that it covers at least as much of the matrix as the distribution of observations.
    CAS4DL: Christoffel Adaptive Sampling for function approximation via Deep Learning. (arXiv:2208.12190v1 [cs.LG])
    The problem of approximating smooth, multivariate functions from sample points arises in many applications in scientific computing, e.g., in computational Uncertainty Quantification (UQ) for science and engineering. In these applications, the target function may represent a desired quantity of interest of a parameterized Partial Differential Equation (PDE). Due to the large cost of solving such problems, where each sample is computed by solving a PDE, sample efficiency is a key concern in these applications. Recently, there has been increasing focus on the use of Deep Neural Networks (DNN) and Deep Learning (DL) for learning such functions from data. In this work, we propose an adaptive sampling strategy, CAS4DL (Christoffel Adaptive Sampling for Deep Learning) to increase the sample efficiency of DL for multivariate function approximation. Our novel approach is based on interpreting the second to last layer of a DNN as a dictionary of functions defined by the nodes on that layer. With this viewpoint, we then define an adaptive sampling strategy motivated by adaptive sampling schemes recently proposed for linear approximation schemes, wherein samples are drawn randomly with respect to the Christoffel function of the subspace spanned by this dictionary. We present numerical experiments comparing CAS4DL with standard Monte Carlo (MC) sampling. Our results demonstrate that CAS4DL often yields substantial savings in the number of samples required to achieve a given accuracy, particularly in the case of smooth activation functions, and it shows better stability in comparison to MC. These results are therefore a promising step towards fully adapting DL to scientific computing applications.
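    On a finite candidate pool, sampling with respect to the Christoffel function amounts to leverage-score sampling of the dictionary's evaluation matrix. A small sketch, with the dictionary supplied as a callable (here any user-defined feature map; the paper's dictionary is the DNN's penultimate layer):
```python
import numpy as np

def christoffel_sample(dictionary, candidates, n_samples, seed=0):
    """Draw candidate points with probability proportional to the (inverse)
    Christoffel function of the span of the dictionary.

    dictionary: callable mapping (N, d) points to an (N, m) feature matrix.
    candidates: a large pool of points to reweight and subsample.
    """
    rng = np.random.default_rng(seed)
    A = dictionary(candidates)        # (N, m) evaluations on the pool
    Q, _ = np.linalg.qr(A)            # orthonormalize w.r.t. the pool measure
    k = (Q ** 2).sum(axis=1)          # row leverage scores
    idx = rng.choice(len(candidates), size=n_samples,
                     replace=False, p=k / k.sum())
    return candidates[idx]
```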
    A deep learning approach to predict the number of k-barriers for intrusion detection over a circular region using wireless sensor networks. (arXiv:2208.11887v1 [cs.LG])
    Wireless Sensor Networks (WSNs) are a promising technology with enormous applications in almost every walk of life. One of the crucial applications of WSNs is intrusion detection and surveillance at border areas and in defense establishments. The border areas stretch over hundreds to thousands of miles; hence, it is not possible to patrol the entire border region. As a result, an enemy may enter from any point in the absence of surveillance and cause loss of life or destroy military establishments. WSNs can be a feasible solution to the problem of intrusion detection and surveillance at the border areas. Detection of an enemy at the border areas and nearby critical areas such as military cantonments is a time-sensitive task, as a delay of a few seconds may have disastrous consequences. Therefore, it becomes imperative to design systems that are able to identify and detect the enemy as soon as it comes into the range of the deployed system. In this paper, we propose a deep learning architecture based on a fully connected feed-forward Artificial Neural Network (ANN) for the accurate prediction of the number of k-barriers for fast intrusion detection and prevention. We train and evaluate the feed-forward ANN model using four potential features, namely the area of the circular region, the sensing range of the sensors, the transmission range of the sensors, and the number of sensors, for Gaussian and uniform sensor distributions. These features are extracted through Monte Carlo simulation. In doing so, we found that the model accurately predicts the number of k-barriers for both Gaussian and uniform sensor distributions, with a correlation coefficient (R = 0.78) and Root Mean Square Error (RMSE = 41.15) for the former and R = 0.79 and RMSE = 48.36 for the latter. Further, the proposed approach outperforms other benchmark algorithms in terms of accuracy and computational time complexity.
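    A regressor of this kind is a few lines with scikit-learn; the hidden-layer sizes here are illustrative, not the paper's exact architecture:
```python
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Features per the paper: region area, sensing range, transmission range,
# number of sensors; target: number of k-barriers. X, y would come from
# Monte Carlo simulations of the deployment.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64), activation="relu",
                 max_iter=2000, random_state=0))
# model.fit(X_train, y_train); k_barriers = model.predict(X_test)
```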
    On confidence intervals for precision matrices and the eigendecomposition of covariance matrices. (arXiv:2208.11977v1 [math.ST])
    The eigendecomposition of a matrix is the central procedure in probabilistic models based on matrix factorization, for instance principal component analysis and topic models. Quantifying the uncertainty of such a decomposition based on a finite sample estimate is essential to reasoning under uncertainty when employing such models. This paper tackles the challenge of computing confidence bounds on the individual entries of eigenvectors of a covariance matrix of fixed dimension. Moreover, we derive a method to bound the entries of the inverse covariance matrix, the so-called precision matrix. The assumptions behind our method are minimal and require that the covariance matrix exists, and its empirical estimator converges to the true covariance. We make use of the theory of U-statistics to bound the $L_2$ perturbation of the empirical covariance matrix. From this result, we obtain bounds on the eigenvectors using Weyl's theorem and the eigenvalue-eigenvector identity and we derive confidence intervals on the entries of the precision matrix using matrix inversion perturbation bounds. As an application of these results, we demonstrate a new statistical test, which allows us to test for non-zero values of the precision matrix. We compare this test to the well-known Fisher-z test for partial correlations, and demonstrate the soundness and scalability of the proposed statistical test, as well as its application to real-world data from medical and physics domains.
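    For reference, the classical Fisher-z baseline that the paper compares against can be sketched as follows (testing whether a single precision-matrix entry, i.e. a partial correlation, is zero):
```python
import numpy as np
from scipy.stats import norm

def fisher_z_partial_corr_test(X, i, j):
    """Two-sided Fisher-z test for a zero partial correlation between
    variables i and j, conditioning on all remaining variables.
    X: (n, p) data matrix with n samples."""
    n, p = X.shape
    prec = np.linalg.inv(np.cov(X, rowvar=False))
    r = -prec[i, j] / np.sqrt(prec[i, i] * prec[j, j])  # partial correlation
    z = np.arctanh(r)                                   # Fisher z-transform
    se = 1.0 / np.sqrt(n - (p - 2) - 3)                 # p-2 conditioning variables
    return 2 * (1 - norm.cdf(abs(z / se)))              # p-value
```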
    Human-Level Control through Directly-Trained Deep Spiking Q-Networks. (arXiv:2201.07211v2 [cs.NE] UPDATED)
    As the third-generation neural networks, Spiking Neural Networks (SNNs) have great potential on neuromorphic hardware because of their high energy-efficiency. However, Deep Spiking Reinforcement Learning (DSRL), i.e., the Reinforcement Learning (RL) based on SNNs, is still in its preliminary stage due to the binary output and the non-differentiable property of the spiking function. To address these issues, we propose a Deep Spiking Q-Network (DSQN) in this paper. Specifically, we propose a directly-trained deep spiking reinforcement learning architecture based on the Leaky Integrate-and-Fire (LIF) neurons and Deep Q-Network (DQN). Then, we adapt a direct spiking learning algorithm for the Deep Spiking Q-Network. We further demonstrate the advantages of using LIF neurons in DSQN theoretically. Comprehensive experiments have been conducted on 17 top-performing Atari games to compare our method with the state-of-the-art conversion method. The experimental results demonstrate the superiority of our method in terms of performance, stability, robustness and energy-efficiency. To the best of our knowledge, our work is the first one to achieve state-of-the-art performance on multiple Atari games with the directly-trained SNN.
    DOLPHINS: Dataset for Collaborative Perception enabled Harmonious and Interconnected Self-driving. (arXiv:2207.07609v1 [cs.CV] CROSS LISTED)
    The Vehicle-to-Everything (V2X) network has enabled collaborative perception in autonomous driving, which is a promising solution to the fundamental defect of stand-alone intelligence including blind zones and long-range perception. However, the lack of datasets has severely blocked the development of collaborative perception algorithms. In this work, we release DOLPHINS: Dataset for cOllaborative Perception enabled Harmonious and INterconnected Self-driving, as a new simulated large-scale various-scenario multi-view multi-modality autonomous driving dataset, which provides a ground-breaking benchmark platform for interconnected autonomous driving. DOLPHINS outperforms current datasets in six dimensions: temporally-aligned images and point clouds from both vehicles and Road Side Units (RSUs) enabling both Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) based collaborative perception; 6 typical scenarios with dynamic weather conditions make it the most varied interconnected autonomous driving dataset; meticulously selected viewpoints providing full coverage of the key areas and every object; 42376 frames and 292549 objects, as well as the corresponding 3D annotations, geo-positions, and calibrations, compose the largest dataset for collaborative perception; Full-HD images and 64-line LiDARs construct high-resolution data with sufficient details; well-organized APIs and open-source codes ensure the extensibility of DOLPHINS. We also construct a benchmark of 2D detection, 3D detection, and multi-view collaborative perception tasks on DOLPHINS. The experiment results show that the raw-level fusion scheme through V2X communication can help to improve the precision as well as to reduce the necessity of expensive LiDAR equipment on vehicles when RSUs exist, which may accelerate the popularity of interconnected self-driving vehicles. DOLPHINS is now available on https://dolphins-dataset.net/.
    ECG-ATK-GAN: Robustness against Adversarial Attacks on ECGs using Conditional Generative Adversarial Networks. (arXiv:2110.09983v3 [eess.SP] UPDATED)
    Automating arrhythmia detection from ECG requires a robust and trusted system that retains high accuracy under electrical disturbances. Many machine learning approaches have reached human-level performance in classifying arrhythmia from ECGs. However, these architectures are vulnerable to adversarial attacks, which can cause ECG signals to be misclassified, decreasing the model's accuracy. Adversarial attacks are small, crafted perturbations injected into the original data which manifest as out-of-distribution shifts in the signal, causing the correct class to be misclassified. Thus, security concerns arise over false hospitalization and insurance fraud abusing these perturbations. To mitigate this problem, we introduce the first Conditional Generative Adversarial Network (GAN) that is robust against adversarially attacked ECG signals while retaining high accuracy. Our architecture integrates a new class-weighted objective function for adversarial perturbation identification and new blocks for discerning and combining out-of-distribution shifts in signals in the learning process for accurately classifying various arrhythmia types. Furthermore, we benchmark our architecture on six different white- and black-box attacks and compare them with other recently proposed arrhythmia classification models on two publicly available ECG arrhythmia datasets. The experiments confirm that our model is more robust against such adversarial attacks when classifying arrhythmia with high accuracy.
    Causal Strategic Linear Regression. (arXiv:2002.10066v3 [cs.LG] UPDATED)
    In many predictive decision-making scenarios, such as credit scoring and academic testing, a decision-maker must construct a model that accounts for agents' propensity to "game" the decision rule by changing their features so as to receive better decisions. Whereas the strategic classification literature has previously assumed that agents' outcomes are not causally affected by their features (and thus that strategic agents' goal is deceiving the decision-maker), we join concurrent work in modeling agents' outcomes as a function of their changeable attributes. As our main contribution, we provide efficient algorithms for learning decision rules that optimize three distinct decision-maker objectives in a realizable linear setting: accurately predicting agents' post-gaming outcomes (prediction risk minimization), incentivizing agents to improve these outcomes (agent outcome maximization), and estimating the coefficients of the true underlying model (parameter estimation). Our algorithms circumvent a hardness result of Miller et al. (2020) by allowing the decision maker to test a sequence of decision rules and observe agents' responses, in effect performing causal interventions through the decision rules.
    A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization Method. (arXiv:2208.12118v1 [cs.LG])
    Hyperparameter optimization in machine learning is often achieved using naive techniques that only lead to an approximate set of hyperparameters. Although techniques such as Bayesian optimization perform an intelligent search on a given domain of hyperparameters, they do not guarantee an optimal solution. A major drawback of most of these approaches is an exponential increase of their search domain with the number of hyperparameters, increasing the computational cost and making the approaches slow. The hyperparameter optimization problem is inherently a bilevel optimization task, and some studies have attempted bilevel solution methodologies for solving this problem. However, these studies assume a unique set of model weights that minimize the training loss, which is generally violated by deep learning architectures. This paper discusses a gradient-based bilevel method addressing these drawbacks for solving the hyperparameter optimization problem. The proposed method can handle continuous hyperparameters, for which we have chosen the regularization hyperparameter in our experiments. The method guarantees convergence to the set of optimal hyperparameters, which this study proves theoretically. The idea is based on approximating the lower-level optimal value function using Gaussian process regression. As a result, the bilevel problem is reduced to a single-level constrained optimization task that is solved using the augmented Lagrangian method. We have performed an extensive computational study on the MNIST and CIFAR-10 datasets on multi-layer perceptron and LeNet architectures that confirms the efficiency of the proposed method. A comparative study against grid search, random search, Bayesian optimization, and the HyperBand method on various hyperparameter problems shows that the proposed algorithm converges with lower computation and leads to models that generalize better on the testing set.
    Towards deep observation: A systematic survey on artificial intelligence techniques to monitor fetus via Ultrasound Images. (arXiv:2201.07935v2 [cs.LG] UPDATED)
    Developing innovative informatics approaches aimed at enhancing fetal monitoring is a burgeoning field of study in reproductive medicine. Several reviews have been conducted regarding Artificial Intelligence (AI) techniques to improve pregnancy outcomes, but they are limited by their focus on specific data, such as the mother's care during pregnancy. This systematic survey aims to explore how AI can assist with fetal growth monitoring via Ultrasound (US) images. We used eight medical and computer science bibliographic databases, including PubMed, Embase, PsycINFO, ScienceDirect, IEEE Xplore, ACM Library, Google Scholar, and the Web of Science. We retrieved studies published between 2010 and 2021. Data extracted from the studies were synthesized using a narrative approach. Out of 1269 retrieved studies, we included 107 distinct studies from queries that were relevant to the topic of the survey. We found that 2D ultrasound images were more popular (n=88) than 3D and 4D ultrasound images (n=19). Classification was the most used method (n=42), followed by segmentation (n=31), classification integrated with segmentation (n=16), and other miscellaneous methods such as object detection, regression, and reinforcement learning (n=18). The most common areas within the pregnancy domain were the fetal head (n=43), then the fetal body (n=31), fetal heart (n=13), fetal abdomen (n=10), and lastly the fetal face (n=10). In the most recent studies, deep learning techniques were primarily used (n=81), followed by machine learning (n=16), artificial neural networks (n=7), and reinforcement learning (n=2). AI techniques played a crucial role in predicting fetal diseases and identifying fetal anatomical structures during pregnancy. More research is required to validate this technology from a physician's perspective, such as pilot studies and randomized controlled trials on AI and its applications in a hospital setting.
    Deep neural networks for fast acquisition of aortic 3D pressure and velocity flow fields. (arXiv:2208.12156v1 [physics.flu-dyn])
Computational fluid dynamics (CFD) can be used to simulate vascular haemodynamics and analyse potential treatment options, and it has been shown to improve patient outcomes. However, the implementation of CFD for routine clinical use has yet to be realised. Barriers include high computational resource demands, the specialist expertise needed to design simulation set-ups, and long processing times. The aim of this study was to explore the use of machine learning (ML) to replicate conventional aortic CFD with automatic and fast regression models. The data used to train and test the models comprised 3,000 CFD simulations performed on synthetically generated 3D aortic shapes. These shapes were generated from a statistical shape model (SSM) built on real patient-specific aortas (N=67). Inference on 200 test shapes resulted in average errors of 6.01% +/- 3.12 SD and 3.99% +/- 0.93 SD for pressure and velocity, respectively. Our ML-based models performed CFD in ~0.075 seconds (4,000x faster than the solver). This study shows that results from conventional vascular CFD can be reproduced using ML at a much faster rate, in an automatic process, and with high accuracy.
    Biologically Inspired Neural Path Finding. (arXiv:2206.05971v2 [cs.LG] UPDATED)
The human brain can be considered a graphical structure comprising tens of billions of biological neurons connected by synapses. It has the remarkable ability to automatically re-route information flow through alternate paths when some neurons are damaged. Moreover, the brain can retain information and apply it to similar but completely unseen scenarios. In this paper, we take inspiration from these attributes of the brain to develop a computational framework that finds the optimal low-cost path between a source node and a destination node in a generalized graph. We show that our framework can handle unseen graphs at test time and can find alternate optimal paths when nodes are arbitrarily added or removed during inference, all while maintaining a fixed prediction time. Code is available here: https://github.com/hangligit/pathfinding
    A conditional one-output likelihood formulation for multitask Gaussian processes. (arXiv:2006.03495v4 [cs.LG] UPDATED)
Multitask Gaussian processes (MTGP) are the Gaussian process (GP) framework's solution for multioutput regression problems in which the $T$ elements of the regressors cannot be considered conditionally independent given the observations. Standard MTGP models assume both a multitask covariance matrix, defined as a function of an intertask matrix, and a noise covariance matrix. These matrices need to be approximated by a low-rank simplification of order $P$ in order to reduce the number of parameters to be learnt from $T^2$ to $TP$. Here we introduce a novel approach that simplifies multitask learning by reducing it to a set of conditioned univariate GPs without any low-rank approximations, thereby completely eliminating the need to select an adequate value for the hyperparameter $P$. At the same time, by extending this approach with both a hierarchical and an approximate model, the proposed extensions can recover the multitask covariance and noise matrices after learning only $2T$ parameters, avoiding the validation of any model hyperparameter and reducing the overall complexity of the model as well as the risk of overfitting. Experimental results on synthetic and real problems confirm the advantages of this inference approach in its ability to accurately recover the original noise and signal matrices, as well as its performance improvement over other state-of-the-art MTGP approaches. We have also integrated the model with standard GP toolboxes, showing that it is computationally competitive with state-of-the-art options.
    Black box tests for algorithmic stability. (arXiv:2111.15546v3 [cs.LG] UPDATED)
    Algorithmic stability is a concept from learning theory that expresses the degree to which changes to the input data (e.g., removal of a single data point) may affect the outputs of a regression algorithm. Knowing an algorithm's stability properties is often useful for many downstream applications -- for example, stability is known to lead to desirable generalization properties and predictive inference guarantees. However, many modern algorithms currently used in practice are too complex for a theoretical analysis of their stability properties, and thus we can only attempt to establish these properties through an empirical exploration of the algorithm's behavior on various data sets. In this work, we lay out a formal statistical framework for this kind of "black box testing" without any assumptions on the algorithm or the data distribution, and establish fundamental bounds on the ability of any black box test to identify algorithmic stability.
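    The kind of empirical exploration being formalized can be pictured with a small sketch: refit a regressor with single training points removed and measure how far held-out predictions move. This is an illustrative probe under assumed synthetic data, not the paper's actual test.

```python
# Minimal black-box stability probe: delete one training point at a time,
# refit, and record the largest change in held-out predictions.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
X_test = rng.normal(size=(50, 5))

base = Ridge(alpha=1.0).fit(X, y).predict(X_test)
deltas = []
for _ in range(20):                              # probe 20 random deletions
    keep = np.delete(np.arange(len(y)), rng.integers(len(y)))
    pred = Ridge(alpha=1.0).fit(X[keep], y[keep]).predict(X_test)
    deltas.append(np.max(np.abs(pred - base)))
print("mean max prediction change per deletion:", np.mean(deltas))
```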
    Learning Rate Perturbation: A Generic Plugin of Learning Rate Schedule towards Flatter Local Minima. (arXiv:2208.11873v1 [cs.LG])
The learning rate is one of the most important hyper-parameters and has a significant influence on neural network training. Learning rate schedules are widely used in practice to adjust the learning rate according to pre-defined rules for fast convergence and good generalization. However, existing learning rate schedules are heuristic algorithms that lack theoretical support, so practitioners usually choose them through multiple ad-hoc trials, and the obtained schedules are sub-optimal. To boost the performance of such sub-optimal schedules, we propose a generic learning rate schedule plugin, called LEArning Rate Perturbation (LEAP), which can be applied to various learning rate schedules to improve model training by introducing a certain perturbation to the learning rate. We found that, with this simple yet effective strategy, training exponentially favors flat minima over sharp minima with guaranteed convergence, which leads to better generalization. In addition, we conduct extensive experiments showing that training with LEAP can improve the performance of various deep learning models on diverse datasets using various learning rate schedules (including a constant learning rate).
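    A minimal sketch of the plugin idea, assuming a cosine base schedule and a multiplicative Gaussian perturbation (the paper's exact perturbation scheme may differ):

```python
# Wrap any base learning rate schedule and perturb it at each step.
import numpy as np

def cosine_schedule(step, total_steps, lr_max=0.1):
    return 0.5 * lr_max * (1 + np.cos(np.pi * step / total_steps))

def leap_lr(step, total_steps, noise_scale=0.1,
            rng=np.random.default_rng(0)):
    base = cosine_schedule(step, total_steps)
    # Multiplicative noise keeps the perturbed rate near the base schedule.
    return base * (1.0 + noise_scale * rng.standard_normal())

lrs = [leap_lr(t, 1000) for t in range(1000)]
```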
    Supervised Contrastive Learning for Affect Modelling. (arXiv:2208.12238v1 [cs.HC])
Affect modeling is traditionally viewed as the process of mapping measurable affect manifestations from multiple modalities of user input to affect labels. That mapping is usually inferred through end-to-end (manifestation-to-affect) machine learning processes. What if, instead, one trains general, subject-invariant representations that consider affect information and then uses such representations to model affect? In this paper we assume that affect labels form an integral part, and not just the training signal, of an affect representation, and we explore how the recent paradigm of contrastive learning can be employed to discover general, high-level, affect-infused representations for modeling affect. We introduce three different supervised contrastive learning approaches for training representations that consider affect information. In this initial study we test the proposed methods for arousal prediction on the RECOLA dataset, using user information from multiple modalities. Results demonstrate the representation capacity of contrastive learning and its efficiency in boosting the accuracy of affect models. Beyond their evidenced higher performance compared to end-to-end arousal classification, the resulting representations are general-purpose and subject-agnostic, as training is guided through general affect information available in any multimodal corpus.
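    For concreteness, a standard supervised contrastive loss (in the spirit of Khosla et al.) is sketched below in PyTorch; the three affect-specific variants proposed here differ in how labels define positives, which this generic version does not capture.

```python
import torch
import torch.nn.functional as F

def sup_con_loss(z, labels, tau=0.1):
    """z: (N, d) embeddings; labels: (N,) integer class labels."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                                  # (N, N) similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # log-softmax over all other samples for each anchor
    sim = sim.masked_fill(self_mask, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # average log-probability of positives, skipping anchors with none
    pos_log_prob = torch.where(pos_mask, log_prob, torch.zeros_like(log_prob))
    pos_counts = pos_mask.sum(1)
    valid = pos_counts > 0
    loss = -pos_log_prob.sum(1)[valid] / pos_counts[valid]
    return loss.mean()
```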
    Credit card fraud detection - Classifier selection strategy. (arXiv:2208.11900v1 [cs.LG])
Machine learning has opened up new tools for financial fraud detection: given a sample of annotated transactions, a classification algorithm learns to detect frauds. With growing credit card transaction volumes and rising fraud percentages, there is growing interest in finding appropriate machine learning classifiers for detection. However, fraud data sets are diverse and exhibit inconsistent characteristics, so a model effective on one data set is not guaranteed to perform well on another. Furthermore, data patterns and characteristics can drift over time, and fraud data exhibits massive and varying class imbalance. In this work, we evaluate sampling methods as a viable pre-processing mechanism to handle imbalance and propose a data-driven classifier selection strategy for characteristically highly imbalanced fraud detection data sets. The model derived from our selection strategy surpasses peer models while working under more realistic conditions, establishing the effectiveness of the strategy.
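    A hedged sketch of the evaluation loop this implies, using SMOTE from imbalanced-learn as one sampling option and synthetic imbalanced data in place of a real fraud set:

```python
# Resample an imbalanced training set, then compare candidate classifiers
# on an imbalance-aware metric (F1 on the minority class).
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.99], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, clf in candidates.items():
    clf.fit(X_res, y_res)
    print(name, f1_score(y_te, clf.predict(X_te)))
```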
    Equivalence of quantum barren plateaus to cost concentration and narrow gorges. (arXiv:2104.05868v2 [quant-ph] UPDATED)
Optimizing parameterized quantum circuits (PQCs) is the leading approach to make use of near-term quantum computers. However, very little is known about the cost function landscape for PQCs, which hinders progress towards quantum-aware optimizers. In this work, we investigate the connection between three different landscape features that have been observed for PQCs: (1) exponentially vanishing gradients (called barren plateaus), (2) exponential cost concentration about the mean, and (3) the exponential narrowness of minima (called narrow gorges). We analytically prove that these three phenomena occur together, i.e., when one occurs then so do the other two. A key implication of this result is that one can numerically diagnose barren plateaus via cost differences rather than via the computationally more expensive gradients. More broadly, our work shows that quantum mechanics rules out certain cost landscapes (which otherwise would be mathematically possible), and hence our results are interesting from a quantum foundations perspective.
    Nonparametric adaptive control and prediction: theory and randomized algorithms. (arXiv:2106.03589v3 [math.OC] UPDATED)
A key assumption in the theory of nonlinear adaptive control is that the uncertainty of the system can be expressed in the linear span of a set of known basis functions. While this assumption leads to efficient algorithms, it limits applications to very specific classes of systems. We introduce a novel nonparametric adaptive algorithm that estimates an infinite-dimensional density over parameters online to learn unknown dynamics in a reproducing kernel Hilbert space. Surprisingly, the resulting control input admits an analytical expression that enables its implementation despite its underlying infinite-dimensional structure. While this adaptive input is rich and expressive - subsuming, for example, traditional linear parameterizations - its computational complexity grows linearly with time, making it more expensive than its parametric counterparts. Leveraging the theory of random Fourier features, we provide an efficient randomized implementation that recovers the complexity of classical parametric methods while provably retaining the expressivity of the nonparametric input. In particular, our explicit bounds depend only polynomially on the underlying parameters of the system, allowing our proposed algorithms to scale efficiently to high-dimensional systems. As an illustration of the method, we demonstrate the ability of the randomized approximation algorithm to learn a predictive model of a 60-dimensional system consisting of ten point masses interacting through Newtonian gravitation. By reinterpreting our method as a gradient flow on a specific loss, we conclude with a natural extension of our kernel-based adaptive algorithms to deep neural networks. We show empirically that the extra expressivity afforded by deep representations can improve performance, at the expense of the closed-loop stability that is rigorously guaranteed, and consistently observed, for kernel machines.
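    The random Fourier feature construction at the heart of the randomized implementation can be sketched in a few lines; this is the generic feature map for an RBF kernel, not the paper's full adaptive-control algorithm.

```python
# Random Fourier features approximating the RBF kernel
# k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
import numpy as np

def rff_map(X, n_features=256, sigma=1.0, rng=np.random.default_rng(0)):
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.random.default_rng(1).normal(size=(5, 3))
Phi = rff_map(X)
approx_K = Phi @ Phi.T   # approximates the exact RBF Gram matrix
```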
    A CNN-LSTM-based hybrid deep learning approach to detect sentiment polarities on Monkeypox tweets. (arXiv:2208.12019v1 [cs.CV])
People have begun communicating their thoughts and viewpoints through user-generated multimedia material, such as images, text, videos, and audio, on social networking websites, and this pattern has grown more frequent in recent years. Twitter is one of the most extensively used social media sites and one of the best places to gauge how people feel about events linked to the monkeypox disease, since tweets are short and frequently posted. The fundamental objective of this study is to gain a deeper comprehension of the diverse range of reactions people have to the presence of this condition. We present a hybrid technique based on CNN and LSTM to determine the sentiment of monkeypox-related tweets, considering all three possible polarities of a user's tweet: positive, negative, and neutral. The proposed model achieved 94% accuracy on the monkeypox tweet dataset; precision, recall, and F1-score were also used to evaluate the models in a time- and resource-efficient manner. The findings are compared with more traditional machine learning approaches. This research contributes to an increased awareness of the monkeypox infection in the general population.
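    A generic CNN-LSTM text classifier of the kind described might look as follows in Keras; the vocabulary size, layer widths, and kernel size are illustrative guesses, not the paper's configuration.

```python
# CNN-LSTM hybrid: a 1D convolution extracts local n-gram features,
# and an LSTM models their sequence before a 3-way softmax.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Embedding(input_dim=20000, output_dim=128),  # token embeddings
    layers.Conv1D(64, 5, activation="relu"),
    layers.MaxPooling1D(2),
    layers.LSTM(64),
    layers.Dense(3, activation="softmax"),              # pos / neg / neutral
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```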
    Data-driven approaches for predicting spread of infectious diseases through DINNs: Disease Informed Neural Networks. (arXiv:2110.05445v3 [cs.LG] UPDATED)
In this work, we present an approach called Disease Informed Neural Networks (DINNs) that can effectively predict the spread of infectious diseases. This approach builds on successful physics-informed neural network (PINN) approaches that have been applied to a variety of problems modeled by linear and non-linear ordinary and partial differential equations. Specifically, we build on the application of PINNs to SIR compartmental models and expand it to a scaffolded family of mathematical models describing various infectious diseases. We show how the neural networks are capable of learning how diseases spread, forecasting their progression, and finding their unique parameters (e.g., death rate). To demonstrate the robustness and efficacy of DINNs, we apply the approach to eleven highly infectious diseases modeled at increasing levels of complexity. Our computational experiments suggest that DINNs are a reliable candidate for effectively learning the dynamics of disease spread and forecasting its progression into the future from available real-world data.
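    A minimal sketch of the idea on the basic SIR model: a network maps time to (S, I, R), the loss penalizes the ODE residual computed by automatic differentiation, and the transmission and recovery rates are learnable. The data-misfit term used to fit real observations is omitted for brevity; the architecture and training settings are illustrative, not the paper's.

```python
# PINN-style residual loss for dS/dt = -b*S*I, dI/dt = b*S*I - g*I,
# dR/dt = g*I, with learnable rates b (beta) and g (gamma).
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 3))
log_beta = torch.nn.Parameter(torch.tensor(0.0))
log_gamma = torch.nn.Parameter(torch.tensor(0.0))
opt = torch.optim.Adam(list(net.parameters()) + [log_beta, log_gamma], lr=1e-3)

t = torch.linspace(0, 30, 200).reshape(-1, 1).requires_grad_(True)
for step in range(1000):
    S, I, R = net(t).unbind(dim=1)
    dS = torch.autograd.grad(S.sum(), t, create_graph=True)[0].squeeze()
    dI = torch.autograd.grad(I.sum(), t, create_graph=True)[0].squeeze()
    dR = torch.autograd.grad(R.sum(), t, create_graph=True)[0].squeeze()
    beta, gamma = log_beta.exp(), log_gamma.exp()
    residual = ((dS + beta * S * I) ** 2 +
                (dI - beta * S * I + gamma * I) ** 2 +
                (dR - gamma * I) ** 2).mean()
    # A full DINN adds a data-misfit term on observed S, I, R here.
    opt.zero_grad()
    residual.backward()
    opt.step()
```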
    A Feedforward Unitary Equivariant Neural Network. (arXiv:2208.12146v1 [cs.LG])
    We devise a new type of feedforward neural network. It is equivariant with respect to the unitary group $U(n)$. The input and output can be vectors in $\mathbb{C}^n$ with arbitrary dimension $n$. No convolution layer is required in our implementation. We avoid errors due to truncated higher order terms in Fourier-like transformation. The implementation of each layer can be done efficiently using simple calculations. As a proof of concept, we have given empirical results on the prediction of the dynamics of atomic motion to demonstrate the practicality of our approach.
    Efficient Activation Quantization via Adaptive Rounding Border for Post-Training Quantization. (arXiv:2208.11945v1 [cs.LG])
    Post-training quantization (PTQ) attracts increasing attention due to its convenience in deploying quantized neural networks. Rounding, the primary source of quantization error, is optimized only for model weights, while activations still use the rounding-to-nearest operation. In this work, for the first time, we demonstrate that well-chosen rounding schemes for activations can improve the final accuracy. To deal with the challenge of the dynamicity of the activation rounding scheme, we adaptively adjust the rounding border through a simple function to generate rounding schemes at the inference stage. The border function covers the impact of weight errors, activation errors, and propagated errors to eliminate the bias of the element-wise error, which further benefits model accuracy. We also make the border aware of global errors to better fit different arriving activations. Finally, we propose the AQuant framework to learn the border function. Extensive experiments show that AQuant achieves noticeable improvements with negligible overhead compared with state-of-the-art works and pushes the accuracy of ResNet-18 up to 60.3\% under the 2-bit weight and activation post-training quantization.
    A Comparison of Reinforcement Learning Frameworks for Software Testing Tasks. (arXiv:2208.12136v1 [cs.SE])
Software testing activities aim to find possible defects in a software product and ensure that the product meets its expected requirements. Some software testing approaches lack automation or are only partly automated, which increases testing time and overall cost. Recently, Reinforcement Learning (RL) has been successfully employed in complex testing tasks such as game testing, regression testing, and test case prioritization to automate the process and provide continuous adaptation. Practitioners can employ RL either by implementing an RL algorithm from scratch or by using an RL framework; developers have widely used such frameworks to solve problems in various domains, including software testing. However, to the best of our knowledge, no study has empirically evaluated the effectiveness and performance of pre-implemented algorithms in RL frameworks. In this paper, we empirically investigate the application of carefully selected RL algorithms to two important software testing tasks: test case prioritization in the context of Continuous Integration (CI) and game testing. For the game testing task, we conduct experiments on a simple game and use RL algorithms to explore the game and detect bugs. Results show that some of the selected RL frameworks, such as Tensorforce, outperform recent approaches in the literature. To prioritize test cases, we run experiments in a CI environment where RL algorithms from different frameworks are used to rank the test cases. Our results show that the performance difference between pre-implemented algorithms is in some cases considerable, motivating further investigation. Moreover, empirical evaluations on some benchmark problems are recommended for researchers looking to select RL frameworks, to make sure the algorithms perform as intended.
    Algorithmic Differentiation for Automatized Modelling of Machine Learned Force Fields. (arXiv:2208.12104v1 [physics.chem-ph])
Reconstructing force fields (FFs) from atomistic simulation data is a challenge since accurate data can be highly expensive. Here, machine learning (ML) models can help to be data-economical, as they can be successfully constrained using the underlying symmetry and conservation laws of physics. However, so far, every descriptor newly proposed for an ML model has required cumbersome and mathematically tedious remodeling. We therefore propose to use modern techniques from algorithmic differentiation within the ML modeling process, effectively enabling the usage of novel descriptors or models fully automatically at an order of magnitude higher computational efficiency. This paradigmatic approach enables not only a versatile usage of novel representations and the efficient computation of larger systems, both of high value to the FF community, but also the simple inclusion of further physical knowledge such as higher-order information (e.g., Hessians, more complex partial differential equation constraints, etc.), even beyond the presented FF domain.
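    The essence of the proposal can be conveyed in a few lines of JAX: given any differentiable energy model, forces follow automatically as the negative gradient, with no hand-derived expressions per descriptor. The pair potential below is a toy stand-in for a real ML energy model.

```python
# Forces via algorithmic differentiation: F = -dE/dR.
import jax
import jax.numpy as jnp

def energy(positions):                       # positions: (n_atoms, 3)
    diff = positions[:, None, :] - positions[None, :, :]
    # Add the identity to the squared distances so the diagonal is safe;
    # the diagonal is excluded from the energy sum anyway.
    r = jnp.sqrt((diff ** 2).sum(-1) + jnp.eye(positions.shape[0]))
    return jnp.sum(jnp.triu(1.0 / r, k=1))   # toy repulsive pair energy

forces = jax.grad(lambda R: -energy(R))      # works for any descriptor/model
R = jnp.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(forces(R))
```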
    Prediction of the energy and exergy performance of F135 PW100 turbofan engine via deep learning. (arXiv:2208.12028v1 [cs.LG])
In the present study, the effects of flight Mach number, flight altitude, fuel type, and intake air temperature on thrust-specific fuel consumption, thrust, intake air mass flow rate, thermal and propulsive efficiencies, as well as the exergetic efficiency and the exergy destruction rate of the F135 PW100 engine, are investigated. Based on the results obtained in the first phase, the flight Mach number and flight altitude are set to 2.5 and 30,000 m, respectively, to model the thermodynamic performance of the engine cycle, owing to the operational advantage of supersonic flight at high altitude and the higher thrust of hydrogen fuel. Accordingly, in the second phase, under these flight conditions, an intelligent model is obtained to predict the output parameters (i.e., thrust, thrust-specific fuel consumption, and overall exergetic efficiency) using deep learning. In the resulting deep neural model, the pressure ratio of the high-pressure turbine, the fan pressure ratio, the turbine inlet temperature, the intake air temperature, and the bypass ratio are the input parameters. The provided data are randomly divided into two sets: 6079 samples for model training and 1520 samples for testing. In particular, the Adam optimization algorithm, a mean-squared-error cost function, and the rectified linear unit (ReLU) activation function are used to train the network. The results show that the error of the deep neural model is 5.02%, 1.43%, and 2.92% in predicting thrust, thrust-specific fuel consumption, and overall exergetic efficiency, respectively, which indicates the success of the model in estimating the output parameters of the present problem.
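    Under the stated setup (five inputs, three outputs, ReLU activations, Adam, mean-squared-error loss), a plain Keras regressor of this kind might look as follows; the layer widths are illustrative, as the paper's architecture is not given here.

```python
# Feedforward regressor: five cycle parameters in, three performance
# quantities out, trained with Adam on an MSE loss.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(5,)),   # HPT pressure ratio, fan pressure ratio,
                                  # TIT, intake air temperature, bypass ratio
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(3),              # thrust, TSFC, overall exergetic efficiency
])
model.compile(optimizer="adam", loss="mse")
# model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=100)
```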
    A simplified convergence theory for Byzantine resilient stochastic gradient descent. (arXiv:2208.11879v1 [cs.LG])
In distributed learning, a central server trains a model according to updates provided by nodes holding local data samples. In the presence of one or more malicious nodes sending incorrect information (a Byzantine adversary), standard algorithms for model training such as stochastic gradient descent (SGD) fail to converge. In this paper, we present a simplified convergence theory for the generic Byzantine-resilient SGD method originally proposed by Blanchard et al. [NeurIPS 2017]. Compared to the existing analysis, we show convergence to a stationary point in expectation under standard assumptions on the (possibly nonconvex) objective function and flexible assumptions on the stochastic gradients.
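    The aggregation rule of Blanchard et al. is Krum; a minimal NumPy sketch is shown below, scoring each update by the summed squared distances to its closest peers and keeping the lowest-scoring one.

```python
# Krum aggregation: tolerate up to f Byzantine updates among m.
import numpy as np

def krum(updates, f):
    """updates: (m, d) array of gradient vectors; f: Byzantine bound."""
    m = len(updates)
    dists = ((updates[:, None, :] - updates[None, :, :]) ** 2).sum(-1)
    scores = []
    for i in range(m):
        d = np.delete(dists[i], i)            # distances to the other updates
        scores.append(np.sort(d)[: m - f - 2].sum())
    return updates[int(np.argmin(scores))]    # the most "central" update
```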
    Domain-informed graph neural networks: a quantum chemistry case study. (arXiv:2208.11934v1 [cs.LG])
    We explore different strategies to integrate prior domain knowledge into the design of a deep neural network (DNN). We focus on graph neural networks (GNN), with a use case of estimating the potential energy of chemical systems (molecules and crystals) represented as graphs. We integrate two elements of domain knowledge into the design of the GNN to constrain and regularise its learning, towards higher accuracy and generalisation. First, knowledge on the existence of different types of relations (chemical bonds) between atoms is used to modulate the interaction of nodes in the GNN. Second, knowledge of the relevance of some physical quantities is used to constrain the learnt features towards a higher physical relevance using a simple multi-task paradigm. We demonstrate the general applicability of our knowledge integrations by applying them to two architectures that rely on different mechanisms to propagate information between nodes and to update node states.
    Automatic Mapping of Unstructured Cyber Threat Intelligence: An Experimental Study. (arXiv:2208.12144v1 [cs.CR])
    Proactive approaches to security, such as adversary emulation, leverage information about threat actors and their techniques (Cyber Threat Intelligence, CTI). However, most CTI still comes in unstructured forms (i.e., natural language), such as incident reports and leaked documents. To support proactive security efforts, we present an experimental study on the automatic classification of unstructured CTI into attack techniques using machine learning (ML). We contribute with two new datasets for CTI analysis, and we evaluate several ML models, including both traditional and deep learning-based ones. We present several lessons learned about how ML can perform at this task, which classifiers perform best and under which conditions, which are the main causes of classification errors, and the challenges ahead for CTI analysis.
    JAXFit: Trust Region Method for Nonlinear Least-Squares Curve Fitting on the GPU. (arXiv:2208.12187v1 [cs.LG])
We implement a trust region method on the GPU for nonlinear least-squares curve fitting problems using the deep learning Python library JAX. Our open source package, JAXFit, works for both unconstrained and constrained curve fitting problems and allows the fit functions to be defined in Python alone, without any specialized knowledge of either the GPU or CUDA programming. Since JAXFit runs on the GPU, it is much faster than CPU-based libraries and even other GPU-based libraries, despite being very easy to use. Additionally, due to JAX's deep learning foundations, the Jacobian in JAXFit's trust region algorithm is calculated with automatic differentiation, rather than using derivative approximations or requiring the user to define the fit function's partial derivatives.
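    The key JAX ingredient is easy to illustrate: the Jacobian of the residual vector comes from automatic differentiation rather than finite differences. The plain Gauss-Newton iteration below, on an exponential-decay fit, is a simplified stand-in for JAXFit's trust region algorithm.

```python
# Autodiff Jacobian for nonlinear least squares, illustrated with
# an undamped Gauss-Newton iteration.
import jax
import jax.numpy as jnp

x = jnp.linspace(0, 4, 50)
y = 2.0 * jnp.exp(-1.3 * x)                  # synthetic noiseless data

def residuals(p):                            # p = (amplitude, rate)
    return p[0] * jnp.exp(-p[1] * x) - y

p = jnp.array([1.0, 1.0])
for _ in range(20):
    J = jax.jacfwd(residuals)(p)             # exact Jacobian via autodiff
    r = residuals(p)
    p = p - jnp.linalg.solve(J.T @ J, J.T @ r)   # Gauss-Newton update
print(p)                                     # -> approx (2.0, 1.3)
```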
    Subgraph Neighboring Relations Infomax for Inductive Link Prediction on Knowledge Graphs. (arXiv:2208.00850v2 [cs.AI] UPDATED)
Inductive link prediction for knowledge graphs aims at predicting missing links between unseen entities, i.e., entities not seen during training. Most previous works learn entity-specific embeddings, which cannot handle unseen entities. Several recent methods utilize enclosing subgraphs to obtain inductive ability. However, these works consider only the enclosing part of the subgraph without complete neighboring relations, so partial neighboring relations are neglected and sparse subgraphs are hard to handle. To address this, we propose Subgraph Neighboring Relations Infomax (SNRI), which fully exploits complete neighboring relations from two aspects: neighboring relational features for node features and neighboring relational paths for sparse subgraphs. To further model neighboring relations in a global way, we innovatively apply mutual information (MI) maximization to knowledge graphs. Experiments show that SNRI outperforms existing state-of-the-art methods by a large margin on the inductive link prediction task, verifying the effectiveness of exploring complete neighboring relations in a global way to characterize node features and reason on sparse subgraphs.
    Understanding Diffusion Models: A Unified Perspective. (arXiv:2208.11970v1 [cs.LG])
    Diffusion models have shown incredible capabilities as generative models; indeed, they power the current state-of-the-art models on text-conditioned image generation such as Imagen and DALL-E 2. In this work we review, demystify, and unify the understanding of diffusion models across both variational and score-based perspectives. We first derive Variational Diffusion Models (VDM) as a special case of a Markovian Hierarchical Variational Autoencoder, where three key assumptions enable tractable computation and scalable optimization of the ELBO. We then prove that optimizing a VDM boils down to learning a neural network to predict one of three potential objectives: the original source input from any arbitrary noisification of it, the original source noise from any arbitrarily noisified input, or the score function of a noisified input at any arbitrary noise level. We then dive deeper into what it means to learn the score function, and connect the variational perspective of a diffusion model explicitly with the Score-based Generative Modeling perspective through Tweedie's Formula. Lastly, we cover how to learn a conditional distribution using diffusion models via guidance.
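    The noise-prediction objective that the review derives reduces to a few lines of PyTorch; `model` is assumed to be any network taking a noisy input and a timestep, and the beta schedule is the standard linear one.

```python
# DDPM-style training objective: corrupt x0 with noise at a random
# level t and train eps_theta to recover the noise.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def diffusion_loss(model, x0):
    t = torch.randint(0, T, (x0.size(0),))
    a = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    eps = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps        # noisify x0
    return torch.mean((model(x_t, t) - eps) ** 2)     # predict the noise
```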
    Local Intrinsic Dimensionality Measures for Graphs, with Applications to Graph Embeddings. (arXiv:2208.11986v1 [cs.LG])
    The notion of local intrinsic dimensionality (LID) is an important advancement in data dimensionality analysis, with applications in data mining, machine learning and similarity search problems. Existing distance-based LID estimators were designed for tabular datasets encompassing data points represented as vectors in a Euclidean space. After discussing their limitations for graph-structured data considering graph embeddings and graph distances, we propose NC-LID, a novel LID-related measure for quantifying the discriminatory power of the shortest-path distance with respect to natural communities of nodes as their intrinsic localities. It is shown how this measure can be used to design LID-aware graph embedding algorithms by formulating two LID-elastic variants of node2vec with personalized hyperparameters that are adjusted according to NC-LID values. Our empirical analysis of NC-LID on a large number of real-world graphs shows that this measure is able to point to nodes with high link reconstruction errors in node2vec embeddings better than node centrality metrics. The experimental evaluation also shows that the proposed LID-elastic node2vec extensions improve node2vec by better preserving graph structure in generated embeddings.
    Sub-GMN: The Neural Subgraph Matching Network Model. (arXiv:2104.00186v4 [cs.LG] UPDATED)
As one of the most fundamental tasks in graph theory, subgraph matching is crucial in many fields, ranging from information retrieval and computer vision to biology, chemistry, and natural language processing; yet it remains an NP-complete problem. This study proposes an end-to-end learning-based approximate method for the subgraph matching task, called the subgraph matching network (Sub-GMN). Sub-GMN first uses graph representation learning to map nodes to node-level embeddings, and then combines metric learning and attention mechanisms to model the relationship between matched nodes in the data graph and the query graph. To test the performance of the proposed method, we applied it on two datasets, using two existing methods, GNN and FGNN, as baselines for comparison. Our experiments show that, on dataset 1, the accuracy of Sub-GMN is on average 12.21\% and 3.2\% higher than that of GNN and FGNN, respectively. In terms of average running time, Sub-GMN runs 20-40 times faster than FGNN. In addition, the average F1-score of Sub-GMN on all experiments with dataset 2 reached 0.95, which demonstrates that Sub-GMN outputs more correct node-to-node matches. Compared with previous GNN-based methods for the subgraph matching task, our proposed Sub-GMN allows varying query and data graphs in the test/application stage, while most previous GNN-based methods can only find a matched subgraph in the data graph for the same query graph used in the training stage. Another advantage of Sub-GMN is that it can output a list of node-to-node matches, while most existing end-to-end GNN-based methods cannot provide matched node pairs.
    Zero-delay Consistent and Smooth Trainable Interpolation. (arXiv:2203.03776v2 [cs.LG] UPDATED)
    The question of how to produce a smooth interpolating curve from a stream of uncertainty regions, which become available sequentially, is addressed in this paper. To this end, we formalize the concept of real-time interpolator (RTI): a trainable recurrent unit that reconstructs smooth signals that are consistent with the received uncertainty regions in an online manner. More specifically, an RTI works under the requirement of reconstructing a section of the signal immediately after an uncertainty region is revealed (zero delay), without changing the reconstructed signal in the previous sections. Particularly, this work formulates the design of spline-based RTIs and proposes a data-driven training procedure, which minimizes the average curvature of the interpolated signals over a set of example sequences. These sequences are representative of the nature of the data sequence to be interpolated, allowing to tailor the RTI to any specific signal source. Our overall design allows for different possible schemes due to its modular structure, but in this work, we present two approaches, namely, the parametrized RTI and the recurrent neural network (RNN)-based RTI, including their architectures and properties. Experimental results show that the two proposed RTIs can be trained to achieve improved performance (in terms of the curvature loss metric) with respect to a myopic-type RTI that only exploits the local information at each time step while maintaining smooth, zero-delay, and consistency requirements.
    Diffusion Asymptotics for Sequential Experiments. (arXiv:2101.09855v4 [math.ST] UPDATED)
    We propose a new diffusion-asymptotic analysis for sequentially randomized experiments, including those that arise in solving multi-armed bandit problems. In an experiment with $n$ time steps, we let the mean reward gaps between actions scale to the order $1/\sqrt{n}$ so as to preserve the difficulty of the learning task as $n$ grows. In this regime, we show that the behavior of a class of sequentially randomized Markov experiments converges to a diffusion limit, given as the solution to a stochastic differential equation. The diffusion limit thus enables us to derive refined, instance-specific characterization of the stochastic dynamics of sequential experiments. We use the diffusion limit to obtain several new insights on the regret and belief evolution of sequential experiments, including Thompson sampling. On the one hand, we show that all sequential experiments whose randomization probabilities have a Lipschitz-continuous dependence on the observed data suffer from sub-optimal regret performance when the reward gaps are relatively large. On the other hand, we find that a version of Thompson sampling with an asymptotically uninformative prior variance achieves near-optimal instance-specific regret scaling, including with large reward gaps. However, although the use of uninformative priors for Thompson sampling yields good regret properties, we show that the induced posterior beliefs are highly unstable over time.
    MaxViT: Multi-Axis Vision Transformer. (arXiv:2204.01697v3 [cs.CV] UPDATED)
    Transformers have recently gained significant attention in the computer vision community. However, the lack of scalability of self-attention mechanisms with respect to image size has limited their wide adoption in state-of-the-art vision backbones. In this paper we introduce an efficient and scalable attention model we call multi-axis attention, which consists of two aspects: blocked local and dilated global attention. These design choices allow global-local spatial interactions on arbitrary input resolutions with only linear complexity. We also present a new architectural element by effectively blending our proposed attention model with convolutions, and accordingly propose a simple hierarchical vision backbone, dubbed MaxViT, by simply repeating the basic building block over multiple stages. Notably, MaxViT is able to ''see'' globally throughout the entire network, even in earlier, high-resolution stages. We demonstrate the effectiveness of our model on a broad spectrum of vision tasks. On image classification, MaxViT achieves state-of-the-art performance under various settings: without extra data, MaxViT attains 86.5% ImageNet-1K top-1 accuracy; with ImageNet-21K pre-training, our model achieves 88.7% top-1 accuracy. For downstream tasks, MaxViT as a backbone delivers favorable performance on object detection as well as visual aesthetic assessment. We also show that our proposed model expresses strong generative modeling capability on ImageNet, demonstrating the superior potential of MaxViT blocks as a universal vision module. The source code and trained models will be available at https://github.com/google-research/maxvit.
    Empirical study of Machine Learning Classifier Evaluation Metrics behavior in Massively Imbalanced and Noisy data. (arXiv:2208.11904v1 [cs.LG])
With growing credit card transaction volumes, fraud percentages are also rising, along with the overhead costs institutions incur to combat fraud and compensate victims. The use of machine learning in the financial sector permits more effective protection against fraud and other economic crime: suitably trained classifiers enable proactive fraud detection, improving stakeholder trust and robustness against illicit transactions. However, the design of machine-learning-based fraud detection algorithms has been challenging and slow, due to the massively imbalanced nature of fraud data and the difficulty of identifying frauds accurately and completely to create a gold-standard ground truth. Furthermore, there are no benchmarks or standard classifier evaluation metrics to measure and identify better-performing classifiers, keeping researchers in the dark. In this work, we develop a theoretical foundation to model the human annotation errors and extreme imbalance typical of real-world fraud detection data sets. By conducting empirical experiments on a hypothetical classifier, with a synthetic data distribution approximating a popular real-world credit card fraud data set, we simulate human annotation errors and extreme imbalance to observe the behavior of popular machine learning classifier evaluation metrics. We demonstrate that a combined F1 score and g-mean, in that specific order, is the best evaluation metric for typical imbalanced fraud detection model classification.
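    For reference, the recommended metrics are straightforward to compute for a binary fraud classifier; the g-mean here is the geometric mean of sensitivity and specificity.

```python
# F1 on the minority (fraud) class plus g-mean from the confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

def f1_and_gmean(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)       # true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    return f1_score(y_true, y_pred), np.sqrt(sensitivity * specificity)
```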
    Towards Benchmarking Explainable Artificial Intelligence Methods. (arXiv:2208.12120v1 [cs.AI])
The currently dominant artificial intelligence and machine learning technology, neural networks, builds on inductive statistical learning. Neural networks of today are information processing systems devoid of understanding and reasoning capabilities; consequently, they cannot explain promoted decisions in a humanly valid form. In this work, we revisit and use fundamental theories from the philosophy of science as an analytical lens, with the goal of revealing what can be expected, and more importantly not expected, from methods that aim to explain decisions promoted by a neural network. By conducting a case study, we investigate the performance of a selection of explainability methods over two mundane domains, animals and headgear. Through our study, we lay bare that the usefulness of these methods relies on human domain knowledge and our ability to understand, generalise, and reason. Explainability methods can be useful when the goal is to gain further insights into a trained neural network's strengths and weaknesses; if our aim is instead to use them to promote actionable decisions or build trust in ML models, they need to be less ambiguous than they are today. We conclude from our study that benchmarking explainability methods is a central quest towards trustworthy artificial intelligence and machine learning.
    Integrating Statistical and Machine Learning Approaches to Identify Receptive Field Structure in Neural Populations. (arXiv:2208.12025v1 [q-bio.NC])
    Neurons can code for multiple variables simultaneously and neuroscientists are often interested in classifying neurons based on their receptive field properties. Statistical models provide powerful tools for determining the factors influencing neural spiking activity and classifying individual neurons. However, as neural recording technologies have advanced to produce simultaneous spiking data from massive populations, classical statistical methods often lack the computational efficiency required to handle such data. Machine learning (ML) approaches are known for enabling efficient large scale data analyses; however, they typically require massive training sets with balanced data, along with accurate labels to fit well. Additionally, model assessment and interpretation are often more challenging for ML than for classical statistical methods. To address these challenges, we develop an integrated framework, combining statistical modeling and machine learning approaches to identify the coding properties of neurons from large populations. In order to demonstrate this framework, we apply these methods to data from a population of neurons recorded from rat hippocampus to characterize the distribution of spatial receptive fields in this region.
    Data Augmentation for Graph Data: Recent Advancements. (arXiv:2208.11973v1 [cs.LG])
Graph Neural Network (GNN) based methods have recently become a popular tool for dealing with graph data because of their ability to incorporate structural information. The main hurdle to the performance of GNNs is the lack of labeled data. Data augmentation techniques for images and text cannot be used for graph data because of its complex, non-Euclidean structure. This gap has forced researchers to shift their focus towards developing data augmentation techniques for graph data. Most of the proposed Graph Data Augmentation (GDA) techniques are task-specific. In this paper, we survey the existing GDA techniques based on different graph tasks. This survey not only provides a reference for the GDA research community but also gives researchers in other domains the necessary background information.
    Quo Vadis: Hybrid Machine Learning Meta-Model based on Contextual and Behavioral Malware Representations. (arXiv:2208.12248v1 [cs.CR])
We propose a hybrid machine learning architecture that simultaneously employs multiple deep learning models analyzing contextual and behavioral characteristics of Windows portable executables, producing a final prediction based on a decision from a meta-model. The detection heuristic in contemporary machine learning Windows malware classifiers is typically based on the static properties of the sample, since dynamic analysis through virtualization is challenging for vast quantities of samples. To surpass this limitation, we employ Windows kernel emulation, which allows the acquisition of behavioral patterns across large corpora with minimal temporal and computational costs. We partner with a security vendor for a collection of more than 100k in-the-wild samples that resemble the contemporary threat landscape, containing raw PE files and the filepaths of applications at the moment of execution. The acquired dataset is at least tenfold larger than those reported in related works on behavioral malware analysis. Files in the training dataset were labeled by a professional threat intelligence team using manual and automated reverse engineering tools. We estimate the hybrid classifier's operational utility by collecting an out-of-sample test set three months after the acquisition of the training set. We report an improved detection rate, above the capabilities of the current state-of-the-art model, especially under low false-positive requirements. Additionally, we uncover the meta-model's ability to identify malicious activity in the validation and test sets even when none of the individual models expresses enough confidence to mark the sample as malevolent. We conclude that the meta-model can learn patterns typical of malicious samples from representation combinations produced by different analysis techniques. We publicly release the pre-trained models and an anonymized dataset of emulation reports.
    Maximum Likelihood on the Joint (Data, Condition) Distribution for Solving Ill-Posed Problems with Conditional Flow Models. (arXiv:2208.11782v1 [cs.LG])
    I describe a trick for training flow models using a prescribed rule as a surrogate for maximum likelihood. The utility of this trick is limited for non-conditional models, but an extension of the approach, applied to maximum likelihood of the joint probability distribution of data and conditioning information, can be used to train sophisticated \textit{conditional} flow models. Unlike previous approaches, this method is quite simple: it does not require explicit knowledge of the distribution of conditions, auxiliary networks or other specific architecture, or additional loss terms beyond maximum likelihood, and it preserves the correspondence between latent and data spaces. The resulting models have all the properties of non-conditional flow models, are robust to unexpected inputs, and can predict the distribution of solutions conditioned on a given input. They come with guarantees of prediction representativeness and are a natural and powerful way to solve highly uncertain problems. I demonstrate these properties on easily visualized toy problems, then use the method to successfully generate class-conditional images and to reconstruct highly degraded images via super-resolution.
    Invariant Representation Driven Neural Classifier for Anti-QCD Jet Tagging. (arXiv:2201.07199v4 [hep-ph] UPDATED)
    We leverage representation learning and the inductive bias in neural-net-based Standard Model jet classification tasks, to detect non-QCD signal jets. In establishing the framework for classification-based anomaly detection in jet physics, we demonstrate that, with a \emph{well-calibrated} and \emph{powerful enough feature extractor}, a well-trained \emph{mass-decorrelated} supervised Standard Model neural jet classifier can serve as a strong generic anti-QCD jet tagger for effectively reducing the QCD background. Imposing \emph{data-augmented} mass-invariance (and thus decoupling the dominant factor) not only facilitates background estimation, but also induces more substructure-aware representation learning. We are able to reach excellent tagging efficiencies for all the test signals considered. In the best case, we reach a background rejection rate of 51 and a significance improvement factor of 3.6 at 50 \% signal acceptance, with jet mass decorrelated. This study indicates that supervised Standard Model jet classifiers have great potential in general new physics searches.
    Time Series Clustering with an EM algorithm for Mixtures of Linear Gaussian State Space Models. (arXiv:2208.11907v1 [cs.LG])
    In this paper, we consider the task of clustering a set of individual time series while modeling each cluster, that is, model-based time series clustering. The task requires a parametric model with sufficient flexibility to describe the dynamics in various time series. To address this problem, we propose a novel model-based time series clustering method with mixtures of linear Gaussian state space models, which have high flexibility. The proposed method uses a new expectation-maximization algorithm for the mixture model to estimate the model parameters, and determines the number of clusters using the Bayesian information criterion. Experiments on a simulated dataset demonstrate the effectiveness of the method in clustering, parameter estimation, and model selection. The method is applied to a real dataset for which previously proposed time series clustering methods exhibited low accuracy. Results showed that our method produces more accurate clustering results than those obtained using the previous methods.
    FAT Forensics: A Python Toolbox for Algorithmic Fairness, Accountability and Transparency. (arXiv:1909.05167v3 [cs.LG] UPDATED)
    Today, artificial intelligence systems driven by machine learning algorithms can be in a position to take important, and sometimes legally binding, decisions about our everyday lives. In many cases, however, these systems and their actions are neither regulated nor certified. To help counter the potential harm that such algorithms can cause we developed an open source toolbox that can analyse selected fairness, accountability and transparency aspects of the machine learning process: data (and their features), models and predictions, allowing to automatically and objectively report them to relevant stakeholders. In this paper we describe the design, scope, usage and impact of this Python package, which is published under the 3-Clause BSD open source licence.
    Wasserstein Task Embedding for Measuring Task Similarities. (arXiv:2208.11726v1 [cs.LG])
    Measuring similarities between different tasks is critical in a broad spectrum of machine learning problems, including transfer, multi-task, continual, and meta-learning. Most current approaches to measuring task similarities are architecture-dependent: 1) relying on pre-trained models, or 2) training networks on tasks and using forward transfer as a proxy for task similarity. In this paper, we leverage the optimal transport theory and define a novel task embedding for supervised classification that is model-agnostic, training-free, and capable of handling (partially) disjoint label sets. In short, given a dataset with ground-truth labels, we perform a label embedding through multi-dimensional scaling and concatenate dataset samples with their corresponding label embeddings. Then, we define the distance between two datasets as the 2-Wasserstein distance between their updated samples. Lastly, we leverage the 2-Wasserstein embedding framework to embed tasks into a vector space in which the Euclidean distance between the embedded points approximates the proposed 2-Wasserstein distance between tasks. We show that the proposed embedding leads to a significantly faster comparison of tasks compared to related approaches like the Optimal Transport Dataset Distance (OTDD). Furthermore, we demonstrate the effectiveness of our proposed embedding through various numerical experiments and show statistically significant correlations between our proposed distance and the forward and backward transfer between tasks.
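    A hedged sketch of the dataset distance using the POT library: each sample is concatenated with a label embedding, and the 2-Wasserstein distance is computed between the augmented point clouds. The paper derives its label embedding via multi-dimensional scaling; a scaled one-hot embedding is substituted here for brevity.

```python
# Model-agnostic distance between two labeled datasets via optimal transport.
import numpy as np
import ot  # pip install pot

def task_distance(X1, y1, X2, y2, n_classes, label_scale=1.0):
    def augment(X, y):
        onehot = np.eye(n_classes)[y] * label_scale  # stand-in label embedding
        return np.hstack([X, onehot])
    A, B = augment(X1, y1), augment(X2, y2)
    a = np.full(len(A), 1.0 / len(A))                # uniform sample weights
    b = np.full(len(B), 1.0 / len(B))
    M = ot.dist(A, B)                 # squared Euclidean costs by default
    return np.sqrt(ot.emd2(a, b, M))  # 2-Wasserstein distance
```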
    Learning Fair Representations via Rate-Distortion Maximization. (arXiv:2202.00035v2 [cs.LG] UPDATED)
    Text representations learned by machine learning models often encode undesirable demographic information of the user. Predictive models based on these representations can rely on such information, resulting in biased decisions. We present a novel debiasing technique, Fairness-aware Rate Maximization (FaRM), that removes protected information by making representations of instances belonging to the same protected attribute class uncorrelated, using the rate-distortion function. FaRM is able to debias representations with or without a target task at hand. FaRM can also be adapted to remove information about multiple protected attributes simultaneously. Empirical evaluations show that FaRM achieves state-of-the-art performance on several datasets, and learned representations leak significantly less protected attribute information against an attack by a non-linear probing network.
    Fed-FSNet: Mitigating Non-I.I.D. Federated Learning via Fuzzy Synthesizing Network. (arXiv:2208.12044v1 [cs.CR])
    Federated learning (FL) has emerged as a promising privacy-preserving distributed machine learning framework recently. It aims at collaboratively learning a shared global model by performing distributed training locally on edge devices and aggregating local models into a global one without centralized raw data sharing in the cloud server. However, due to the large local data heterogeneities (Non-I.I.D. data) across edge devices, the FL may easily obtain a global model that can produce more shifted gradients on local datasets, thereby degrading the model performance or even suffering from the non-convergence during training. In this paper, we propose a novel FL training framework, dubbed Fed-FSNet, using a properly designed Fuzzy Synthesizing Network (FSNet) to mitigate the Non-I.I.D. FL at-the-source. Concretely, we maintain an edge-agnostic hidden model in the cloud server to estimate a less-accurate while direction-aware inversion of the global model. The hidden model can then fuzzily synthesize several mimic I.I.D. data samples (sample features) conditioned on only the global model, which can be shared by edge devices to facilitate the FL training towards faster and better convergence. Moreover, since the synthesizing process involves neither access to the parameters/updates of local models nor analyzing individual local model outputs, our framework can still ensure the privacy of FL. Experimental results on several FL benchmarks demonstrate that our method can significantly mitigate the Non-I.I.D. issue and obtain better performance against other representative methods.
    AI-enhanced iterative solvers for accelerating the solution of large scale parametrized systems. (arXiv:2207.02543v3 [math.NA] UPDATED)
Recent advances in the field of machine learning open a new era in high performance computing. Applications of machine learning algorithms for the development of accurate and cost-efficient surrogates of complex problems have already attracted major attention from scientists. Despite their powerful approximation capabilities, however, surrogates cannot produce the `exact' solution to the problem. To address this issue, this paper exploits up-to-date ML tools and delivers customized iterative solvers for linear equation systems, capable of solving large-scale parametrized problems at any desired level of accuracy. Specifically, the proposed approach consists of two steps. First, a reduced set of model evaluations is performed, and the corresponding solutions are used to establish an approximate mapping from the problem's parametric space to its solution space, using deep feedforward neural networks and convolutional autoencoders. This mapping serves as a means to obtain very accurate initial predictions of the system's response to new query points at negligible computational cost. Subsequently, an iterative solver inspired by the Algebraic Multigrid method, in combination with Proper Orthogonal Decomposition and termed POD-2G, is developed that successively refines the initial predictions towards the exact system solutions. The application of POD-2G as a standalone solver or as a preconditioner in the context of preconditioned conjugate gradient methods is demonstrated on several numerical examples of large-scale systems, with the results indicating its superiority over conventional iterative solution schemes.
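    The overall pattern, independent of the specific POD-2G refinement, can be sketched with SciPy's conjugate gradient solver: a surrogate's prediction serves as the initial guess, and the iterative solver refines it to the desired accuracy. `surrogate` is a hypothetical stand-in for the paper's network-based mapping.

```python
# Surrogate-accelerated iterative solve: a good initial guess from a
# learned model typically cuts the number of Krylov iterations.
from scipy.sparse.linalg import cg

def solve_with_surrogate(A, b, surrogate):
    x0 = surrogate(b)            # accurate initial prediction, near-free
    x, info = cg(A, b, x0=x0)    # CG refines it to the exact solution
    return x
```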
    Simple steps are all you need: Frank-Wolfe and generalized self-concordant functions. (arXiv:2105.13913v5 [math.OC] UPDATED)
    Generalized self-concordance is a key property present in the objective function of many important learning problems. We establish the convergence rate of a simple Frank-Wolfe variant that uses the open-loop step size strategy $\gamma_t = 2/(t+2)$, obtaining a $\mathcal{O}(1/t)$ convergence rate for this class of functions in terms of primal gap and Frank-Wolfe gap, where $t$ is the iteration count. This avoids the use of second-order information or the need to estimate local smoothness parameters of previous work. We also show improved convergence rates for various common cases, e.g., when the feasible region under consideration is uniformly convex or polyhedral.
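    The analyzed variant is simple to state in code; below is a generic Frank-Wolfe loop with the open-loop step size $\gamma_t = 2/(t+2)$, instantiated for a quadratic over the probability simplex, whose linear minimization oracle just picks a vertex.

```python
# Frank-Wolfe with the open-loop step size gamma_t = 2/(t+2).
import numpy as np

def frank_wolfe(grad_f, lmo, x0, iters=200):
    x = x0
    for t in range(iters):
        v = lmo(grad_f(x))            # argmin_{v in C} <grad f(x), v>
        gamma = 2.0 / (t + 2.0)       # open-loop step size, no line search
        x = (1 - gamma) * x + gamma * v
    return x

Q = np.diag([1.0, 2.0, 3.0])
grad_f = lambda x: Q @ x              # f(x) = 0.5 x^T Q x
lmo = lambda g: np.eye(len(g))[np.argmin(g)]   # simplex vertex oracle
x_star = frank_wolfe(grad_f, lmo, np.ones(3) / 3)
```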
    Contrastive Audio-Language Learning for Music. (arXiv:2208.12208v1 [cs.SD])
    As one of the most intuitive interfaces known to humans, natural language has the potential to mediate many tasks that involve human-computer interaction, especially in application-focused fields like Music Information Retrieval. In this work, we explore cross-modal learning in an attempt to bridge audio and language in the music domain. To this end, we propose MusCALL, a framework for Music Contrastive Audio-Language Learning. Our approach consists of a dual-encoder architecture that learns the alignment between pairs of music audio and descriptive sentences, producing multimodal embeddings that can be used for text-to-audio and audio-to-text retrieval out-of-the-box. Thanks to this property, MusCALL can be transferred to virtually any task that can be cast as text-based retrieval. Our experiments show that our method performs significantly better than the baselines at retrieving audio that matches a textual description and, conversely, text that matches an audio query. We also demonstrate that the multimodal alignment capability of our model can be successfully extended to the zero-shot transfer scenario for genre classification and auto-tagging on two public datasets.
    Efficient Planning in a Compact Latent Action Space. (arXiv:2208.10291v2 [cs.LG] UPDATED)
    While planning-based sequence modelling methods have shown great potential in continuous control, scaling them to high-dimensional state-action sequences remains an open challenge due to the high computational complexity and innate difficulty of planning in high-dimensional spaces. We propose the Trajectory Autoencoding Planner (TAP), a planning-based sequence modelling RL method that scales to high state-action dimensionalities. Using a state-conditional Vector-Quantized Variational Autoencoder (VQ-VAE), TAP models the conditional distribution of the trajectories given the current state. When deployed as an RL agent, TAP avoids planning step-by-step in a high-dimensional continuous action space and instead searches for the optimal latent code sequences by beam search. Unlike the $O(D^3)$ complexity of the Trajectory Transformer (TT), TAP enjoys constant $O(C)$ planning computational complexity with respect to the state-action dimensionality $D$. Our empirical evaluation also shows that TAP's performance grows increasingly strong with dimensionality. For Adroit robotic hand manipulation tasks with high state and action dimensionality, TAP surpasses existing model-based methods, including TT, by a large margin and also beats strong model-free actor-critic baselines.
    Community Detection in the Hypergraph SBM: Optimal Recovery Given the Similarity Matrix. (arXiv:2208.12227v1 [cs.SI])
    Community detection is a fundamental problem in network science. In this paper, we consider community detection in hypergraphs drawn from the $hypergraph$ $stochastic$ $block$ $model$ (HSBM), with a focus on exact community recovery. We study the performance of polynomial-time algorithms for community detection in a case where the full hypergraph is unknown. Instead, we are provided a $similarity$ $matrix$ $W$, where $W_{ij}$ reports the number of hyperedges containing both $i$ and $j$. Under this information model, Kim, Bandeira, and Goemans [KBG18] determined the information-theoretic threshold for exact recovery, and proposed a semidefinite programming relaxation which they conjectured to be optimal. In this paper, we confirm this conjecture. We also show that a simple, highly efficient spectral algorithm is optimal, establishing the spectral algorithm as the method of choice. Our analysis of the spectral algorithm crucially relies on strong $entrywise$ bounds on the eigenvectors of $W$. Our bounds are inspired by the work of Abbe, Fan, Wang, and Zhong [AFWZ20], who developed entrywise bounds for eigenvectors of symmetric matrices with independent entries. Despite the complex dependency structure in similarity matrices, we prove similar entrywise guarantees.
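    In its generic form, the spectral recovery step is short: take the leading eigenvectors of the similarity matrix and cluster the rows. A hedged sketch of that generic step follows; the paper's optimal spectral algorithm and its entrywise eigenvector analysis are of course more refined than this.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def spectral_communities(W, k):
        """Cluster nodes from a symmetric similarity matrix W: take the k
        leading eigenvectors and run k-means on the rows. The paper's spectral
        algorithm is of this flavor, analyzed via entrywise eigenvector bounds.
        """
        vals, vecs = np.linalg.eigh(W)    # eigenvalues in ascending order
        U = vecs[:, -k:]                  # k leading eigenvectors
        return KMeans(n_clusters=k, n_init=10).fit_predict(U)
    ```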
    Reduced-PINN: An Integration-Based Physics-Informed Neural Networks for Stiff ODEs. (arXiv:2208.12045v1 [cs.LG])
    Physics-informed neural networks (PINNs) have recently received much attention due to their capabilities in solving both forward and inverse problems. For training a deep neural network associated with a PINN, one typically constructs a total loss function using a weighted sum of different loss terms and then tries to minimize that. This approach often becomes problematic for solving stiff equations since it cannot consider adaptive increments. Many studies have reported the poor performance of PINNs and the challenges they face in simulating stiff chemical kinetics problems governed by stiff ordinary differential equations (ODEs). Studies show that stiffness is the primary cause of the failure of the PINN in simulating stiff kinetic systems. Here, we address this issue by proposing a reduced weak form of the loss function, leading to a new PINN architecture, named Reduced-PINN, that utilizes a reduced-order integration method to enable the PINN to solve stiff chemical kinetics. The proposed Reduced-PINN can be applied to various reaction-diffusion systems involving stiff dynamics. To this end, we transform initial value problems (IVPs) to their equivalent integral forms and solve the resulting integral equations using physics-informed neural networks. In our derived integral-based optimization process, there is only a single loss term, without explicitly incorporating separate loss terms for the ordinary differential equation (ODE) and the initial conditions (ICs). To illustrate the capabilities of Reduced-PINN, we used it to simulate multiple stiff/mild second-order ODEs. We show that Reduced-PINN captures the solution accurately for a stiff scalar ODE. We also validated the Reduced-PINN against a stiff system of linear ODEs.
    On Reality and the Limits of Language Data. (arXiv:2208.11981v1 [cs.CL])
    Recent advances in neural network language models have shown that it is possible to derive expressive meaning representations by leveraging linguistic associations in large-scale natural language data. These potentially Gestalt representations have enabled state-of-the-art performance for many practical applications. It would appear that we are on a pathway to empirically deriving a robust and expressive computable semantics. A key question that arises is: how far can language data alone enable computers to understand the necessary truth about the physical world? Attention to this question is warranted because our future interactions with intelligent machines depend on how well our techniques correctly represent and process the concepts (objects, properties, and processes) that humans commonly observe to be true. After reviewing existing protocols, the objective of this work is to explore this question using a novel and tightly controlled reasoning test and to highlight what models might learn directly from pure linguistic data.
    Fix-A-Step: Effective Semi-supervised Learning from Uncurated Unlabeled Sets. (arXiv:2208.11870v1 [cs.LG])
    Semi-supervised learning (SSL) promises gains in accuracy compared to training classifiers on small labeled datasets by also training on many unlabeled images. In realistic applications like medical imaging, unlabeled sets will be collected for expediency and thus uncurated: possibly different from the labeled set in represented classes or class frequencies. Unfortunately, modern deep SSL often makes accuracy worse when given uncurated unlabeled sets. Recent remedies suggest filtering approaches that detect out-of-distribution unlabeled examples and then discard or downweight them. Instead, we view all unlabeled examples as potentially helpful. We introduce a procedure called Fix-A-Step that can improve heldout accuracy of common deep SSL methods despite lack of curation. The key innovations are augmentations of the labeled set inspired by all unlabeled data and a modification of gradient descent updates to prevent following the multi-task SSL loss from hurting labeled-set accuracy. Though our method is simpler than alternatives, we show consistent accuracy gains on CIFAR-10 and CIFAR-100 benchmarks across all tested levels of artificial contamination for the unlabeled sets. We further suggest a real medical benchmark for SSL: recognizing the view type of ultrasound images of the heart. Our method can learn from 353,500 truly uncurated unlabeled images to deliver gains that generalize across hospitals.  ( 2 min )
    Towards Unsupervised HPO for Outlier Detection. (arXiv:2208.11727v1 [cs.LG])
    Given an unsupervised outlier detection (OD) algorithm, how can we optimize its hyperparameter(s) (HP) on a new dataset, without any labels? In this work, we address this challenging hyperparameter optimization for unsupervised OD problem, and propose the first systematic approach called HPOD that is based on meta-learning. HPOD capitalizes on the prior performance of a large collection of HPs on existing OD benchmark datasets, and transfers this information to enable HP evaluation on a new dataset without labels. Moreover, HPOD adapts (originally supervised) sequential model-based optimization to identify promising HPs efficiently. Extensive experiments show that HPOD works with both deep (e.g., Robust AutoEncoder) and shallow (e.g., Local Outlier Factor (LOF) and Isolation Forest (iForest)) OD algorithms on both discrete and continuous HP spaces, and outperforms a wide range of baselines with on average 58% and 66% performance improvement over the default HPs of LOF and iForest.  ( 2 min )
    Rethinking Cost-sensitive Classification in Deep Learning via Adversarial Data Augmentation. (arXiv:2208.11739v1 [cs.LG])
    Cost-sensitive classification is critical in applications where misclassification errors widely vary in cost. However, over-parameterization poses fundamental challenges to the cost-sensitive modeling of deep neural networks (DNNs). The ability of a DNN to fully interpolate a training dataset can render a DNN, evaluated purely on the training set, ineffective in distinguishing a cost-sensitive solution from its overall accuracy maximization counterpart. This necessitates rethinking cost-sensitive classification in DNNs. To address this challenge, this paper proposes a cost-sensitive adversarial data augmentation (CSADA) framework to make over-parameterized models cost-sensitive. The overarching idea is to generate targeted adversarial examples that push the decision boundary in cost-aware directions. These targeted adversarial samples are generated by maximizing the probability of critical misclassifications and used to train a model with more conservative decisions on costly pairs. Experiments on well-known datasets and a pharmacy medication image (PMI) dataset made publicly available show that our method can effectively minimize the overall cost and reduce critical errors, while achieving comparable performance in terms of overall accuracy.  ( 2 min )
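    The core generation step, pushing an input toward a specific costly misclassification, can be sketched as a one-step targeted perturbation. This is a simplified illustration of the idea rather than the paper's exact CSADA procedure; `model`, `x`, and `target_class` are assumed inputs.

    ```python
    import torch
    import torch.nn.functional as F

    def targeted_adversarial(model, x, target_class, eps=0.03):
        """One-step targeted perturbation: increase the probability that `x`
        is classified as the costly `target_class` (a batch of class indices).
        """
        x_adv = x.clone().detach().requires_grad_(True)
        # Cross-entropy toward the *target* class; stepping against its gradient
        # raises the target-class probability, i.e., a critical misclassification.
        loss = F.cross_entropy(model(x_adv), target_class)
        loss.backward()
        return (x_adv - eps * x_adv.grad.sign()).detach()
    ```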
    On Differential Privacy for Federated Learning in Wireless Systems with Multiple Base Stations. (arXiv:2208.11848v1 [cs.CR])
    In this work, we consider a federated learning model in a wireless system with multiple base stations and inter-cell interference. We apply a differentially private scheme to transmit information from users to their corresponding base station during the learning phase. We show the convergence behavior of the learning process by deriving an upper bound on its optimality gap. Furthermore, we define an optimization problem to reduce this upper bound and the total privacy leakage. To find the locally optimal solutions of this problem, we first propose an algorithm that schedules the resource blocks and users. We then extend this scheme to reduce the total privacy leakage by optimizing the differential privacy artificial noise. We apply the solutions of these two procedures as parameters of a federated learning system. In this setting, we assume that each user is equipped with a classifier. Moreover, most communication cells are assumed to have fewer resource blocks than users. The simulation results show that our proposed scheduler improves the average accuracy of the predictions compared with a random scheduler. Furthermore, its extended version with the noise optimizer significantly reduces the amount of privacy leakage.  ( 2 min )
    CNN-based Prediction of Network Robustness With Missing Edges. (arXiv:2208.11847v1 [eess.SY])
    Connectivity and controllability of a complex network are two important properties that enable a networked system to function. Robustness of connectivity and controllability guarantees that the system functions properly and stably under various malicious attacks. Evaluating network robustness using attack simulations is time-consuming, while the convolutional neural network (CNN)-based prediction approach provides a cost-efficient method to approximate the network robustness. In this paper, we investigate the performance of CNN-based approaches for connectivity and controllability robustness prediction when partial network information is missing, namely when the adjacency matrix is incomplete. Extensive experimental studies are carried out. We identify a threshold: if more than 7.29\% of the network information is lost, the performance of CNN-based prediction degrades significantly for all cases in the experiments. Two representations of missing edges are compared: 1) a missing edge is marked `no edge' in the input for prediction, and 2) a missing edge is denoted using a special `unknown' marker. Experimental results reveal that the first representation is misleading to the CNN-based predictors.  ( 2 min )
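    The two missing-edge representations amount to different fill values in the input adjacency matrix; a tiny sketch (the `-1` marker value is an assumption for illustration, not the paper's exact encoding):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    A = (rng.random((6, 6)) < 0.3).astype(float)   # observed adjacency matrix
    mask = rng.random((6, 6)) < 0.0729             # ~7.29% of entries missing

    A_no_edge = A.copy()
    A_no_edge[mask] = 0.0    # representation 1: missing edges marked `no edge'
    A_unknown = A.copy()
    A_unknown[mask] = -1.0   # representation 2: a special `unknown' marker value
    ```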
    Enforcing Delayed-Impact Fairness Guarantees. (arXiv:2208.11744v1 [cs.LG])
    Recent research has shown that seemingly fair machine learning models, when used to inform decisions that have an impact on peoples' lives or well-being (e.g., applications involving education, employment, and lending), can inadvertently increase social inequality in the long term. This is because prior fairness-aware algorithms only consider static fairness constraints, such as equal opportunity or demographic parity. However, enforcing constraints of this type may result in models that have negative long-term impact on disadvantaged individuals and communities. We introduce ELF (Enforcing Long-term Fairness), the first classification algorithm that provides high-confidence fairness guarantees in terms of long-term, or delayed, impact. We prove that the probability that ELF returns an unfair solution is less than a user-specified tolerance and that (under mild assumptions), given sufficient training data, ELF is able to find and return a fair solution if one exists. We show experimentally that our algorithm can successfully mitigate long-term unfairness.  ( 2 min )
    A Survey of Open Source Automation Tools for Data Science Predictions. (arXiv:2208.11792v1 [cs.LG])
    We present an expository overview of technical and cultural challenges to the development and adoption of automation at various stages in the data science prediction lifecycle, restricting focus to supervised learning with structured datasets. In addition, we review popular open source Python tools implementing common solution patterns for the automation challenges and highlight gaps where we feel progress still remains to be made.  ( 2 min )
    A Perturbation Resistant Transformation and Classification System for Deep Neural Networks. (arXiv:2208.11839v1 [cs.CV])
    Deep convolutional neural networks accurately classify a diverse range of natural images, but may be easily deceived when deliberately designed, imperceptible perturbations are embedded in the images. In this paper, we design a multi-pronged training, input transformation, and image ensemble system that is attack-agnostic and not easily estimated. Our system incorporates two novel features. The first is a transformation layer that computes feature-level polynomial kernels from class-level training data samples and iteratively updates input image copies at inference time based on their feature kernel differences to create an ensemble of transformed inputs. The second is a classification system that incorporates the prediction of the undefended network with a hard vote on the ensemble of filtered images. Our evaluations on the CIFAR10 dataset show our system improves the robustness of an undefended network against a variety of bounded and unbounded white-box attacks under different distance metrics, while sacrificing little accuracy on clean images. Against adaptive full-knowledge attackers creating end-to-end attacks, our system successfully augments the existing robustness of adversarially trained networks, to which our methods are most effectively applied.  ( 2 min )
    Entropy Regularization for Population Estimation. (arXiv:2208.11747v1 [cs.LG])
    Entropy regularization is known to improve exploration in sequential decision-making problems. We show that this same mechanism can also lead to nearly unbiased and lower-variance estimates of the mean reward in the optimize-and-estimate structured bandit setting. Mean reward estimation (i.e., population estimation) tasks have recently been shown to be essential for public policy settings where legal constraints often require precise estimates of population metrics. We show that leveraging entropy and KL divergence can yield a better trade-off between reward and estimator variance than existing baselines, all while remaining nearly unbiased. These properties of entropy regularization illustrate an exciting potential for bridging the optimal exploration and estimation literatures.  ( 2 min )
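    The mechanism is concrete in the simplest case: maximizing expected reward plus an entropy bonus over a sampling distribution yields a softmax policy, which keeps every arm's selection probability bounded away from zero and thus supports nearly unbiased mean-reward estimation. A minimal sketch of that building block, not the paper's full optimize-and-estimate algorithm:

    ```python
    import numpy as np

    def entropy_regularized_policy(reward_estimates, lam=0.5):
        """Arm-sampling distribution maximizing E_pi[r] + lam * H(pi).

        The maximizer over the simplex is the softmax policy, pi_i proportional
        to exp(r_i / lam): larger lam spreads probability mass across arms,
        aiding exploration and population (mean-reward) estimation, while
        lam -> 0 recovers greedy selection.
        """
        z = np.asarray(reward_estimates, dtype=float) / lam
        z -= z.max()                  # numerical stability
        p = np.exp(z)
        return p / p.sum()

    print(entropy_regularized_policy([1.0, 0.8, 0.1], lam=0.25))
    ```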
    NeuralUQ: A comprehensive library for uncertainty quantification in neural differential equations and operators. (arXiv:2208.11866v1 [cs.LG])
    Uncertainty quantification (UQ) in machine learning is currently drawing increasing research interest, driven by the rapid deployment of deep neural networks across different fields, such as computer vision, natural language processing, and the need for reliable tools in risk-sensitive applications. Recently, various machine learning models have also been developed to tackle problems in the field of scientific computing with applications to computational science and engineering (CSE). Physics-informed neural networks and deep operator networks are two such models for solving partial differential equations and learning operator mappings, respectively. In this regard, a comprehensive study of UQ methods tailored specifically for scientific machine learning (SciML) models has been provided in [45]. Nevertheless, and despite their theoretical merit, implementations of these methods are not straightforward, especially in large-scale CSE applications, hindering their broad adoption in both research and industry settings. In this paper, we present an open-source Python library (https://github.com/Crunch-UQ4MI), termed NeuralUQ and accompanied by an educational tutorial, for employing UQ methods for SciML in a convenient and structured manner. The library, designed for both educational and research purposes, supports multiple modern UQ methods and SciML models. It is based on a succinct workflow and facilitates flexible employment and easy extensions by the users. We first present a tutorial of NeuralUQ and subsequently demonstrate its applicability and efficiency in four diverse examples, involving dynamical systems and high-dimensional parametric and time-dependent PDEs.  ( 3 min )
    AI-coupled HPC Workflows. (arXiv:2208.11745v1 [cs.DC])
    Increasingly, scientific discovery requires sophisticated and scalable workflows. Workflows have become the ``new applications,'' wherein multi-scale computing campaigns comprise multiple and heterogeneous executable tasks. In particular, the introduction of AI/ML models into traditional HPC workflows has been an enabler of highly accurate modeling, typically reducing computational needs compared to traditional methods. This chapter discusses various modes of integrating AI/ML models into HPC computations, resulting in diverse types of AI-coupled HPC workflows. The increasing need for coupling AI/ML and HPC across scientific domains is motivated and then exemplified by a number of production-grade use cases for each mode. We additionally discuss the primary challenges of extreme-scale AI-coupled HPC campaigns -- task heterogeneity, adaptivity, performance -- and several framework and middleware solutions which aim to address them. While both HPC workflow and AI/ML computing paradigms are independently effective, we highlight how their integration, and ultimate convergence, is leading to significant improvements in scientific performance across a range of domains, ultimately resulting in scientific explorations otherwise unattainable.  ( 2 min )
    GAN-based generative modelling for dermatological applications -- comparative study. (arXiv:2208.11702v1 [eess.IV])
    The lack of sufficiently large open medical databases is one of the biggest challenges in AI-powered healthcare. Synthetic data created using Generative Adversarial Networks (GANs) appears to be a good solution to mitigate the issues with privacy policies. Another remedy is a decentralized protocol across multiple medical institutions that avoids exchanging local data samples. In this paper, we explored unconditional and conditional GANs in centralized and decentralized settings. The centralized setting imitates studies on a large but highly unbalanced skin lesion dataset, while the decentralized one simulates a more realistic hospital scenario with three institutions. We evaluated the models' performance in terms of fidelity, diversity, speed of training, and the predictive ability of classifiers trained on the generated synthetic data. In addition, we provided explainability through exploration of the latent space and embedding projections, focused on both global and local explanations. The calculated distance between real images and their projections in the latent space proved the authenticity and generalization of the trained GANs, which is one of the main concerns in this type of application. The open source code for the conducted studies is publicly available at \url{https://github.com/aidotse/stylegan2-ada-pytorch}.  ( 3 min )
    Multiresolution Neural Networks for Imaging. (arXiv:2208.11813v1 [cs.CV])
    We present MR-Net, a general architecture for multiresolution neural networks, and a framework for imaging applications based on this architecture. Our coordinate-based networks are continuous both in space and in scale as they are composed of multiple stages that progressively add finer details. Besides that, they are a compact and efficient representation. We show examples of multiresolution image representation and applications to texture magnification and minification, and antialiasing.  ( 2 min )
    Shortcut Learning of Large Language Models in Natural Language Understanding: A Survey. (arXiv:2208.11857v1 [cs.CL])
    Large language models (LLMs) have achieved state-of-the-art performance on a series of natural language understanding tasks. However, these LLMs might rely on dataset bias and artifacts as shortcuts for prediction. This has significantly hurt their Out-of-Distribution (OOD) generalization and adversarial robustness. In this paper, we provide a review of recent developments that address the robustness challenge of LLMs. We first introduce the concepts and robustness challenge of LLMs. We then introduce methods to identify shortcut learning behavior in LLMs, characterize the reasons for shortcut learning, as well as introduce mitigation solutions. Finally, we identify key challenges and introduce the connections of this line of research to other directions.  ( 2 min )
    gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window. (arXiv:2208.11718v1 [cs.CV])
    Following its success in the language domain, the self-attention mechanism (the transformer) has been adopted in the vision domain, where it has recently achieved great success. Additionally, as another stream, the multi-layer perceptron (MLP) is also being explored in the vision domain. These architectures, beyond traditional CNNs, have been attracting attention recently, and many methods have been proposed. To combine parameter efficiency and performance with the locality and hierarchy needed for image recognition, we propose gSwin, which merges the two streams: Swin Transformer and (multi-head) gMLP. We show that gSwin achieves better accuracy than Swin Transformer on three vision tasks (image classification, object detection, and semantic segmentation) with a smaller model size.  ( 2 min )
    EEG4Students: An Experimental Design for EEG Data Collection and Machine Learning Analysis. (arXiv:2208.11743v1 [cs.LG])
    Using Machine Learning and Deep Learning to predict cognitive tasks from electroencephalography (EEG) signals has been a fast-developing area in Brain-Computer Interfaces (BCI). However, during the COVID-19 pandemic, data collection and analysis became more challenging. Remote experiments during the pandemic pose several challenges, and we discuss possible solutions. This paper explores machine learning algorithms that can run efficiently on personal computers for BCI classification tasks. The results show that Random Forest and RBF SVM perform well for EEG classification tasks. Furthermore, we investigate how to conduct such BCI experiments using affordable consumer-grade devices to collect EEG-based BCI data. In addition, we have developed a data collection protocol, EEG4Students, that provides interested non-experts with a guideline for such data collection. Our code and data can be found at https://github.com/GuangyaoDou/EEG4Students.  ( 2 min )
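    Reproducing the classifier comparison is straightforward with scikit-learn; the sketch below substitutes synthetic features for the real EEG feature matrix, which is an assumption made purely for illustration:

    ```python
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Synthetic stand-in for an (n_trials, n_features) EEG feature matrix
    X, y = make_classification(n_samples=300, n_features=64, random_state=0)

    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

    print("Random Forest:", cross_val_score(rf, X, y, cv=5).mean())
    print("RBF SVM:", cross_val_score(svm, X, y, cv=5).mean())
    ```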
    Ontology-Driven Self-Supervision for Adverse Childhood Experiences Identification Using Social Media Datasets. (arXiv:2208.11701v1 [cs.CL])
    Adverse Childhood Experiences (ACEs) are defined as a collection of highly stressful, and potentially traumatic, events or circumstances that occur throughout childhood and/or adolescence. They have been shown to be associated with increased risks of mental health disorders and other abnormal behaviours later in life. However, identifying ACEs from textual data with Natural Language Processing (NLP) is challenging because (a) there are no NLP-ready ACE ontologies; (b) there are few resources available for machine learning, necessitating data annotation by clinical experts; and (c) annotation by domain experts is costly, and a large number of documents is needed to support large machine learning models. In this paper, we present an ontology-driven self-supervised approach (deriving concept embeddings from baseline NLP results using an auto-encoder) for producing a publicly available resource that supports large-scale machine learning (e.g., training transformer-based large language models) on a social media corpus. This resource, as well as the proposed approach, is aimed at helping the community train transferable NLP models that effectively surface ACEs in low-resource scenarios such as NLP on clinical notes within Electronic Health Records. The resource, including a list of ACE ontology terms, ACE concept embeddings, and the NLP-annotated corpus, is available at https://github.com/knowlab/ACE-NLP.  ( 2 min )
    Deep Learning-based approaches for automatic detection of shell nouns and evaluation on WikiText-2. (arXiv:2208.11867v1 [cs.CL])
    In some areas, such as Cognitive Linguistics, researchers still use traditional techniques based on manual rules and patterns. Since the definition of a shell noun is rather subjective and there are many exceptions, this time-consuming work had to be done by hand in the past, when Deep Learning techniques were not yet mature. With the increasing number of languages on the web, these manual rules are becoming less useful. However, there is now a better alternative. With the development of Deep Learning, pre-trained language models have provided a good technical basis for Natural Language Processing, and automated processes based on Deep Learning approaches are more in line with modern needs. This paper, a cross-border collaboration, proposes two neural network models for the automatic detection of shell nouns and evaluates them on the WikiText-2 dataset. The proposed approaches not only allow the entire process to be automated, but also reach 94% precision even on completely unseen articles, comparable to that of human annotators. This shows that the performance and generalization ability of the models are good enough for research purposes. Many new nouns are found that fit the definition of a shell noun very well. All discovered shell nouns, as well as the pre-trained models and code, are available on GitHub.  ( 3 min )
    Learning Task Automata for Reinforcement Learning using Hidden Markov Models. (arXiv:2208.11838v1 [cs.LG])
    Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to misspecification, especially when the environment's dynamics are only partially known. This paper proposes a novel pipeline for learning non-Markovian task specifications as succinct finite-state `task automata' from episodes of agent experience within unknown environments. We leverage two key algorithmic insights. First, we learn a product MDP, a model composed of the specification's automaton and the environment's MDP (both initially unknown), by treating it as a partially observable MDP and using off-the-shelf algorithms for hidden Markov models. Second, we propose a novel method for distilling the task automaton (assumed to be a deterministic finite automaton) from the learnt product MDP. Our learnt task automaton enables the decomposition of a task into its constituent sub-tasks, which improves the rate at which an RL agent can later synthesise an optimal policy. It also provides an interpretable encoding of high-level environmental and task features, so a human can readily verify that the agent has learnt coherent tasks with no misspecifications. In addition, we take steps towards ensuring that the learnt automaton is environment-agnostic, making it well-suited for use in transfer learning. Finally, we provide experimental results to illustrate our algorithm's performance in different environments and tasks and its ability to incorporate prior domain knowledge to facilitate more efficient learning.  ( 3 min )
    An Empirical Analysis of the Efficacy of Different Sampling Techniques for Imbalanced Classification. (arXiv:2208.11852v1 [cs.LG])
    Learning from imbalanced data is a challenging task. Standard classification algorithms tend to perform poorly when trained on imbalanced data. Some special strategies need to be adopted, either by modifying the data distribution or by redesigning the underlying classification algorithm to achieve desirable performance. The prevalence of imbalance in real-world datasets has led to the creation of a multitude of strategies for the class imbalance issue. However, not all the strategies are useful or provide good performance in different imbalance scenarios. There are numerous approaches to dealing with imbalanced data, but a thorough evaluation of their efficacy, including an experimental comparison among them, has not been conducted. In this study, we present a comprehensive analysis of 26 popular sampling techniques to understand their effectiveness in dealing with imbalanced data. Rigorous experiments have been conducted on 50 datasets with different degrees of imbalance to thoroughly investigate the performance of these techniques. A detailed discussion of the advantages and limitations of the techniques, as well as how to overcome such limitations, has been presented. We identify some critical factors that affect the sampling strategies and provide recommendations on how to choose an appropriate sampling technique for a particular application.  ( 2 min )
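    Many of the sampling techniques studied in such comparisons are implemented in the imbalanced-learn package (the paper does not prescribe a specific library, so this is one common choice); a short sketch of over- and under-sampling:

    ```python
    from collections import Counter

    from imblearn.over_sampling import SMOTE
    from imblearn.under_sampling import RandomUnderSampler
    from sklearn.datasets import make_classification

    # Synthetic 95/5 imbalanced data standing in for a real-world dataset
    X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
    print("original:", Counter(y))

    X_over, y_over = SMOTE(random_state=0).fit_resample(X, y)        # oversample minority
    X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)
    print("SMOTE:", Counter(y_over), "undersampled:", Counter(y_under))
    ```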
  • Open

    Causal Entropy Optimization. (arXiv:2208.10981v1 [cs.LG] CROSS LISTED)
    We study the problem of globally optimizing the causal effect on a target variable of an unknown causal graph in which interventions can be performed. This problem arises in many areas of science including biology, operations research and healthcare. We propose Causal Entropy Optimization (CEO), a framework that generalizes Causal Bayesian Optimization (CBO) to account for all sources of uncertainty, including the one arising from the causal graph structure. CEO incorporates the causal structure uncertainty both in the surrogate models for the causal effects and in the mechanism used to select interventions via an information-theoretic acquisition function. The resulting algorithm automatically trades off structure learning and causal effect optimization, while naturally accounting for observation noise. For various synthetic and real-world structural causal models, CEO achieves faster convergence to the global optimum compared with CBO while also learning the graph. Furthermore, our joint approach to structure learning and causal optimization improves upon sequential, structure-learning-first approaches.
    Causal Strategic Linear Regression. (arXiv:2002.10066v3 [cs.LG] UPDATED)
    In many predictive decision-making scenarios, such as credit scoring and academic testing, a decision-maker must construct a model that accounts for agents' propensity to "game" the decision rule by changing their features so as to receive better decisions. Whereas the strategic classification literature has previously assumed that agents' outcomes are not causally affected by their features (and thus that strategic agents' goal is deceiving the decision-maker), we join concurrent work in modeling agents' outcomes as a function of their changeable attributes. As our main contribution, we provide efficient algorithms for learning decision rules that optimize three distinct decision-maker objectives in a realizable linear setting: accurately predicting agents' post-gaming outcomes (prediction risk minimization), incentivizing agents to improve these outcomes (agent outcome maximization), and estimating the coefficients of the true underlying model (parameter estimation). Our algorithms circumvent a hardness result of Miller et al. (2020) by allowing the decision maker to test a sequence of decision rules and observe agents' responses, in effect performing causal interventions through the decision rules.
    Development of Sleep State Trend (SST), a bedside measure of neonatal sleep state fluctuations based on single EEG channels. (arXiv:2208.11933v1 [eess.SP])
    Objective: To develop and validate an automated method for bedside monitoring of sleep state fluctuations in neonatal intensive care units. Methods: A deep learning-based algorithm was designed and trained using 53 EEG recordings from long-term (a)EEG monitoring in 30 near-term neonates. The results were validated using an external dataset from 30 polysomnography recordings. In addition to training and validating a single EEG channel quiet sleep detector, we constructed Sleep State Trend (SST), a bedside-ready means for visualizing classifier outputs. Results: The accuracy of quiet sleep detection in the training data was 90%, and the accuracy was comparable (85-86%) in all bipolar derivations available from the 4-electrode recordings. The algorithm generalized well to an external dataset, showing 81% overall accuracy despite different signal derivations. SST allowed an intuitive, clear visualization of the classifier output. Conclusions: Fluctuations in sleep states can be detected at high fidelity from a single EEG channel, and the results can be visualized as a transparent and intuitive trend in the bedside monitors. Significance: The Sleep State Trend (SST) may provide caregivers with a real-time view of sleep state fluctuations and their cyclicity.
    A derivation of variational message passing (VMP) for latent Dirichlet allocation (LDA). (arXiv:2111.01480v2 [cs.LG] UPDATED)
    Latent Dirichlet Allocation (LDA) is a probabilistic model used to uncover latent topics in a corpus of documents. Inference is often performed using variational Bayes (VB) algorithms, which calculate a lower bound to the posterior distribution over the parameters. Deriving the variational update equations for new models requires considerable manual effort; variational message passing (VMP) has emerged as a "black-box" tool to expedite the process of variational inference. But applying VMP in practice still presents subtle challenges, and the existing literature does not contain the steps that are necessary to implement VMP for the standard smoothed LDA model, nor is the available black-box probabilistic graphical modelling software able to perform the word-topic updates necessary to implement LDA. In this paper, we therefore present a detailed derivation of the VMP update equations for LDA. We see this as a first step to enabling other researchers to calculate the VMP updates for similar graphical models.
    Lagrangian and Hamiltonian Mechanics for Probabilities on the Statistical Manifold. (arXiv:2009.09431v2 [math.ST] UPDATED)
    We provide an Information-Geometric formulation of Classical Mechanics on the Riemannian manifold of probability distributions, which is an affine manifold endowed with a dually-flat connection. In a non-parametric formalism, we consider the full set of positive probability functions on a finite sample space, and we provide a specific expression for the tangent and cotangent spaces over the statistical manifold, in terms of a Hilbert bundle structure that we call the Statistical Bundle. In this setting, we compute velocities and accelerations of a one-dimensional statistical model using the canonical dual pair of parallel transports and define a coherent formalism for Lagrangian and Hamiltonian mechanics on the bundle. Finally, in a series of examples, we show how our formalism provides a consistent framework for accelerated natural gradient dynamics on the probability simplex, paving the way for direct applications in optimization, game theory and neural networks.
    Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy. (arXiv:2107.02780v3 [econ.EM] UPDATED)
    The 2020 US Census will be published with differential privacy, implemented by injecting synthetic noise into the data. Controversy has ensued, with debates that center on the painful trade-off between the privacy of respondents and the precision of economic analysis. Is this trade-off inevitable? To answer this question, we formulate a semiparametric model of causal inference with high dimensional data that may be noisy, missing, discretized, or privatized. We propose a new end-to-end procedure for data cleaning, estimation, and inference with data cleaning-adjusted confidence intervals. We prove consistency, Gaussian approximation, and semiparametric efficiency by finite sample arguments. The rate of Gaussian approximation is $n^{-1/2}$ for semiparametric estimands such as average treatment effect, and it degrades gracefully for nonparametric estimands such as heterogeneous treatment effect. Our key assumption is that the true covariates are approximately low rank, which we interpret as approximate repeated measurements and validate in the Census. In our analysis, we provide nonasymptotic theoretical contributions to matrix completion, statistical learning, and semiparametric statistics. We verify the coverage of the data cleaning-adjusted confidence intervals in simulations. Finally, we conduct a semi-synthetic exercise calibrated to privacy levels mandated for the 2020 US Census.
    Nonparametric Gaussian Mixture Models for the Multi-Armed Bandit. (arXiv:1808.02932v4 [stat.ML] UPDATED)
    We here adopt Bayesian nonparametric mixture models to extend multi-armed bandits in general, and Thompson sampling in particular, to scenarios with reward model uncertainty. In the stochastic multi-armed bandit, the reward for the played arm is generated from an unknown distribution. Reward uncertainty, i.e., the lack of knowledge about the reward-generating distribution, induces the exploration-exploitation trade-off: a bandit agent needs to simultaneously learn the properties of the reward distribution and sequentially decide which action to take next. Specifically, we adopt Bayesian nonparametric Gaussian mixture models for flexible reward density estimation. The proposed Bayesian nonparametric mixture model Thompson sampling sequentially learns the reward model that best approximates the true, yet unknown, per-arm reward distribution, achieving successful regret performance. We derive, based on a novel posterior-convergence analysis, an asymptotic regret bound for the proposed method. In addition, we empirically evaluate its performance in diverse and previously elusive bandit environments, e.g., with rewards not in the exponential family, subject to outliers, and with different per-arm reward distributions. We show that the proposed Bayesian nonparametric Thompson sampling outperforms state-of-the-art alternatives in both averaged cumulative regret and regret volatility. The proposed method is valuable in the presence of bandit reward model uncertainty, as it avoids stringent case-by-case model design choices, yet provides important regret savings.
    FAT Forensics: A Python Toolbox for Algorithmic Fairness, Accountability and Transparency. (arXiv:1909.05167v3 [cs.LG] UPDATED)
    Today, artificial intelligence systems driven by machine learning algorithms can be in a position to take important, and sometimes legally binding, decisions about our everyday lives. In many cases, however, these systems and their actions are neither regulated nor certified. To help counter the potential harm that such algorithms can cause, we developed an open source toolbox that can analyse selected fairness, accountability and transparency aspects of the machine learning process: data (and their features), models and predictions, allowing them to be automatically and objectively reported to relevant stakeholders. In this paper we describe the design, scope, usage and impact of this Python package, which is published under the 3-Clause BSD open source licence.
    A conditional one-output likelihood formulation for multitask Gaussian processes. (arXiv:2006.03495v4 [cs.LG] UPDATED)
    Multitask Gaussian processes (MTGP) are the Gaussian process (GP) framework's solution for multioutput regression problems in which the $T$ elements of the regressors cannot be considered conditionally independent given the observations. Standard MTGP models assume the existence of both a multitask covariance matrix, defined as a function of an intertask matrix, and a noise covariance matrix. These matrices need to be approximated by a low rank simplification of order $P$ in order to reduce the number of parameters to be learnt from $T^2$ to $TP$. Here we introduce a novel approach that simplifies the multitask learning by reducing it to a set of conditioned univariate GPs without the need for any low rank approximations, therefore completely eliminating the requirement to select an adequate value for hyperparameter $P$. At the same time, by extending this approach with both a hierarchical and an approximate model, the proposed extensions are capable of recovering the multitask covariance and noise matrices after learning only $2T$ parameters, avoiding the validation of any model hyperparameter and reducing the overall complexity of the model as well as the risk of overfitting. Experimental results over synthetic and real problems confirm the advantages of this inference approach in its ability to accurately recover the original noise and signal matrices, as well as the achieved performance improvement in comparison to other state-of-the-art MTGP approaches. We have also integrated the model with standard GP toolboxes, showing that it is computationally competitive with state-of-the-art options.
    Adversarial Bayesian Simulation. (arXiv:2208.12113v1 [stat.ME])
    In the absence of explicit or tractable likelihoods, Bayesians often resort to approximate Bayesian computation (ABC) for inference. Our work bridges ABC with deep neural implicit samplers based on generative adversarial networks (GANs) and adversarial variational Bayes. Both ABC and GANs compare aspects of observed and fake data to simulate from posteriors and likelihoods, respectively. We develop a Bayesian GAN (B-GAN) sampler that directly targets the posterior by solving an adversarial optimization problem. B-GAN is driven by a deterministic mapping learned on the ABC reference by conditional GANs. Once the mapping has been trained, iid posterior samples are obtained by filtering noise at a negligible additional cost. We propose two post-processing local refinements using (1) data-driven proposals with importance reweighing, and (2) variational Bayes. We support our findings with frequentist-Bayesian results, showing that the typical total variation distance between the true and approximate posteriors converges to zero for certain neural network generators and discriminators. Our findings on simulated data show highly competitive performance relative to some of the most recent likelihood-free posterior simulators.
    ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions. (arXiv:2201.00382v3 [cs.LG] UPDATED)
    Outlier detection refers to the identification of data points that deviate from a general data distribution. Existing unsupervised approaches often suffer from high computational cost, complex hyperparameter tuning, and limited interpretability, especially when working with large, high-dimensional datasets. To address these issues, we present a simple yet effective algorithm called ECOD (Empirical-Cumulative-distribution-based Outlier Detection), which is inspired by the fact that outliers are often the "rare events" that appear in the tails of a distribution. In a nutshell, ECOD first estimates the underlying distribution of the input data in a nonparametric fashion by computing the empirical cumulative distribution per dimension of the data. ECOD then uses these empirical distributions to estimate tail probabilities per dimension for each data point. Finally, ECOD computes an outlier score of each data point by aggregating estimated tail probabilities across dimensions. Our contributions are as follows: (1) we propose a novel outlier detection method called ECOD, which is both parameter-free and easy to interpret; (2) we perform extensive experiments on 30 benchmark datasets, where we find that ECOD outperforms 11 state-of-the-art baselines in terms of accuracy, efficiency, and scalability; and (3) we release an easy-to-use and scalable (with distributed support) Python implementation for accessibility and reproducibility.
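    The three steps translate almost directly into NumPy. Below is a minimal O(n^2 d) sketch kept simple for clarity; the released implementation (included in the PyOD library) is more scalable and adds refinements beyond this bare-bones version.

    ```python
    import numpy as np

    def ecod_scores(X):
        """Minimal ECOD sketch following the three steps in the abstract."""
        n, d = X.shape
        # Steps 1-2: per-dimension empirical CDFs give left/right tail probabilities
        left = (X[None, :, :] <= X[:, None, :]).mean(axis=1)   # ECDF at x_ij
        right = (X[None, :, :] >= X[:, None, :]).mean(axis=1)  # upper tail mass
        eps = 1e-12
        # Step 3: aggregate negative log tail probabilities across dimensions
        o_left = -np.log(left + eps).sum(axis=1)
        o_right = -np.log(right + eps).sum(axis=1)
        return np.maximum(o_left, o_right)    # higher score = more outlying

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(size=(200, 3)), [[8.0, 8.0, 8.0]]])  # one planted outlier
    print(ecod_scores(X).argmax())            # index 200, the planted outlier
    ```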
    The Informativeness of K -Means for Learning Mixture Models. (arXiv:1703.10534v4 [stat.ML] UPDATED)
    The learning of mixture models can be viewed as a clustering problem. Indeed, given data samples independently generated from a mixture of distributions, we often would like to find the {\it correct target clustering} of the samples according to which component distribution they were generated from. For a clustering problem, practitioners often choose to use the simple $k$-means algorithm. $k$-means attempts to find an {\it optimal clustering} that minimizes the sum-of-squares distance between each point and its cluster center. In this paper, we consider fundamental (i.e., information-theoretic) limits of the solutions (clusterings) obtained by optimizing the sum-of-squares distance. In particular, we provide sufficient conditions for the closeness of any optimal clustering and the correct target clustering assuming that the data samples are generated from a mixture of spherical Gaussian distributions. We also generalize our results to log-concave distributions. Moreover, we show that under similar or even weaker conditions on the mixture model, any optimal clustering for the samples with reduced dimensionality is also close to the correct target clustering. These results provide intuition for the informativeness of $k$-means (with and without dimensionality reduction) as an algorithm for learning mixture models.
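    The setting is easy to simulate: draw samples from a mixture of spherical Gaussians and check how close the k-means clustering, with and without dimensionality reduction, comes to the target clustering. A sketch with illustrative parameters (not the paper's theoretical conditions):

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA
    from sklearn.metrics import adjusted_rand_score

    rng = np.random.default_rng(0)
    # Mixture of two well-separated spherical Gaussians in 50 dimensions
    means = np.zeros((2, 50))
    means[1, 0] = 8.0
    labels = rng.integers(0, 2, size=600)
    X = means[labels] + rng.normal(size=(600, 50))

    km_full = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    X_red = PCA(n_components=2).fit_transform(X)   # reduced dimensionality
    km_red = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_red)

    # Both clusterings should be close to the correct target clustering
    print(adjusted_rand_score(labels, km_full), adjusted_rand_score(labels, km_red))
    ```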
    JAXFit: Trust Region Method for Nonlinear Least-Squares Curve Fitting on the GPU. (arXiv:2208.12187v1 [cs.LG])
    We implement a trust region method on the GPU for nonlinear least squares curve fitting problems using a new deep learning Python library called JAX. Our open source package, JAXFit, works for both unconstrained and constrained curve fitting problems and allows the fit functions to be defined in Python alone -- without any specialized knowledge of either the GPU or CUDA programming. Since JAXFit runs on the GPU, it is much faster than CPU-based libraries and even other GPU-based libraries, despite being very easy to use. Additionally, due to JAX's deep learning foundations, the Jacobian in JAXFit's trust region algorithm is calculated with automatic differentiation, rather than using derivative approximations or requiring the user to define the fit function's partial derivatives.
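    The key idea, obtaining the least-squares Jacobian by automatic differentiation instead of finite differences, can be shown with plain JAX. This sketch uses a damped Gauss-Newton step rather than JAXFit's actual trust-region API, so it illustrates the mechanism, not the package interface:

    ```python
    import jax
    import jax.numpy as jnp

    def model(params, x):
        a, b = params
        return a * jnp.exp(-b * x)            # example fit function, pure Python

    def residuals(params, x, y):
        return model(params, x) - y

    # Jacobian of the residual vector w.r.t. the fit parameters via autodiff,
    # with no finite-difference approximations or hand-derived partials.
    jac = jax.jacfwd(residuals)

    x = jnp.linspace(0.0, 4.0, 50)
    y = 2.0 * jnp.exp(-1.3 * x)               # synthetic data, true params (2, 1.3)
    p = jnp.array([1.0, 1.0])
    for _ in range(30):                       # damped Gauss-Newton iterations
        r, J = residuals(p, x, y), jac(p, x, y)
        H = J.T @ J + 1e-6 * jnp.eye(2)       # crude Levenberg-style damping
        p = p - jnp.linalg.solve(H, J.T @ r)
    print(p)                                  # approaches (2.0, 1.3)
    ```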
    Online Learning via Offline Greedy Algorithms: Applications in Market Design and Optimization. (arXiv:2102.11050v2 [cs.LG] UPDATED)
    Motivated by online decision-making in time-varying combinatorial environments, we study the problem of transforming offline algorithms to their online counterparts. We focus on offline combinatorial problems that are amenable to a constant factor approximation using a greedy algorithm that is robust to local errors. For such problems, we provide a general framework that efficiently transforms offline robust greedy algorithms to online ones using Blackwell approachability. We show that the resulting online algorithms have $O(\sqrt{T})$ (approximate) regret under the full information setting. We further introduce a bandit extension of Blackwell approachability that we call Bandit Blackwell approachability. We leverage this notion to transform greedy robust offline algorithms into a $O(T^{2/3})$ (approximate) regret in the bandit setting. Demonstrating the flexibility of our framework, we apply our offline-to-online transformation to several problems at the intersection of revenue management, market design, and online optimization, including product ranking optimization in online platforms, reserve price optimization in auctions, and submodular maximization. We also extend our reduction to greedy-like first order methods used in continuous optimization, such as those used for maximizing continuous strong DR monotone submodular functions subject to convex constraints. We show that our transformation, when applied to these applications, leads to new regret bounds or improves the current known bounds. We complement our theoretical studies by conducting numerical simulations for two of our applications, in both of which we observe that the numerical performance of our transformations outperforms the theoretical guarantees in practical instances.

  • Open

    [Research] Middle School Lego Robotics team request
    Hello r/MachineLearning, I am Skip Morrow. I volunteer as a FIRST Lego League (FLL) coach at a middle school in Virginia. I am hoping that we can get some assistance from the reddit community. If you are not familiar with the FIRST Lego League, it is a competition where teams around the world are challenged to think of a problem within our community related to an annual theme and then develop a solution to that problem (there's also a more familiar lego robotics competition as part of FLL, but this request is not about that). Past themes have included such things as water supply, transportation, and waste disposal. This year the theme is "energy"-- how do we produce it, use it, conserve it and dispose of it. I had an idea that we could learn about artificial intelligence and see how we could use AI/ML to conserve energy. This is where the reddit Machine Learning team comes in. We are hoping that there is a Machine Learning expert here who would be willing to spend a couple of hours (probably less than six) over the next three months helping our team learn about AI/ML and helping us come up with a product that we could demonstrate in front of some judges at a tournament in Nov. The involvement could consist of answering some emails and doing one or two Teams/Zoom/virtual meetings. I very much want to keep the demands on your time to a minimum, but I am looking for something rewarding and enriching for the kids. If you like getting kids excited about STEM, you will love this. The kids on the team are future programmers, engineers, scientists, etc., and eager to learn. It is guaranteed to be fun! Is there anyone here who might be able to help us? Or maybe you could share this with someone that might? Very Respectfully, Skip Morrow submitted by /u/SkipMorrow [link] [comments]  ( 90 min )
    [P] Productionize a video tagging system
    Hi all, I need help developing a pipeline to productionize our video tagging system. We have a deep learning model that takes video input and assigns labels to it. All of our videos are stored in S3 Glacier Deep Archive, and it is very expensive to unfreeze them. We also have the videos available through Mux, but there we would have to download the video (to the container etc.) before passing it to our model. As soon as a video is added to our collection, we update our PostgreSQL database with the video info, including the URL on Mux. My apologies if my question is a bit vague as I'm new to this. I need to accomplish the following with the production pipeline:
    - Monitor our PostgreSQL database and process videos as soon as they are added, and store the resulting tags back in the PostgreSQL database.
    - ^ this would be ideal. But as a preliminary step, maybe we can pass a video_id manually, and the pipeline should process that video and store the labels.
    Any help would be appreciated. submitted by /u/therobot20 [link] [comments]  ( 89 min )
    [P] Run stable diffusion in google colab including image2image and inpainting
    Colab for img2img: https://colab.research.google.com/drive/1NfgqublyT_MWtR5CsmrgmdnkWiijF3P3?usp=sharing Colab for inpainting: https://colab.research.google.com/drive/1whhIiXxjQjbBuiq4lqwh-AlLIjh3l1OB Demo built with Gradio: https://github.com/gradio-app/gradio Hosted web demo for Stable Diffusion: https://huggingface.co/spaces/stabilityai/stable-diffusion submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 88 min )
    [D] Would StyleGAN (2/3) with human-rated aesthetic score aided dataset be possible? and why hasn't anyone done this before?
    Diffusion models are largely transitioning to aesthetic-score-aided datasets, which give the model a more precise quality signal for each image; in many cases this results in tremendously better quality with 10x less data than datasets without per-image aesthetic ratings. My question is, why hasn't anyone made a similar mod for StyleGAN2 or StyleGAN3? What's stopping GANs from implementing a similar system? From my understanding of GANs, it would help them even more, since GAN discriminators currently lack any fine-grained signal when it comes to guiding them on image quality (the fake-real rating), and bad samples inside the dataset can often prevent the discriminator from improving further. submitted by /u/CranberryMean3990 [link] [comments]  ( 90 min )
    OpenAI Residency 2023 [R]
    Has anyone applied to the newly opened OpenAI residency program to begin in January 2023? submitted by /u/rooney119 [link] [comments]  ( 89 min )
    [D] What does production look like in your case?
Hey there r/ml! I'm working on a talk/blog on a related topic and was curious to see what the situation is across the sub. I thought of creating a poll, but each case would be a bit different, so I don't want to constrain people too much. If you deploy ML to production in your group/team/company – what does production mean for you? Examples: "We run a model once a week that predicts some stuff and stores it in a table, then the customer queries it" "We create an inference endpoint on some cloud resource, which our product/users use to predict poses in videos" "I wish I knew, we're still figuring it out" "We deploy a model as part of a larger pipeline in a system of microservices (and other buzzwords)" Also, if you are in an extra-sharing mood – in your version of production, were there any counter-intuitive things you learned when you first set up the pipeline? Cheers! Enjoy the picture DALL-E 2 made for you of a cat asking for upvotes in return. submitted by /u/PhYsIcS-GUY227 [link] [comments]  ( 112 min )
    [D] Who are the essential Machine Learning Twitter accounts to follow?
I'm soon to be starting a PhD in machine learning and am trying to build up an academic Twitter page to keep up with the space and post my own updates. At the moment, I'm following some academics whose work I have been particularly interested in; however, I'm sure there are large Twitter pages I've not followed that give really interesting insights into the SOTA. So, for the benefit of myself and everyone else on this subreddit, I'm wondering if anyone has any really interesting accounts to follow under the umbrella of 'Machine Learning' – all disciplines welcome! submitted by /u/mouldygoldie [link] [comments]  ( 89 min )
    [D] NeurIPS is not as affordable as it used to be?
    Did anyone else notice that it costs €200 to attend NeurIPS 2022 virtually? What are your thoughts on this? submitted by /u/celestiallylovedone [link] [comments]  ( 89 min )
    [D][N]"Mudge learned that Twitter had never acquired proper legal rights to training material used to build Twitter's key Machine Learning models. The Machine Learning models at issue were some of the core models running the company's most basic products, like which Tweets to show each user."
In the news this week is a major story about Twitter's top information security executive, Peiter "Mudge" Zatko, turning whistleblower. Mudge is a legend in the security community and a former DARPA program manager -- he's legit. His report is a bombshell on what a complete disaster Twitter's information security posture really is. If you're a security person, it is worth the read. However, while the media has been very focused on the security problems, this nugget seems to have been glossed over: it appears that Twitter knowingly used ML software in a commercial production environment for years without proper licensing. ~~~ Excerpts from the report: Page 38: Item 70. Unlicensed machine learning materials for core algorithms: In January 2022, in the days before he was terminated, Mudge …  ( 101 min )
Discussion [D] Scraping data from Google Street View
Is there a way to extract compass heading data from Google Street View using Python? submitted by /u/Little-Chocolate269 [link] [comments]  ( 89 min )
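One possible route, assuming the Street View Static API is the data source (an assumption; the post doesn't say): the API does not return a compass reading for an arbitrary image, but you can request panorama crops at explicit headings, so the compass direction of each returned image is known by construction.

```python
# Sketch: fetch Street View crops at known compass headings via the Static
# API (requires an API key; parameters per Google's Static API docs).
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
BASE = "https://maps.googleapis.com/maps/api/streetview"

def fetch_at_heading(lat, lng, heading_deg, out_path):
    params = {
        "size": "640x640",
        "location": f"{lat},{lng}",
        "heading": heading_deg,  # 0 = north, 90 = east, etc.
        "fov": 90,
        "key": API_KEY,
    }
    r = requests.get(BASE, params=params, timeout=30)
    r.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(r.content)

# Four crops covering the full panorama, each with a known heading:
for h in (0, 90, 180, 270):
    fetch_at_heading(51.5008, -0.1247, h, f"view_{h}.png")
```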
    [P] How to determine object orientation
Hi, There is always a single object on the image that must be detected and cropped. I have trained YOLOv5 using a dataset in which all the images are aligned such that the rectangle edge is parallel with the horizon. The model achieves excellent results. Now, I would like to upgrade the solution so that it also determines the object's rotation (in case the object on the image is rotated). To train the new model, I have the same dataset (all objects have a rotation angle of 0 degrees). The idea is to create a new dataset by randomly selecting an angle, rotating the image, and recalculating the rectangle corners accordingly. (The post includes an image showing the transformation: left, the original image; right, the rotated one.) In the original dataset, the annotations are in YOLO format – relative values (x_center, y_center, width, height). From these values, I have calculated the coordinates (x1, y1), (x2, y2), (x3, y3), (x4, y4). After applying rotation around the axis located in the center of the object, I have to recalculate the new coordinates of the rectangle corners (I have not done that yet and am looking for a simple solution rather than applying trigonometric functions myself, so I would appreciate it if anyone knows a library that simplifies the calculation). After preparing the dataset, I plan to try YOLOv5 for Oriented Object Detection. I am not sure if there is a more straightforward way to determine rotation and prepare the dataset when using YOLOv5 OOD. I would appreciate any suggestions on how to implement rotation detection given a dataset of objects with rotation angle 0. submitted by /u/ThickDoctor007 [link] [comments]  ( 107 min )
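The corner recalculation itself is only a few lines with numpy; a sketch, assuming corners in pixel coordinates and a counter-clockwise angle convention:

```python
# Sketch: rotate the 4 corner points of a box around the object's center.
import numpy as np

def rotate_corners(corners, center, angle_deg):
    """corners: (4, 2) array of (x, y); center: (cx, cy)."""
    theta = np.deg2rad(angle_deg)
    # Note: with image y pointing down, a positive theta here appears
    # clockwise on screen; flip the sign of theta if that is not desired.
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return (np.asarray(corners, float) - center) @ rot.T + center

corners = np.array([[100, 50], [200, 50], [200, 120], [100, 120]], float)
print(rotate_corners(corners, center=np.array([150.0, 85.0]), angle_deg=30))
```

OpenCV offers the same thing ready-made: cv2.getRotationMatrix2D(center, angle, 1.0) gives a matrix that cv2.transform can apply to the corner points and cv2.warpAffine can apply to the image, keeping image and annotation rotation consistent.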
    [D] How to use a vision transformer to produce embeddings?
I'm trying to use a vision transformer to produce embeddings. The issue is, the vision transformer from timm has the parameters embed_dim and num_classes, and the output vector from the vision transformer is num_classes in length. I'm not trying to do a classification problem; I'm trying to train my own CLIP-like model, which means I want embeddings. (I.e., the last layer presumably takes the embeddings and turns them into the class vector, which I don't want.) Is it fine to just use the vector that is num_classes in length as the embedding vector, or is there a particular layer I need to strip? Somewhat unrelated: does anyone know what the width parameter is in a vision transformer? submitted by /u/throwaway119284 [link] [comments]  ( 90 min )
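For reference, timm exposes this directly: creating the model with num_classes=0 removes the classification head, so the forward pass returns the pooled embed_dim-sized features rather than logits. A small sketch (model name chosen arbitrarily):

```python
# Sketch: get pooled embeddings (length embed_dim) from a timm ViT by
# removing the classification head with num_classes=0.
import timm
import torch

model = timm.create_model("vit_base_patch16_224", pretrained=True,
                          num_classes=0)  # no head -> embeddings out
model.eval()

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    emb = model(x)                       # pooled features, (1, 768) here
    tokens = model.forward_features(x)   # unpooled per-token features
print(emb.shape)
```

So the num_classes-length logits are not what you want for CLIP-style training; take the pooled pre-head features instead. (In the ViT literature, "width" usually refers to this hidden/embedding dimension, i.e. embed_dim.)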
    [D] Short* and effective roadmap to understand modern synthesis models?
What I need: to be able to read papers and understand the principles behind them; to understand how GPT-3 works; to understand how Stable Diffusion works; etc. What is the learning path starting from absolute zero? (*"Short" here means intensive and capable of leading to tangible results in a short time, without unnecessary distractions (which can be mastered later). Learning is a never-ending process, and I didn't mean to learn quickly and then stop.) submitted by /u/fhv3hk71 [link] [comments]  ( 107 min )
  • Open

    What is a Data Quality Framework and How to Implement it?
30-50% of businesses experience gaps between their data expectations and reality. They have the data they need, but due to the presence of intolerable defects, they cannot use it as needed. These defects – also called data quality issues – must be found and fixed so that data can be used for successful business operations and… Read More »What is a Data Quality Framework and How to Implement it? The post What is a Data Quality Framework and How to Implement it? appeared first on Data Science Central.  ( 20 min )
  • Open

DALL-E’s output when given “unaired Star Trek episode”
    submitted by /u/coleflannery [link] [comments]  ( 87 min )
    Thoughts on the deepfakes in this video? Refaced was used (remove if not allowed)
    submitted by /u/nkubiak [link] [comments]  ( 97 min )
    Artificial moral cognition: Learning from developmental psychology
    submitted by /u/Futures_Bot [link] [comments]  ( 88 min )
    How to Build a GPT-3 for Science
    submitted by /u/pmz [link] [comments]  ( 93 min )
Best OpenCV Books for Beginners to Advanced to Know in 2022
    submitted by /u/Lakshmireddys [link] [comments]  ( 87 min )
    We need help testing our new platform
Hi everyone, We just finished a project and developed a platform for evaluating the robustness of AI models against adversarial attacks and natural noise. Perhaps some of you have experience with similar open source tools and frameworks (e.g., ART), but we believe our platform is ultimately an MLOps platform rather than a library, and it provides the user with the capability to extend various attacks (many not available in ART) to other tasks. Additionally, the platform is not focused on just adversarial robustness, but on natural robustness capabilities with domain adaptation options as well. Our platform is free for testing for the next 30 days; you can get access via this link: https://guardai.navinfo.cloud/#/ . We are more than happy to answer your questions and receive your feedback. If you want to learn more, we also created this simple landing page for your reference: https://www.navinfo.eu/services/cybersecurity/guardai/ submitted by /u/GuardAITeam [link] [comments]  ( 87 min )
I need to find a program/website to average out a large set of faces; does anyone know of one?
    submitted by /u/Shaddersss [link] [comments]  ( 87 min )
    Personal AI Writing Assistant for Mac
    submitted by /u/juliarmg [link] [comments]  ( 87 min )
Accelerating Structure Prediction of Protein Monomers and Multimers by 11 Times! An Open Source Solution from Colossal-AI and BioMap
The latest solution from the Colossal-AI team (https://github.com/hpcaitech/ColossalAI) and BioMap for protein monomer and multimer structure prediction, xTrimo Multimer, has recently become open source to the public. This new solution can predict both monomer and multimer structures simultaneously, accelerating the process by up to 11 times! The hero behind it is Colossal-AI, a powerful deep learning system that aims to make large AI model training easy and accessible for the community and industry. By integrating large model training techniques and optimizations provided by Colossal-AI, we can significantly reduce the time and cost of both protein monomer and multimer …  ( 94 min )
    A business Card with AI power - Let your photo speak
It uses v2v tech from Movio. submitted by /u/Ok_Asparagus_964 [link] [comments]  ( 87 min )
  • Open

    Achieve low-latency hosting for decision tree-based ML models on NVIDIA Triton Inference Server on Amazon SageMaker
    Machine learning (ML) model deployments can have very demanding performance and latency requirements for businesses today. Use cases such as fraud detection and ad placement are examples where milliseconds matter and are critical to business success. Strict service level agreements (SLAs) need to be met, and a typical request may require multiple steps such as […]  ( 13 min )
    Build a multi-lingual document translation workflow with domain-specific and language-specific customization
    In the digital world, providing information in a local language isn’t novel, but it can be a tedious and expensive task. Advancements in machine learning (ML) and natural language processing (NLP) have made this task much easier and less expensive. We have seen increased adoption of ML for multi-lingual data and document processing workloads. Enterprise […]  ( 8 min )
  • Open

    High-Definition Segmentation in Google Meet
    Posted by Tingbo Hou and Juhyun Lee, Software Engineers, Google In recent years video conferencing has played an increasingly important role in both work and personal communication for many users. Over the past two years, we have enhanced this experience in Google Meet by introducing privacy-preserving machine learning (ML) powered background features, also known as “virtual green screen”, which allows users to blur their backgrounds or replace them with other images. What is unique about this solution is that it runs directly in the browser without the need to install additional software. So far, these ML-powered features have relied on CPU inference made possible by leveraging neural network sparsity, a common solution that works across devices, from entry level computers to high-end …  ( 23 min )
  • Open

    "The Alberta Plan for AI Research", Sutton et al 2022 {DM} (manifesto for project to build permanent continually-learning non-episodic RL agents)
    submitted by /u/gwern [link] [comments]  ( 87 min )
Agent obtains optimal path when trained in one direction, but when trained along the reverse direction as well, it avoids the same path.
I am currently training an agent to find the minimal number of steps to take when going from one position to another in a 4x4 grid. When I train the agent where the start position is one corner of the grid ([0,0]) and the end position is the opposite corner ([3,3]) ONLY, the agent goes diagonally, which is the desired optimal path. But when I train the agent where the starting position can be either [0,0] or [3,3], and the ending position is the opposite one, i.e. [3,3] or [0,0], the agent does not trace out the diagonal path. In fact, the agent moves along the side of the box. To further illustrate: when the start and end positions are [0,0] and [3,3] respectively, the agent moves horizontally right and then vertically down along one side of the box to reach the final destination. But when the start and end positions are [3,3] and [0,0] respectively, the agent moves horizontally left and then vertically up along the other side of the box, thus not tracing the same path. My question: What might have caused the agent not to take the diagonal path, and what would you suggest as a workaround for this problem? More details: When I train the agent in both directions on a task where the optimal path is vertical (e.g., start and end positions [0,0] and [3,0], or [3,0] and [0,0]), the NN finds it much harder to converge. submitted by /u/Icy_Improvement_5527 [link] [comments]  ( 92 min )
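One common culprit (an assumption, since the post doesn't show the state encoding): if the observation contains only the agent's position, the two tasks alias each other and the policy has to compromise. A minimal tabular Q-learning sketch that makes the goal part of the state, for comparison:

```python
# Sketch: tabular Q-learning on a 4x4 grid with 8-connected moves where the
# goal is part of the state, so the two opposite tasks don't alias each other.
import random

ACTIONS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
Q = {}  # key: (pos, goal, action_index) -> value

def step(pos, a):
    dx, dy = ACTIONS[a]
    return (min(max(pos[0] + dx, 0), 3), min(max(pos[1] + dy, 0), 3))

def train(episodes=20000, alpha=0.1, gamma=0.95, eps=0.2):
    tasks = [((0, 0), (3, 3)), ((3, 3), (0, 0))]
    for _ in range(episodes):
        pos, goal = random.choice(tasks)
        while pos != goal:
            if random.random() < eps:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)),
                        key=lambda i: Q.get((pos, goal, i), 0.0))
            nxt = step(pos, a)
            r = -1.0  # per-step cost makes shorter (diagonal) paths optimal
            best_next = 0.0 if nxt == goal else max(
                Q.get((nxt, goal, i), 0.0) for i in range(len(ACTIONS)))
            old = Q.get((pos, goal, a), 0.0)
            Q[(pos, goal, a)] = old + alpha * (r + gamma * best_next - old)
            pos = nxt

train()
```

Under this encoding the greedy policy recovers the 3-step diagonal in both directions; if your observation already includes the goal, I'd look instead at exploration settings or reward shaping.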
    PPO with e-greedy (part 2)
So I'm trying to integrate e-greedy exploration into my PPO as well; now my select_action looks like this:

    def select_action(self, epsilon, state):  # only used when interacting with the env
        with torch.no_grad():
            state = torch.FloatTensor(state.reshape(1, -1)).to(device)
            dist = self.actor.get_dist(state)
            r = random.uniform(0, 1)
            if r < epsilon:
                a = random.uniform(0, 1)
            else:
                a = dist.sample()
            a = torch.clamp(a, 0, 1)
            logprob_a = dist.log_prob(a).cpu().numpy().flatten()
            return a.cpu().numpy().flatten(), logprob_a

As expected, it throws the error: ValueError: The value argument to log_prob must be a Tensor. What should I do? Should I have a separate uniform dist? submitted by /u/White_Sirilo [link] [comments]  ( 88 min )
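One minimal fix (a sketch, not the only option): draw the exploratory action as a tensor, so torch.clamp and dist.log_prob both receive a Tensor rather than a Python float.

```python
# Sketch of one fix: the exploratory branch samples a tensor, the rest of
# the function is unchanged from the post.
def select_action(self, epsilon, state):
    with torch.no_grad():
        state = torch.FloatTensor(state.reshape(1, -1)).to(device)
        dist = self.actor.get_dist(state)
        if random.uniform(0, 1) < epsilon:
            a = torch.rand(1, device=device)  # uniform in [0, 1), as a tensor
        else:
            a = dist.sample()
        a = torch.clamp(a, 0, 1)
        logprob_a = dist.log_prob(a).cpu().numpy().flatten()
        return a.cpu().numpy().flatten(), logprob_a
```

A caveat worth noting: mixing e-greedy into PPO makes the behavior policy differ from the policy whose log_prob is stored, which biases the importance ratios. A separate uniform distribution (i.e., computing logprob_a under the actual epsilon-mixture) would be the more principled route, or one can simply rely on PPO's entropy bonus for exploration instead.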
    Question regarding transfer learning with different action sizes
Hello, I am currently looking into reinforcement learning for wind farm control, and I was thinking about training an agent on a 3x3 wind farm and then using transfer learning to get it to work on a (for example) 6x6 wind farm. That would mean my action space changes from 9 actions to 36 in this case. Would this even be possible, and how would I handle this change in action space? I am still quite new to this topic, so if you have a link to some papers or repositories, that would be a big help. submitted by /u/IAmActuallyMarcus [link] [comments]  ( 88 min )
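One standard way to sidestep the changing action dimension entirely (a sketch of the general parameter-sharing idea, not a wind-farm-specific recipe): treat each turbine as one instance of a shared per-turbine policy, so the same network applies whether the farm has 9 or 36 turbines. The per-turbine observation here is a hypothetical placeholder.

```python
# Sketch: a shared per-turbine policy whose weights are independent of farm
# size; "per_turbine_features" is a hypothetical local observation vector.
import torch
import torch.nn as nn

class PerTurbinePolicy(nn.Module):
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # one action (e.g., yaw setpoint) per turbine
        )

    def forward(self, per_turbine_features):
        # (n_turbines, obs_dim) -> (n_turbines,) actions; n_turbines is free.
        return self.net(per_turbine_features).squeeze(-1)

policy = PerTurbinePolicy(obs_dim=8)
print(policy(torch.randn(9, 8)).shape)   # trained on a 3x3 farm
print(policy(torch.randn(36, 8)).shape)  # reused on a 6x6 farm
```

Weights learned on the 3x3 farm then transfer unchanged to the 6x6 one; interactions between turbines can be folded into the per-turbine features (e.g., neighbor states) or handled with a graph network.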
  • Open

    3D Artist Creates Blooming, Generative Sculptures With NVIDIA RTX and AI
    Looking for a change of art? Try using AI — that’s what 3D artist Nikola Damjanov is doing. Based in Serbia, Damjanov has over 15 years of experience in the graphics industry, from making 3D models and animations to creating high-quality visual effects for music videos and movies. Now an artist at game developer company Read article > The post 3D Artist Creates Blooming, Generative Sculptures With NVIDIA RTX and AI appeared first on NVIDIA Blog.  ( 5 min )
    Fintech Company Blocks Fraud Attacks for Financial Institutions With AI and NVIDIA GPUs
    E-commerce sales have skyrocketed as more people shop remotely, spurred by the pandemic. But this surge has also led fraudsters to use the opportunity to scam retailers and customers, according to David Sutton, director of analytical technology at fintech company Featurespace. The company, headquartered in the U.K., has developed AI-powered technology to increase the speed Read article > The post Fintech Company Blocks Fraud Attacks for Financial Institutions With AI and NVIDIA GPUs appeared first on NVIDIA Blog.  ( 6 min )
    GFN Thursday Adds ‘Saints Row,’ ‘Genshin Impact’ on Mobile With Touch Controls
    Some weeks, GFN Thursday reveals new or unique features. Other weeks, it’s a cool reward. And every week, it offers its members new games. This week, it’s all of the above. First, Saints Row marches into GeForce NOW. Be your own boss in the new reboot of the classic open-world criminal adventure series, now available Read article > The post GFN Thursday Adds ‘Saints Row,’ ‘Genshin Impact’ on Mobile With Touch Controls appeared first on NVIDIA Blog.  ( 6 min )
  • Open

    MoCapAct: Training humanoid robots to “Move Like Jagger”
    What would it take to get humanoid, bipedal robots to dance like Mick Jagger? Indeed, for something more mundane, what does it take to get them to simply stand still? Sit down? Walk? Move in myriads of other ways many people take for granted? Bipedalism provides unparalleled versatility in an environment designed for and by […] The post MoCapAct: Training humanoid robots to “Move Like Jagger” appeared first on Microsoft Research.  ( 11 min )
  • Open

    Neural Network From Scratch in Python pt-1
What is a neural network?  ( 8 min )
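As a taste of what a from-scratch "part 1" typically builds (a generic sketch, not necessarily the linked article's code): a two-layer network learning XOR with plain numpy, with a forward pass, mean squared error, and hand-derived backpropagation.

```python
# Generic from-scratch sketch: a 2-layer network learning XOR with numpy.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([[0], [1], [1], [0]], float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass (chain rule on mean squared error)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

print(out.round(3).ravel())  # should approach [0, 1, 1, 0]
```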
  • Open

    Scenario-Adaptive and Self-Supervised Model for Multi-Scenario Personalized Recommendation. (arXiv:2208.11457v1 [cs.IR])
Multi-scenario recommendation is dedicated to retrieving relevant items for users in multiple scenarios, which is ubiquitous in industrial recommendation systems. These scenarios enjoy portions of overlaps in users and items, while the distributions of different scenarios differ. The key point of multi-scenario modeling is to efficiently maximize the use of whole-scenario information and granularly generate adaptive representations for both users and items among multiple scenarios. We summarize three practical challenges which are not well solved for multi-scenario modeling: (1) Lack of fine-grained and decoupled information transfer controls among multiple scenarios. (2) Insufficient exploitation of entire-space samples. (3) The item's multi-scenario representation disentanglement problem. In this paper, we propose a Scenario-Adaptive and Self-Supervised (SASS) model to solve the three challenges mentioned above. Specifically, we design a Multi-Layer Scenario Adaptive Transfer (ML-SAT) module with scenario-adaptive gate units to select and fuse effective transfer information from the whole scenario to an individual scenario in a fine-grained and decoupled way. To sufficiently exploit the power of entire-space samples, a two-stage training process including pre-training and fine-tuning is introduced. The pre-training stage is based on a scenario-supervised contrastive learning task with training samples drawn from labeled and unlabeled data spaces. The model is created symmetrically on both the user side and the item side, so that we can get distinguishing representations of items in different scenarios. Extensive experimental results on public and industrial datasets demonstrate the superiority of the SASS model over state-of-the-art methods. The model also achieves more than an 8.0% improvement on Average Watching Time Per User in online A/B tests.
    Towards an Awareness of Time Series Anomaly Detection Models' Adversarial Vulnerability. (arXiv:2208.11264v1 [cs.LG])
Time series anomaly detection is extensively studied in statistics, economics, and computer science. Over the years, numerous deep learning-based methods have been proposed for time series anomaly detection. Many of these methods demonstrate state-of-the-art performance on benchmark datasets, giving the false impression that these systems are robust and deployable in many practical and industrial real-world scenarios. In this paper, we demonstrate that the performance of state-of-the-art anomaly detection methods is degraded substantially by adding only small adversarial perturbations to the sensor data. We use different scoring metrics such as prediction errors, anomaly scores, and classification scores over several public and private datasets ranging from aerospace applications and server machines to cyber-physical systems in power plants. Under well-known adversarial attacks such as the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD), we demonstrate that state-of-the-art deep neural network (DNN) and graph neural network (GNN) methods, which claim to be robust against anomalies and have possibly been integrated into real-life systems, have their performance drop to as low as 0%. To the best of our knowledge, we demonstrate, for the first time, the vulnerabilities of anomaly detection systems against adversarial attacks. The overarching goal of this research is to raise awareness of the adversarial vulnerabilities of time series anomaly detectors.
    Weakly Supervised Disentangled Generative Causal Representation Learning. (arXiv:2010.02637v3 [cs.LG] UPDATED)
    This paper proposes a Disentangled gEnerative cAusal Representation (DEAR) learning method under appropriate supervised information. Unlike existing disentanglement methods that enforce independence of the latent variables, we consider the general case where the underlying factors of interests can be causally related. We show that previous methods with independent priors fail to disentangle causally related factors even under supervision. Motivated by this finding, we propose a new disentangled learning method called DEAR that enables causal controllable generation and causal representation learning. The key ingredient of this new formulation is to use a structural causal model (SCM) as the prior distribution for a bidirectional generative model. The prior is then trained jointly with a generator and an encoder using a suitable GAN algorithm incorporated with supervised information on the ground-truth factors and their underlying causal structure. We provide theoretical justification on the identifiability and asymptotic convergence of the proposed method. We conduct extensive experiments on both synthesized and real data sets to demonstrate the effectiveness of DEAR in causal controllable generation, and the benefits of the learned representations for downstream tasks in terms of sample efficiency and distributional robustness.
    Robustness to Unbounded Smoothness of Generalized SignSGD. (arXiv:2208.11195v1 [cs.LG])
    Traditional analyses in non-convex optimization typically rely on the smoothness assumption, namely requiring the gradients to be Lipschitz. However, recent evidence shows that this smoothness condition does not capture the properties of some deep learning objective functions, including the ones involving Recurrent Neural Networks and LSTMs. Instead, they satisfy a much more relaxed condition, with potentially unbounded smoothness. Under this relaxed assumption, it has been theoretically and empirically shown that the gradient-clipped SGD has an advantage over the vanilla one. In this paper, we show that clipping is not indispensable for Adam-type algorithms in tackling such scenarios: we theoretically prove that a generalized SignSGD algorithm can obtain similar convergence rates as SGD with clipping but does not need explicit clipping at all. This family of algorithms on one end recovers SignSGD and on the other end closely resembles the popular Adam algorithm. Our analysis underlines the critical role that momentum plays in analyzing SignSGD-type and Adam-type algorithms: it not only reduces the effects of noise, thus removing the need for large mini-batch in previous analyses of SignSGD-type algorithms, but it also substantially reduces the effects of unbounded smoothness and gradient norms. We also compare these algorithms with popular optimizers on a set of deep learning tasks, observing that we can match the performance of Adam while beating the others.
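For orientation, a generic SignSGD-with-momentum update of the kind the abstract discusses looks like the following (a textbook sketch, not the paper's exact generalized algorithm):

```python
# Generic sketch of SignSGD with momentum: average gradients into a momentum
# buffer, then update parameters using only the sign of that buffer.
import torch

@torch.no_grad()
def signsgd_momentum_step(params, state, lr=1e-3, beta=0.9):
    """state: dict mapping each parameter to its momentum buffer."""
    for p in params:
        if p.grad is None:
            continue
        m = state.setdefault(p, torch.zeros_like(p))
        m.mul_(beta).add_(p.grad, alpha=1 - beta)  # m = beta*m + (1-beta)*g
        p.add_(m.sign(), alpha=-lr)                # p <- p - lr * sign(m)

# Usage inside a training loop (sketch):
# state = {}
# loss.backward(); signsgd_momentum_step(model.parameters(), state); model.zero_grad()
```

Setting beta = 0 recovers plain SignSGD; the momentum averaging is the ingredient the abstract credits with reducing the effects of noise and unbounded smoothness, without explicit clipping.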
    AutoML-Based Drought Forecast with Meteorological Variables. (arXiv:2207.07012v2 [cs.LG] UPDATED)
    A precise forecast for droughts is of considerable value to scientific research, agriculture, and water resource management. With emerging developments of data-driven approaches for hydro-climate modeling, this paper investigates an AutoML-based framework to forecast droughts in the U.S. Compared with commonly-used temporal deep learning models, the AutoML model can achieve comparable performance with less training data and time. As deep learning models are becoming popular for Earth system modeling, this paper aims to bring more efforts to AutoML-based methods, and the use of them as benchmark baselines for more complex deep learning models.
    SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis. (arXiv:2204.03040v2 [cs.SD] UPDATED)
    In this work, we present the SOMOS dataset, the first large-scale mean opinion scores (MOS) dataset consisting of solely neural text-to-speech (TTS) samples. It can be employed to train automatic MOS prediction systems focused on the assessment of modern synthesizers, and can stimulate advancements in acoustic model evaluation. It consists of 20K synthetic utterances of the LJ Speech voice, a public domain speech dataset which is a common benchmark for building neural acoustic models and vocoders. Utterances are generated from 200 TTS systems including vanilla neural acoustic models as well as models which allow prosodic variations. An LPCNet vocoder is used for all systems, so that the samples' variation depends only on the acoustic models. The synthesized utterances provide balanced and adequate domain and length coverage. We collect MOS naturalness evaluations on 3 English Amazon Mechanical Turk locales and share practices leading to reliable crowdsourced annotations for this task. We provide baseline results of state-of-the-art MOS prediction models on the SOMOS dataset and show the limitations that such models face when assigned to evaluate TTS utterances.
    Safe Output Feedback Motion Planning from Images via Learned Perception Modules and Contraction Theory. (arXiv:2206.06553v2 [cs.RO] UPDATED)
    We present a motion planning algorithm for a class of uncertain control-affine nonlinear systems which guarantees runtime safety and goal reachability when using high-dimensional sensor measurements (e.g., RGB-D images) and a learned perception module in the feedback control loop. First, given a dataset of states and observations, we train a perception system that seeks to invert a subset of the state from an observation, and estimate an upper bound on the perception error which is valid with high probability in a trusted domain near the data. Next, we use contraction theory to design a stabilizing state feedback controller and a convergent dynamic state observer which uses the learned perception system to update its state estimate. We derive a bound on the trajectory tracking error when this controller is subjected to errors in the dynamics and incorrect state estimates. Finally, we integrate this bound into a sampling-based motion planner, guiding it to return trajectories that can be safely tracked at runtime using sensor data. We demonstrate our approach in simulation on a 4D car, a 6D planar quadrotor, and a 17D manipulation task with RGB(-D) sensor measurements, demonstrating that our method safely and reliably steers the system to the goal, while baselines that fail to consider the trusted domain or state estimation errors can be unsafe.
    Efficient Heterogeneous Video Segmentation at the Edge. (arXiv:2208.11666v1 [cs.CV])
    We introduce an efficient video segmentation system for resource-limited edge devices leveraging heterogeneous compute. Specifically, we design network models by searching across multiple dimensions of specifications for the neural architectures and operations on top of already light-weight backbones, targeting commercially available edge inference engines. We further analyze and optimize the heterogeneous data flows in our systems across the CPU, the GPU and the NPU. Our approach has empirically factored well into our real-time AR system, enabling remarkably higher accuracy with quadrupled effective resolutions, yet at much shorter end-to-end latency, much higher frame rate, and even lower power consumption on edge platforms.
    Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey. (arXiv:2207.07068v3 [cs.LG] UPDATED)
This paper provides a comprehensive survey of bias mitigation methods for achieving fairness in Machine Learning (ML) models. We collect a total of 341 publications concerning bias mitigation for ML classifiers. These methods can be distinguished based on their intervention procedure (i.e., pre-processing, in-processing, post-processing) and the technology they apply. We investigate how existing bias mitigation methods are evaluated in the literature; in particular, we consider datasets, metrics, and benchmarking. Based on the gathered insights (e.g., what is the most popular fairness metric? how many datasets are used for evaluating bias mitigation methods?), we hope to support practitioners in making informed choices when developing and evaluating new bias mitigation methods.
    UM4: Unified Multilingual Multiple Teacher-Student Model for Zero-Resource Neural Machine Translation. (arXiv:2207.04900v2 [cs.CL] UPDATED)
Most translation tasks among languages belong to the zero-resource translation problem where parallel corpora are unavailable. Multilingual neural machine translation (MNMT) enables one-pass translation using a shared semantic space for all languages, compared to two-pass pivot translation, but often underperforms the pivot-based method. In this paper, we propose a novel method, named Unified Multilingual Multiple teacher-student Model for NMT (UM4). Our method unifies source-teacher, target-teacher, and pivot-teacher models to guide the student model for zero-resource translation. The source teacher and target teacher force the student to learn the direct source-to-target translation via knowledge distilled on both the source and target sides. The monolingual corpus is further leveraged by the pivot-teacher model to enhance the student model. Experimental results on 72 translation directions demonstrate that our model significantly outperforms previous methods on the WMT benchmark.
    Synthetic ECG Signal Generation Using Generative Neural Networks. (arXiv:2112.03268v2 [cs.LG] UPDATED)
Electrocardiogram (ECG) datasets tend to be highly imbalanced due to the scarcity of abnormal cases. Additionally, the use of real patients' ECGs is highly regulated due to privacy issues. Therefore, there is always a need for more ECG data, especially for the training of automatic diagnosis machine learning models, which perform better when trained on a balanced dataset. We studied the synthetic ECG generation capability of 5 different models from the generative adversarial network (GAN) family and compared their performances, the focus being only on Normal cardiac cycles. Dynamic Time Warping (DTW), Fréchet, and Euclidean distance functions were employed to quantitatively measure performance. Five different methods for evaluating generated beats were proposed and applied. We also proposed 3 new concepts (threshold, accepted beat and productivity rate) and employed them along with the aforementioned methods as a systematic way for comparison between models. The results show that all the tested models can, to an extent, successfully mass-generate acceptable heartbeats with high similarity in morphological features, and potentially all of them can be used to augment imbalanced datasets. However, visual inspection of generated beats favors BiLSTM-DC GAN and WGAN, as they produce statistically more acceptable beats. Also, with regards to productivity rate, the Classic GAN is superior with a 72% productivity rate. We also designed a simple experiment with the state-of-the-art classifier (ECGResNet34) to show empirically that the augmentation of the imbalanced dataset by synthetic ECG signals could improve the performance of classification significantly.
    Signed Link Representation in Continuous-Time Dynamic Signed Networks. (arXiv:2207.03408v2 [cs.SI] UPDATED)
Signed networks allow us to model conflicting relationships and interactions, such as friend/enemy and support/oppose. These signed interactions are often temporal in real-world datasets, where new nodes and links appear over time. Modeling such dynamics of signed networks is crucial to understanding the evolution of polarization in the network and enabling effective prediction of the signed structure (i.e., link signs and signed weights) in the future. However, existing works have modeled either (static) signed networks or dynamic (unsigned) networks but not dynamic signed networks. Since both sign and dynamics inform the graph structure in different ways, it is non-trivial to model how to combine the two features. In this work, we propose a new Graph Neural Network (GNN)-based approach to model dynamic signed networks, named SEMBA: Signed link's Evolution using Memory modules and Balanced Aggregation. Here, the idea is to incorporate the signs of temporal interactions using separate modules guided by balance theory and to evolve the embeddings from a higher-order neighborhood. Experiments on 4 real-world datasets demonstrate that SEMBA consistently and significantly outperforms the baselines by up to $9\%$ on the tasks of predicting signs and signed weights of future links. SEMBA specifically improves prediction on the minority negative class by reducing the false positive rate by up to $50\%$ and by learning the weight distribution with an improvement of $69\%$ in the KL-divergence.
    High-Order Conditional Mutual Information Maximization for dealing with High-Order Dependencies in Feature Selection. (arXiv:2207.08476v2 [cs.LG] UPDATED)
This paper presents a novel feature selection method based on conditional mutual information (CMI). The proposed High Order Conditional Mutual Information Maximization (HOCMIM) incorporates high-order dependencies into the feature selection procedure and has a straightforward interpretation due to its bottom-up derivation. The HOCMIM is derived from the CMI's chain expansion and expressed as a maximization optimization problem. The maximization problem is solved using a greedy search procedure, which speeds up the entire feature selection process. The experiments are run on a set of benchmark datasets (20 in total). HOCMIM is compared with eighteen state-of-the-art feature selection algorithms using the results of two supervised learning classifiers (Support Vector Machine and K-Nearest Neighbor). HOCMIM achieves the best results in terms of accuracy and is shown to be faster than its high-order feature selection counterparts.
    A coherence parameter characterizing generative compressed sensing with Fourier measurements. (arXiv:2207.09340v2 [cs.IT] UPDATED)
    In Bora et al. (2017), a mathematical framework was developed for compressed sensing guarantees in the setting where the measurement matrix is Gaussian and the signal structure is the range of a generative neural network (GNN). The problem of compressed sensing with GNNs has since been extensively analyzed when the measurement matrix and/or network weights follow a subgaussian distribution. We move beyond the subgaussian assumption, to measurement matrices that are derived by sampling uniformly at random rows of a unitary matrix (including subsampled Fourier measurements as a special case). Specifically, we prove the first known restricted isometry guarantee for generative compressed sensing with subsampled isometries, and provide recovery bounds with nearly order-optimal sample complexity, addressing an open problem of Scarlett et al. (2022, p. 10). Recovery efficacy is characterized by the coherence, a new parameter, which measures the interplay between the range of the network and the measurement matrix. Our approach relies on subspace counting arguments and ideas central to high-dimensional probability. Furthermore, we propose a regularization strategy for training GNNs to have favourable coherence with the measurement operator. We provide compelling numerical simulations that support this regularized training strategy: our strategy yields low coherence networks that require fewer measurements for signal recovery. This, together with our theoretical results, supports coherence as a natural quantity for characterizing generative compressed sensing with subsampled isometries.
    Oracle-free Reinforcement Learning in Mean-Field Games along a Single Sample Path. (arXiv:2208.11639v1 [cs.LG])
    We consider online reinforcement learning in Mean-Field Games. In contrast to the existing works, we alleviate the need for a mean-field oracle by developing an algorithm that estimates the mean-field and the optimal policy using a single sample path of the generic agent. We call this Sandbox Learning, as it can be used as a warm-start for any agent operating in a multi-agent non-cooperative setting. We adopt a two timescale approach in which an online fixed-point recursion for the mean-field operates on a slower timescale and in tandem with a control policy update on a faster timescale for the generic agent. Under a sufficient exploration condition, we provide finite sample convergence guarantees in terms of convergence of the mean-field and control policy to the mean-field equilibrium. The sample complexity of the Sandbox learning algorithm is $\mathcal{O}(\epsilon^{-4})$. Finally, we empirically demonstrate effectiveness of the sandbox learning algorithm in a congestion game.
    BRIGHT -- Graph Neural Networks in Real-Time Fraud Detection. (arXiv:2205.13084v2 [cs.LG] UPDATED)
Detecting fraudulent transactions is an essential component to control risk in e-commerce marketplaces. Apart from rule-based and machine learning filters that are already deployed in production, we want to enable efficient real-time inference with graph neural networks (GNNs), which is useful to catch multihop risk propagation in a transaction graph. However, two challenges arise in the implementation of GNNs in production. First, future information in a dynamic graph should not be considered in message passing to predict the past. Second, the latency of graph query and GNN model inference is usually up to hundreds of milliseconds, which is costly for some critical online services. To tackle these challenges, we propose a Batch and Real-time Inception GrapH Topology (BRIGHT) framework to conduct an end-to-end GNN learning that allows efficient online real-time inference. The BRIGHT framework consists of a graph transformation module (Two-Stage Directed Graph) and a corresponding GNN architecture (Lambda Neural Network). The Two-Stage Directed Graph guarantees that the information passed through neighbors is only from historical payment transactions. It consists of two subgraphs representing historical relationships and real-time links, respectively. The Lambda Neural Network decouples inference into two stages: batch inference of entity embeddings and real-time inference of transaction prediction. Our experiments show that BRIGHT outperforms the baseline models by >2% on average w.r.t. precision. Furthermore, BRIGHT is computationally efficient for real-time fraud detection. Regarding end-to-end performance (including neighbor query and inference), BRIGHT can reduce the P99 latency by >75%. For the inference stage, our speedup is on average 7.8$\times$ compared to the traditional GNN.
    FedIPR: Ownership Verification for Federated Deep Neural Network Models. (arXiv:2109.13236v3 [cs.LG] UPDATED)
Federated learning models are collaboratively developed upon valuable training data owned by multiple parties. During the development and deployment of federated models, they are exposed to risks including illegal copying, re-distribution, misuse and/or free-riding. To address these risks, ownership verification of federated learning models is a prerequisite that protects federated learning model intellectual property rights (IPR), i.e., FedIPR. We propose a novel federated deep neural network (FedDNN) ownership verification scheme that allows private watermarks to be embedded and verified to claim legitimate IPR of FedDNN models. In the proposed scheme, each client independently verifies the existence of the model watermarks and claims respective ownership of the federated model without disclosing either private training data or private watermark information. The effectiveness of embedded watermarks is theoretically justified by rigorous analysis of the conditions under which watermarks can be privately embedded and detected by multiple clients. Moreover, extensive experimental results on computer vision and natural language processing tasks demonstrate that watermarks of varying bit-lengths can be embedded and reliably detected without compromising original model performance. Our watermarking scheme is also resilient to various federated training settings and robust against removal attacks.
    DISCO: Comprehensive and Explainable Disinformation Detection. (arXiv:2203.04928v3 [cs.LG] UPDATED)
Disinformation refers to false information deliberately spread to influence the general public, and its negative impact on society can be observed in numerous issues, such as political agendas and the manipulation of financial markets. In this paper, we identify prevalent challenges and advances related to automated disinformation detection from multiple aspects and propose a comprehensive and explainable disinformation detection framework called DISCO. It leverages the heterogeneity of disinformation and addresses the opaqueness of prediction. We then provide a demonstration of DISCO on a real-world fake news detection task with satisfactory detection accuracy and explanation. The demo video and source code of DISCO are now publicly available at https://github.com/DongqiFu/DISCO. We expect that our demo could pave the way for addressing the limitations of identification, comprehension, and explainability as a whole.
    Towards Sparsified Federated Neuroimaging Models via Weight Pruning. (arXiv:2208.11669v1 [cs.LG])
    Federated training of large deep neural networks can often be restrictive due to the increasing costs of communicating the updates with increasing model sizes. Various model pruning techniques have been designed in centralized settings to reduce inference times. Combining centralized pruning techniques with federated training seems intuitive for reducing communication costs -- by pruning the model parameters right before the communication step. Moreover, such a progressive model pruning approach during training can also reduce training times/costs. To this end, we propose FedSparsify, which performs model pruning during federated training. In our experiments in centralized and federated settings on the brain age prediction task (estimating a person's age from their brain MRI), we demonstrate that models can be pruned up to 95% sparsity without affecting performance even in challenging federated learning environments with highly heterogeneous data distributions. One surprising benefit of model pruning is improved model privacy. We demonstrate that models with high sparsity are less susceptible to membership inference attacks, a type of privacy attack.
    Synergy: Resource Sensitive DNN Scheduling in Multi-Tenant Clusters. (arXiv:2110.06073v2 [cs.DC] UPDATED)
    Training Deep Neural Networks (DNNs) is a widely popular workload in both enterprises and cloud data centers. Existing schedulers for DNN training consider GPU as the dominant resource, and allocate other resources such as CPU and memory proportional to the number of GPUs requested by the job. Unfortunately, these schedulers do not consider the impact of a job's sensitivity to allocation of CPU, memory, and storage resources. In this work, we propose Synergy, a resource-sensitive scheduler for shared GPU clusters. Synergy infers the sensitivity of DNNs to different resources using optimistic profiling; some jobs might benefit from more than the GPU-proportional allocation and some jobs might not be affected by less than GPU-proportional allocation. Synergy performs such multi-resource workload-aware assignments across a set of jobs scheduled on shared multi-tenant clusters using a new near-optimal online algorithm. Our experiments show that workload-aware CPU and memory allocations can improve average JCT up to 3.4x when compared to traditional GPU-proportional scheduling.
    Fairness for AUC via Feature Augmentation. (arXiv:2111.12823v2 [cs.LG] UPDATED)
    We study fairness in the context of classification where the performance is measured by the area under the curve (AUC) of the receiver operating characteristic. AUC is commonly used to measure the performance of prediction models. The same classifier can have significantly varying AUCs for different protected groups and, in real-world applications, it is often desirable to reduce such cross-group differences. We address the problem of how to acquire additional features to most greatly improve AUC for the disadvantaged group. We develop a novel approach, fairAUC, based on feature augmentation (adding features) to mitigate bias between identifiable groups. The approach requires only a few summary statistics to offer provable guarantees on AUC improvement, and allows managers flexibility in determining where in the fairness-accuracy tradeoff they would like to be. We evaluate fairAUC on synthetic and real-world datasets and find that it significantly improves AUC for the disadvantaged group relative to benchmarks maximizing overall AUC and minimizing bias between groups.
    Constraint-driven multi-task learning. (arXiv:2208.11656v1 [cs.LG])
    Inductive logic programming is a form of machine learning based on mathematical logic that generates logic programs from given examples and background knowledge. In this project, we extend the Popper ILP system to make use of multi-task learning. We implement the state-of-the-art approach and several new strategies to improve search performance. Furthermore, we introduce constraint preservation, a technique that improves overall performance for all approaches. Constraint preservation allows the system to transfer knowledge between updates on the background knowledge set. Consequently, we reduce the amount of repeated work performed by the system. Additionally, constraint preservation allows us to transition from the current state-of-the-art iterative deepening search approach to a more efficient breadth first search approach. Finally, we experiment with curriculum learning techniques and show their potential benefit to the field.
    AlphaZero-Inspired Game Learning: Faster Training by Using MCTS Only at Test Time. (arXiv:2204.13307v2 [cs.LG] UPDATED)
    Recently, the seminal algorithms AlphaGo and AlphaZero have started a new era in game learning and deep reinforcement learning. While the achievements of AlphaGo and AlphaZero - playing Go and other complex games at super human level - are truly impressive, these architectures have the drawback that they require high computational resources. Many researchers are looking for methods that are similar to AlphaZero, but have lower computational demands and are thus more easily reproducible. In this paper, we pick an important element of AlphaZero - the Monte Carlo Tree Search (MCTS) planning stage - and combine it with temporal difference (TD) learning agents. We wrap MCTS for the first time around TD n-tuple networks and we use this wrapping only at test time to create versatile agents that keep at the same time the computational demands low. We apply this new architecture to several complex games (Othello, ConnectFour, Rubik's Cube) and show the advantages achieved with this AlphaZero-inspired MCTS wrapper. In particular, we present results that this agent is the first one trained on standard hardware (no GPU or TPU) to beat the very strong Othello program Edax up to and including level 7 (where most other learning-from-scratch algorithms could only defeat Edax up to level 2).
    A Riemannian Newton Trust-Region Method for Fitting Gaussian Mixture Models. (arXiv:2104.14957v2 [stat.ML] UPDATED)
    Gaussian Mixture Models are a powerful tool in Data Science and Statistics that are mainly used for clustering and density approximation. The task of estimating the model parameters is in practice often solved by the Expectation Maximization (EM) algorithm which has its benefits in its simplicity and low per-iteration costs. However, the EM converges slowly if there is a large share of hidden information or overlapping clusters. Recent advances in manifold optimization for Gaussian Mixture Models have gained increasing interest. We introduce an explicit formula for the Riemannian Hessian for Gaussian Mixture Models. On top, we propose a new Riemannian Newton Trust-Region method which outperforms current approaches both in terms of runtime and number of iterations. We apply our method on clustering problems and density approximation tasks. Our method is very powerful for data with a large share of hidden information compared to existing methods.
    Theoretical insights into the optimization landscape of over-parameterized shallow neural networks. (arXiv:1707.04926v3 [cs.LG] UPDATED)
    In this paper we study the problem of learning a shallow artificial neural network that best fits a training data set. We study this problem in the over-parameterized regime where the number of observations are fewer than the number of parameters in the model. We show that with quadratic activations the optimization landscape of training such shallow neural networks has certain favorable characteristics that allow globally optimal models to be found efficiently using a variety of local search heuristics. This result holds for an arbitrary training data of input/output pairs. For differentiable activation functions we also show that gradient descent, when suitably initialized, converges at a linear rate to a globally optimal model. This result focuses on a realizable model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to planted weight coefficients.
    FedOS: using open-set learning to stabilize training in federated learning. (arXiv:2208.11512v1 [stat.ML])
    Federated Learning is a recent approach to train statistical models on distributed datasets without violating privacy constraints. The data locality principle is preserved by sharing the model instead of the data between clients and the server. This brings many advantages but also poses new challenges. In this report, we explore this new research area and perform several experiments to deepen our understanding of what these challenges are and how different problem settings affect the performance of the final model. Finally, we present a novel approach to one of these challenges and compare it to other methods found in literature.
    On a Built-in Conflict between Deep Learning and Systematic Generalization. (arXiv:2208.11633v1 [cs.LG])
    In this paper, we hypothesize that internal function sharing is one of the reasons to weaken o.o.d. or systematic generalization in deep learning for classification tasks. Under equivalent prediction, a model partitions an input space into multiple parts separated by boundaries. The function sharing prefers to reuse boundaries, leading to fewer parts for new outputs, which conflicts with systematic generalization. We show such phenomena in standard deep learning models, such as fully connected, convolutional, residual networks, LSTMs, and (Vision) Transformers. We hope this study provides novel insights into systematic generalization and forms a basis for new research directions.
    Error-Correcting Neural Networks for Semi-Lagrangian Advection in the Level-Set Method. (arXiv:2110.11611v2 [cs.LG] UPDATED)
    We present a machine learning framework that blends image super-resolution technologies with passive, scalar transport in the level-set method. Here, we investigate whether we can compute on-the-fly, data-driven corrections to minimize numerical viscosity in the coarse-mesh evolution of an interface. The proposed system's starting point is the semi-Lagrangian formulation. And, to reduce numerical dissipation, we introduce an error-quantifying multilayer perceptron. The role of this neural network is to improve the numerically estimated surface trajectory. To do so, it processes localized level-set, velocity, and positional data in a single time frame for select vertices near the moving front. Our main contribution is thus a novel machine-learning-augmented transport algorithm that operates alongside selective redistancing and alternates with conventional advection to keep the adjusted interface trajectory smooth. Consequently, our procedure is more efficient than full-scan convolutional-based applications because it concentrates computational effort only around the free boundary. Also, we show through various tests that our strategy is effective at counteracting both numerical diffusion and mass loss. In simple advection problems, for example, our method can achieve the same precision as the baseline scheme at twice the resolution but at a fraction of the cost. Similarly, our hybrid technique can produce feasible solidification fronts for crystallization processes. On the other hand, tangential shear flows and highly deforming simulations can precipitate bias artifacts and inference deterioration. Likewise, stringent design velocity constraints can limit our solver's application to problems involving rapid interface changes. In the latter cases, we have identified several opportunities to enhance robustness without forgoing our approach's basic concept.
    Discovering Transferable Forensic Features for CNN-generated Images Detection. (arXiv:2208.11342v1 [cs.CV])
    Visual counterfeits are increasingly causing an existential conundrum in mainstream media with rapid evolution in neural image synthesis methods. Though detection of such counterfeits has been a taxing problem in the image forensics community, a recent class of forensic detectors -- universal detectors -- are able to surprisingly spot counterfeit images regardless of generator architectures, loss functions, training datasets, and resolutions. This intriguing property suggests the possible existence of transferable forensic features (T-FF) in universal detectors. In this work, we conduct the first analytical study to discover and understand T-FF in universal detectors. Our contributions are 2-fold: 1) We propose a novel forensic feature relevance statistic (FF-RS) to quantify and discover T-FF in universal detectors and, 2) Our qualitative and quantitative investigations uncover an unexpected finding: color is a critical T-FF in universal detectors. Code and models are available at https://keshik6.github.io/transferable-forensic-features/
    Metric Effects based on Fluctuations in values of k in Nearest Neighbor Regressor. (arXiv:2208.11540v1 [cs.LG])
The regression branch of machine learning focuses purely on the prediction of continuous values. The supervised learning branch has many regression-based methods with parametric and non-parametric learning models. In this paper we target a very subtle point related to a distance-based regression model: the K-Nearest Neighbors Regressor, a supervised non-parametric method. The point we want to demonstrate is the effect of the k parameter of the model and how its fluctuations affect the metrics. The metrics that we use are Root Mean Squared Error and R-Squared Goodness of Fit, with a visual representation of their values with respect to k.
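The experiment the abstract describes is easy to reproduce in outline; a sketch with scikit-learn (dataset and k range are chosen arbitrarily here, not taken from the paper):

```python
# Sketch: sweep k for KNeighborsRegressor and track RMSE and R^2, mirroring
# the kind of study the abstract describes (dataset and range assumed).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in range(1, 21):
    model = KNeighborsRegressor(n_neighbors=k).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = np.sqrt(mean_squared_error(y_te, pred))
    print(f"k={k:2d}  RMSE={rmse:7.2f}  R^2={r2_score(y_te, pred):.3f}")
```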
    DCSF: Deep Convolutional Set Functions for Classification of Asynchronous Time Series. (arXiv:2208.11374v1 [cs.LG])
Asynchronous Time Series (AsTS) is a multivariate time series in which the channels are observed asynchronously and independently, making the time series extremely sparse when aligned. We often observe this effect in applications with complex observation processes, such as health care, climate science, and astronomy, to name a few. Because of their asynchronous nature, they pose a significant challenge to deep learning architectures, which presume that the time series presented to them are regularly sampled, fully observed, and aligned with respect to time. This paper proposes a novel framework, which we call Deep Convolutional Set Functions (DCSF), that is highly scalable and memory efficient for the asynchronous time series classification task. With the recent advancements in deep set learning architectures, we introduce a model that is invariant to the order in which the time series' channels are presented to it. We explore convolutional neural networks, which are well researched for the closely related problem of classifying regularly sampled and fully observed time series, for encoding the set elements. We evaluate DCSF for AsTS classification and online (per time point) AsTS classification. Our extensive experiments on multiple real-world and synthetic datasets verify that the suggested model performs substantially better than a range of state-of-the-art models in terms of accuracy and run time.
    Using Conservation Laws to Infer Deep Learning Model Accuracy of Richtmyer-Meshkov Instabilities. (arXiv:2208.11477v1 [physics.flu-dyn])
    Richtmyer-Meshkov Instability (RMI) is a complicated phenomenon that occurs when a shockwave passes through a perturbed interface. Over a thousand hydrodynamic simulations were performed to study the formation of RMI for a parameterized high-velocity impact. Deep learning was used to learn the temporal mapping of initial geometric perturbations to the full-field hydrodynamic solutions of density and velocity. The continuity equation was used to include physical information in the loss function; however, this resulted in only very minor improvements at the cost of additional training complexity. Predictions from the deep learning model appear to accurately capture temporal RMI formations for a variety of geometric conditions within the domain. First-principle physical laws were investigated to infer the accuracy of the model's predictive capability. While the continuity equation appeared to show no correlation with the accuracy of the model, conservation of mass and momentum were weakly correlated with accuracy. Since conservation laws can be quickly calculated from the deep learning model, they may be useful in applications where a relative accuracy measure is needed.
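    A conservation-law accuracy proxy of this kind can be computed directly from the predicted fields. The sketch below evaluates a discrete continuity residual with NumPy; the field layout and the centered finite differences are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def continuity_residual(rho, u, v, dt, dx, dy):
    """Discrete residual of the continuity equation for predicted fields.
    rho, u, v: arrays of shape (T, ny, nx) from the surrogate model.
    A large residual flags predictions that violate mass conservation."""
    drho_dt = np.gradient(rho, dt, axis=0)
    div_flux = (np.gradient(rho * u, dx, axis=2)
                + np.gradient(rho * v, dy, axis=1))
    return np.abs(drho_dt + div_flux).mean(axis=(1, 2))  # score per time step
```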
    Bugs in the Data: How ImageNet Misrepresents Biodiversity. (arXiv:2208.11695v1 [cs.CV])
    ImageNet-1k is a dataset often used for benchmarking machine learning (ML) models and evaluating tasks such as image recognition and object detection. Wild animals make up 27% of ImageNet-1k but, unlike classes representing people and objects, these data have not been closely scrutinized. In the current paper, we analyze the 13,450 images from 269 classes that represent wild animals in the ImageNet-1k validation set, with the participation of expert ecologists. We find that many of the classes are ill-defined or overlapping, and that 12% of the images are incorrectly labeled, with some classes having >90% of images incorrect. We also find that both the wildlife-related labels and images included in ImageNet-1k present significant geographical and cultural biases, as well as ambiguities such as artificial animals, multiple species in the same image, or the presence of humans. Our findings highlight serious issues with the extensive use of this dataset for evaluating ML systems, the use of such algorithms in wildlife-related tasks, and more broadly the ways in which ML datasets are commonly created and curated.
    A Graph Convolution for Signed Directed Graphs. (arXiv:2208.11511v1 [cs.LG])
    There are several types of graphs according to the nature of the data. Directed graphs have link directions, and signed graphs have link types such as positive and negative. Signed directed graphs, which have both, are the most complex and informative. However, graph convolutions for signed directed graphs remain underexplored: although many graph convolution studies exist, most are designed for undirected or unsigned graphs. In this paper, we investigate a spectral graph convolution network for signed directed graphs. We propose a novel complex Hermitian adjacency matrix that encodes graph information via complex numbers, representing link direction, sign, and connectivity through phases and magnitudes. We then define a magnetic Laplacian with the Hermitian matrix and prove its positive semidefiniteness. Finally, we introduce the Signed Directed Graph Convolution Network (SD-GCN). To the best of our knowledge, it is the first spectral convolution for graphs with signs. Moreover, unlike existing convolutions designed for a specific graph type, the proposed model is general enough to be applied to any graph, whether undirected, directed, or signed. The performance of the proposed model was evaluated on four real-world graphs. It outperforms all the other state-of-the-art graph convolutions on the task of link sign prediction.
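    To make the construction concrete, the NumPy sketch below builds a magnetic-Laplacian-style operator from a weighted, signed, directed adjacency matrix. The phase/magnitude split shown is the standard magnetic-Laplacian recipe with a charge parameter q, offered as an assumption; the paper's exact encoding of sign and direction may differ.

```python
import numpy as np

def magnetic_laplacian(A, q=0.25):
    """Magnetic-Laplacian-style operator for a (possibly signed) directed
    graph with weighted adjacency A (n x n). Direction and sign enter via
    the antisymmetric phase; connectivity via the symmetrized magnitude."""
    A_s = 0.5 * (np.abs(A) + np.abs(A).T)   # symmetric magnitude part
    theta = 2.0 * np.pi * q * (A - A.T)     # antisymmetric phase part
    H = A_s * np.exp(1j * theta)            # complex Hermitian adjacency
    D = np.diag(A_s.sum(axis=1))
    L = D - H                               # Hermitian, positive semidefinite
    assert np.allclose(L, L.conj().T)       # sanity check: Hermitian
    return L
```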
    A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN Computing. (arXiv:2208.11596v1 [cs.LG])
    Split computing has emerged as a recent paradigm for implementation of DNN-based AI workloads, wherein a DNN model is split into two parts, one of which is executed on a mobile/client device and the other on an edge-server (or cloud). Data compression is applied to the intermediate tensor from the DNN that needs to be transmitted, addressing the challenge of optimizing the rate-accuracy-complexity trade-off. Existing split-computing approaches adopt ML-based data compression, but require that the parameters of either the entire DNN model, or a significant portion of it, be retrained for different compression levels. This incurs a high computational and storage burden: training a full DNN model from scratch is computationally demanding, maintaining multiple copies of the DNN parameters increases storage requirements, and switching the full set of weights during inference increases memory bandwidth. In this paper, we present an approach that addresses all these challenges. It involves the systematic design and training of bottleneck units - simple, low-cost neural networks - that can be inserted at the point of split. Our approach is remarkably lightweight, both during training and inference, highly effective and achieves excellent rate-distortion performance at a small fraction of the compute and storage overhead compared to existing methods.
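    The following PyTorch sketch shows what such a bottleneck unit could look like: a pair of 1x1 convolutions inserted at the split point and trained while both backbone halves stay frozen. The layer shapes and channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BottleneckUnit(nn.Module):
    """A hypothetical low-cost bottleneck for split computing: the encoder
    compresses the intermediate tensor before transmission, the decoder
    restores its shape on the edge server. Only these few layers would be
    trained per compression level; the backbone stays untouched."""
    def __init__(self, channels: int, compressed: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, compressed, kernel_size=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Conv2d(compressed, channels, kernel_size=1), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)     # compressed tensor sent over the channel
        return self.decoder(z)  # reconstruction fed to the server half

# Example: squeeze a 256-channel activation map down to 32 channels.
unit = BottleneckUnit(256, 32)
restored = unit(torch.randn(1, 256, 14, 14))
```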
    Explainable AI for tailored electricity consumption feedback -- an experimental evaluation of visualizations. (arXiv:2208.11408v1 [cs.HC])
    Machine learning (ML) methods can effectively analyse data, recognize patterns in them, and make high-quality predictions. Good predictions usually come along with "black-box" models that are unable to present the detected patterns in a human-readable way. Technical developments recently led to eXplainable Artificial Intelligence (XAI) techniques that aim to open such black-boxes and enable humans to gain new insights from detected patterns. We investigated the application of XAI in an area where specific insights can have a significant effect on consumer behaviour, namely electricity use. Knowing that specific feedback on individuals' electricity consumption triggers resource conservation, we created five visualizations with ML and XAI methods from electricity consumption time series for highly personalized feedback, considering existing domain-specific design knowledge. Our experimental evaluation with 152 participants showed that humans can assimilate the pattern displayed by XAI visualizations, but such visualizations should follow known visualization patterns to be well-understood by users.
    A model-based approach to meta-Reinforcement Learning: Transformers and tree search. (arXiv:2208.11535v1 [cs.LG])
    Meta-learning is a line of research that develops the ability to leverage past experiences to efficiently solve new learning problems. Meta-Reinforcement Learning (meta-RL) methods demonstrate a capability to learn behaviors that efficiently acquire and exploit information in several meta-RL problems. In this context, the Alchemy benchmark has been proposed by Wang et al. [2021]. Alchemy features a rich structured latent space that is challenging for state-of-the-art model-free RL methods, which fail to learn to properly explore and then exploit. We develop a model-based algorithm: we train a model whose principal block is a Transformer encoder to fit the symbolic Alchemy environment dynamics, and then define an online planner with the learned model using a tree search method. This algorithm significantly outperforms previously applied model-free RL methods on the symbolic Alchemy problem. Our results reveal the relevance of model-based approaches with online planning for performing exploration and exploitation successfully in meta-RL. Moreover, we show the efficiency of the Transformer architecture in learning the complex dynamics that arise from the latent spaces present in meta-RL problems.
    Fractional SDE-Net: Generation of Time Series Data with Long-term Memory. (arXiv:2201.05974v2 [cs.LG] UPDATED)
    In this paper, we focus on the generation of time-series data using neural networks. It is often the case that input time-series data have only one realized (and usually irregularly sampled) path, which makes it difficult to extract time-series characteristics, and whose noise structure is more complicated than the i.i.d. type. Time series data, especially from hydrology, telecommunications, economics, and finance, exhibit long-term memory, also called long-range dependency (LRD). The main purpose of this paper is to artificially generate time series with the help of neural networks, taking the LRD of paths into account. We propose fSDE-Net: a neural fractional Stochastic Differential Equation Network. It generalizes the neural stochastic differential equation model by using fractional Brownian motion with a Hurst index larger than half, which exhibits the LRD property. We derive the solver of fSDE-Net and theoretically analyze the existence and uniqueness of the solution to fSDE-Net. Our experiments with artificial and real time-series data demonstrate that the fSDE-Net model can replicate distributional properties well.
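    The driving noise itself is standard enough to sample directly. A minimal NumPy sketch draws fractional Brownian motion paths with H > 1/2 (the LRD regime) via a Cholesky factorization of the exact fBm covariance; the step count and jitter term are illustrative choices.

```python
import numpy as np

def fbm_paths(hurst=0.7, n_steps=256, n_paths=8, T=1.0, seed=0):
    """Sample fractional Brownian motion using the exact covariance
    Cov(B_H(t), B_H(s)) = 0.5 * (t^{2H} + s^{2H} - |t - s|^{2H})."""
    rng = np.random.default_rng(seed)
    t = np.linspace(T / n_steps, T, n_steps)
    tt, ss = np.meshgrid(t, t)
    cov = 0.5 * (tt ** (2 * hurst) + ss ** (2 * hurst)
                 - np.abs(tt - ss) ** (2 * hurst))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n_steps))   # tiny jitter
    return (L @ rng.standard_normal((n_steps, n_paths))).T  # (paths, steps)
```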
    Collaborative Algorithms for Online Personalized Mean Estimation. (arXiv:2208.11530v1 [cs.LG])
    We consider an online estimation problem involving a set of agents. Each agent has access to a (personal) process that generates samples from a real-valued distribution and seeks to estimate its mean. We study the case where some of the distributions have the same mean, and the agents are allowed to actively query information from other agents. The goal is to design an algorithm that enables each agent to improve its mean estimate thanks to communication with other agents. The means as well as the number of distributions with the same mean are unknown, which makes the task nontrivial. We introduce a novel collaborative strategy to solve this online personalized mean estimation problem. We analyze its time complexity and introduce variants that enjoy good performance in numerical experiments. We also extend our approach to the setting where clusters of agents with similar means seek to estimate the mean of their cluster.
    Time-to-Green predictions for fully-actuated signal control systems with supervised learning. (arXiv:2208.11344v1 [cs.LG])
    Recently, efforts have been made to standardize signal phase and timing (SPaT) messages. These messages contain signal phase timings of all signalized intersection approaches. This information can thus be used for efficient motion planning, resulting in more homogeneous traffic flows and uniform speed profiles. Despite efforts to provide robust predictions for semi-actuated signal control systems, predicting signal phase timings for fully-actuated controls remains challenging. This paper proposes a time series prediction framework using aggregated traffic signal and loop detector data. We utilize state-of-the-art machine learning models to predict future signal phases' duration. The performance of a Linear Regression (LR), a Random Forest (RF), and a Long-Short-Term-Memory (LSTM) neural network are assessed against a naive baseline model. Results based on an empirical data set from a fully-actuated signal control system in Zurich, Switzerland, show that machine learning models outperform conventional prediction methods. Furthermore, tree-based decision models such as the RF perform best with an accuracy that meets requirements for practical applications.
    A Bayesian Variational principle for dynamic Self Organizing Maps. (arXiv:2208.11337v1 [cs.LG])
    We propose organisation conditions that yield a method for training SOMs with an adaptive neighborhood radius in a variational Bayesian framework. This method is validated in a non-stationary setting and compared, in a high-dimensional setting, with another adaptive method.
    Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors. (arXiv:2208.11356v1 [cs.CV])
    Multi-scale features have been proven highly effective for object detection, and most ConvNet-based object detectors adopt Feature Pyramid Network (FPN) as a basic component for exploiting multi-scale features. However, for the recently proposed Transformer-based object detectors, directly incorporating multi-scale features leads to prohibitive computational overhead due to the high complexity of the attention mechanism for processing high-resolution features. This paper presents Iterative Multi-scale Feature Aggregation (IMFA) -- a generic paradigm that enables the efficient use of multi-scale features in Transformer-based object detectors. The core idea is to exploit sparse multi-scale features from just a few crucial locations, and it is achieved with two novel designs. First, IMFA rearranges the Transformer encoder-decoder pipeline so that the encoded features can be iteratively updated based on the detection predictions. Second, IMFA sparsely samples scale-adaptive features for refined detection from just a few keypoint locations under the guidance of prior detection predictions. As a result, the sampled multi-scale features are sparse yet still highly beneficial for object detection. Extensive experiments show that the proposed IMFA boosts the performance of multiple Transformer-based object detectors significantly yet with slight computational overhead. Project page: https://github.com/ZhangGongjie/IMFA.
    Adverse Childhood Experiences Identification from Clinical Notes with Ontologies and NLP. (arXiv:2208.11466v1 [cs.CL])
    Adverse Childhood Experiences (ACEs) are defined as a collection of highly stressful, and potentially traumatic, events or circumstances that occur throughout childhood and/or adolescence. They have been shown to be associated with increased risks of mental health diseases or other abnormal behaviours in later life. However, the identification of ACEs from free-text Electronic Health Records (EHRs) with Natural Language Processing (NLP) is challenging because (a) there are no NLP-ready ACE ontologies, and (b) there are limited cases available for machine learning, necessitating data annotation by clinical experts. We are currently developing a tool that uses NLP techniques to assist us in surfacing ACEs from clinical notes. This will enable further research into identifying evidence of the relationship between ACEs and the subsequent development of mental illness (e.g., addictions) in large-scale and longitudinal free-text EHRs, which has previously not been possible.
    Augmented cross-selling through explainable AI -- a case from energy retailing. (arXiv:2208.11404v1 [cs.LG])
    The advance of Machine Learning (ML) has led to a strong interest in this technology to support decision making. While complex ML models provide predictions that are often more accurate than those of traditional tools, such models often hide the reasoning behind the prediction from their users, which can lead to lower adoption and lack of insight. Motivated by this tension, research has put forth Explainable Artificial Intelligence (XAI) techniques that uncover patterns discovered by ML. Despite the high hopes in both ML and XAI, there is little empirical evidence of the benefits to traditional businesses. To this end, we analyze data on 220,185 customers of an energy retailer, predict cross-purchases with up to 86% correctness (AUC), and show that the XAI method SHAP provides explanations that hold for actual buyers. We further outline implications for research in information systems, XAI, and relationship marketing.
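    For readers unfamiliar with the workflow, a minimal SHAP sketch on synthetic stand-in data (not the retailer's customer records) looks as follows; the gradient-boosting model is an illustrative choice.

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Stand-in tabular data playing the role of customer features.
X, y = make_classification(n_samples=2000, n_features=12, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# TreeExplainer yields per-customer, per-feature contributions to the
# predicted cross-purchase score, which can then be checked against the
# behaviour of actual buyers as the paper does.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:100])
shap.summary_plot(shap_values, X[:100], show=False)
```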
    Improved Zero-Shot Audio Tagging & Classification with Patchout Spectrogram Transformers. (arXiv:2208.11402v1 [cs.SD])
    Standard machine learning models for tagging and classifying acoustic signals cannot handle classes that were not seen during training. Zero-Shot (ZS) learning overcomes this restriction by predicting classes based on adaptable class descriptions. This study sets out to investigate the effectiveness of self-attention-based audio embedding architectures for ZS learning. To this end, we compare the very recent patchout spectrogram transformer with two classic convolutional architectures. We evaluate these three architectures on three tasks and on three different benchmark datasets: general-purpose tagging on AudioSet, environmental sound classification on ESC-50, and instrument tagging on OpenMIC. Our results show that the self-attention-based embedding methods outperform both compared convolutional architectures in all of these settings. By designing training and test data accordingly, we observe that prediction performance suffers significantly when the `semantic distance' between training and new test classes is large, an effect that deserves more detailed investigation.
    ADMoE: Anomaly Detection with Mixture-of-Experts from Noisy Labels. (arXiv:2208.11290v1 [cs.LG])
    Existing works on anomaly detection (AD) rely on clean labels from human annotators that are expensive to acquire in practice. In this work, we propose a method to leverage weak/noisy labels (e.g., risk scores generated by machine rules for detecting malware) that are cheaper to obtain for anomaly detection. Specifically, we propose ADMoE, the first framework for anomaly detection algorithms to learn from noisy labels. In a nutshell, ADMoE leverages a mixture-of-experts (MoE) architecture to encourage specialized and scalable learning from multiple noisy sources. It captures the similarities among noisy labels by sharing most model parameters, while encouraging specialization by building "expert" sub-networks. To further extract signal from the noisy labels, ADMoE uses them as input features to facilitate expert learning. Extensive results on eight datasets (including a proprietary enterprise security dataset) demonstrate the effectiveness of ADMoE, where it brings up to 34% performance improvement over not using it. Also, it outperforms a total of 13 leading baselines with equivalent network parameters and FLOPS. Notably, ADMoE is model-agnostic, enabling any neural network-based detection method to handle noisy labels, and we showcase its results on both multi-layer perceptron (MLP) and the leading AD method DeepSAD.
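    A compact PyTorch sketch of the idea, under stated assumptions (a shared trunk, one small expert head per noisy source, and the noisy labels appended to the input features); the real ADMoE gating and training objective may differ.

```python
import torch
import torch.nn as nn

class ADMoELike(nn.Module):
    """Mixture-of-experts for K noisy label sources: shared parameters
    capture what the sources agree on, small expert heads specialize per
    source, and the noisy labels double as extra input features."""
    def __init__(self, d_in: int, k_sources: int, d_hidden: int = 64):
        super().__init__()
        d = d_in + k_sources  # raw features + noisy labels as features
        self.trunk = nn.Sequential(nn.Linear(d, d_hidden), nn.ReLU())
        self.experts = nn.ModuleList(
            [nn.Linear(d_hidden, 1) for _ in range(k_sources)])
        self.gate = nn.Sequential(nn.Linear(d, k_sources), nn.Softmax(dim=-1))

    def forward(self, x, noisy_labels):
        z = torch.cat([x, noisy_labels], dim=-1)
        h = self.trunk(z)                                # shared trunk
        scores = torch.cat([e(h) for e in self.experts], dim=-1)
        w = self.gate(z)                                 # per-sample routing
        return (w * scores).sum(dim=-1)                  # anomaly score

model = ADMoELike(d_in=16, k_sources=3)
s = model(torch.randn(4, 16), torch.rand(4, 3))
```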
    Multi-objective optimization of actuation waveform for high-precision drop-on-demand inkjet printing. (arXiv:2208.11301v1 [physics.flu-dyn])
    Drop-on-demand (DOD) inkjet printing has been considered one of the most promising technologies for the fabrication of advanced functional materials. For a DOD printer, high-precision dispensing techniques for achieving satellite-free, smaller droplets have long been desired for patterning thin-film structures. The present study considers the inlet velocity of a liquid chamber located upstream of a dispensing nozzle as a control variable and aims to optimize its waveform using a sample-efficient Bayesian optimization algorithm. First, the droplet dispensing dynamics are numerically reproduced using the open-source OpenFOAM solver interFoam, and the results are passed to another code based on PyFoam. Then, the parameters characterizing the actuation waveform driving a DOD printer are determined by the Bayesian optimization (BO) algorithm so as to maximize a prescribed multi-objective function expressed as the sum of two factors, i.e., the size of the primary droplet and the presence of satellite droplets. The results show that the present BO algorithm can successfully find high-precision dispensing waveforms within 150 simulations. Specifically, satellite droplets can be effectively eliminated and the droplet diameter can be significantly reduced to 24.9% of the nozzle diameter by applying the optimal waveform.
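    The optimization loop itself is compact. A minimal scikit-optimize sketch is shown below with a synthetic single-objective stand-in; the three-parameter waveform and the toy objective are assumptions, and a real evaluation would launch one interFoam run per candidate and score droplet size together with a satellite penalty.

```python
from skopt import gp_minimize
from skopt.space import Real

# Hypothetical waveform parameterization: amplitude, rise time, dwell time.
space = [Real(0.1, 2.0, name="amplitude"),
         Real(1e-6, 5e-5, name="rise_time"),
         Real(1e-6, 5e-5, name="dwell_time")]

def objective(params):
    """Stand-in for one CFD evaluation; a synthetic bowl keeps the sketch
    runnable. The real objective combines primary droplet size with a
    penalty for the presence of satellite droplets."""
    a, tr, td = params
    return (a - 0.8) ** 2 + 1e9 * (tr - 2e-5) ** 2 + 1e9 * (td - 1e-5) ** 2

result = gp_minimize(objective, space, n_calls=50, random_state=0)
print("best waveform parameters:", result.x, "objective:", result.fun)
```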
    TESTSGD: Interpretable Testing of Neural Networks Against Subtle Group Discrimination. (arXiv:2208.11321v1 [cs.LG])
    Discrimination has been shown in many machine learning applications, which calls for sufficient fairness testing before their deployment in ethics-relevant domains such as face recognition, medical diagnosis, and criminal sentencing. Existing fairness testing approaches are mostly designed for identifying individual discrimination, i.e., discrimination against individuals. Yet, testing against group discrimination, another widely concerning and mostly hidden type of discrimination, is much less studied. To address the gap, in this work we propose TESTSGD, an interpretable testing approach which systematically identifies and measures hidden (which we call `subtle') group discrimination of a neural network characterized by conditions over combinations of the sensitive features. Specifically, given a neural network, TESTSGD first automatically generates an interpretable rule set which categorizes the input space into two groups, exposing the model's group discrimination. Alongside, TESTSGD also provides an estimated group fairness score based on sampling the input space to measure the degree of the identified subtle group discrimination, which is guaranteed to be accurate up to an error bound. We evaluate TESTSGD on multiple neural network models trained on popular datasets including both structured data and text data. The experimental results show that TESTSGD is effective and efficient in identifying and measuring such subtle group discrimination that has never been revealed before. Furthermore, we show that the testing results of TESTSGD can guide the generation of new samples to mitigate such discrimination through retraining with a negligible accuracy drop.
    Psychophysical Machine Learning. (arXiv:2208.11236v1 [cs.LG])
    The Weber-Fechner law of psychophysics observes that human perception is logarithmic in the stimulus. We present an algorithm for incorporating the Weber-Fechner law into loss functions for machine learning, and use the algorithm to enhance the performance of deep learning networks.
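    One minimal reading of the idea, as a sketch: compute the usual squared error on log-compressed magnitudes, so errors are weighted the way a logarithmic perceiver would weight them. The signed log1p compression and the scale constant are assumptions; the paper's exact loss may differ.

```python
import torch

def weber_fechner_mse(pred, target, c=1.0):
    """MSE on signed, log-compressed magnitudes: perception ~ log(stimulus),
    so equal ratios of stimulus map to equal differences in loss space."""
    log_p = torch.sign(pred) * torch.log1p(pred.abs() / c)
    log_t = torch.sign(target) * torch.log1p(target.abs() / c)
    return torch.mean((log_p - log_t) ** 2)

loss = weber_fechner_mse(torch.randn(8), torch.randn(8))
```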
    Quantum Multi-Agent Meta Reinforcement Learning. (arXiv:2208.11510v1 [quant-ph])
    Although quantum supremacy is yet to come, there has recently been an increasing interest in identifying the potential of quantum machine learning (QML) in the looming era of practical quantum computing. Motivated by this, in this article we re-design multi-agent reinforcement learning (MARL) based on the unique characteristics of quantum neural networks (QNNs) having two separate dimensions of trainable parameters: angle parameters affecting the output qubit states, and pole parameters associated with the output measurement basis. Exploiting this dyadic trainability as meta-learning capability, we propose quantum meta MARL (QM2ARL) that first applies angle training for meta-QNN learning, followed by pole training for few-shot or local-QNN training. To avoid overfitting, we develop an angle-to-pole regularization technique injecting noise into the pole domain during angle training. Furthermore, by exploiting the pole as the memory address of each trained QNN, we introduce the concept of pole memory allowing one to save and load trained QNNs using only two-parameter pole values. We theoretically prove the convergence of angle training under the angle-to-pole regularization, and by simulation corroborate the effectiveness of QM2ARL in achieving high reward and fast convergence, as well as of the pole memory in fast adaptation to a time-varying environment.
    Comparison of Object Detection Algorithms for Street-level Objects. (arXiv:2208.11315v1 [cs.CV])
    Object detection for street-level objects can be applied to various use cases, from car and traffic detection to self-driving car systems. Therefore, finding the best object detection algorithm is essential to applying it effectively. Many object detection algorithms have been released, and many works have compared them, but few have compared the latest algorithms, such as YOLOv5, with a focus on street-level objects. This paper compares various one-stage detector algorithms -- SSD MobileNetv2 FPN-lite 320x320, YOLOv3, YOLOv4, YOLOv5l, and YOLOv5s -- for street-level object detection in real-time images. The experiment utilizes a modified Udacity Self Driving Car Dataset with 3,169 images. The dataset is split into train, validation, and test sets, then preprocessed and augmented using rescaling, hue shifting, and noise. Each algorithm is then trained and evaluated. Based on the experiments, the algorithms produce decent results in terms of inference time and their precision, recall, F1-score, and mean Average Precision (mAP) values. The results also show that YOLOv5l outperforms the other algorithms in accuracy with a mAP@.5 of 0.593, while MobileNetv2 FPN-lite has the fastest inference time at only 3.20 ms. YOLOv5s is found to be the most efficient, matching YOLOv5l's accuracy at a speed almost as fast as MobileNetv2 FPN-lite. This shows that various algorithms are suitable for street-level object detection and viable enough to be used in self-driving cars.
    Accelerating SGD for Highly Ill-Conditioned Huge-Scale Online Matrix Completion. (arXiv:2208.11246v1 [cs.LG])
    The matrix completion problem seeks to recover a $d\times d$ ground truth matrix of low rank $r\ll d$ from observations of its individual elements. Real-world matrix completion is often a huge-scale optimization problem, with $d$ so large that even the simplest full-dimension vector operations with $O(d)$ time complexity become prohibitively expensive. Stochastic gradient descent (SGD) is one of the few algorithms capable of solving matrix completion on a huge scale, and can also naturally handle streaming data over an evolving ground truth. Unfortunately, SGD experiences a dramatic slow-down when the underlying ground truth is ill-conditioned; it requires at least $O(\kappa\log(1/\epsilon))$ iterations to get $\epsilon$-close to a ground truth matrix with condition number $\kappa$. In this paper, we propose a preconditioned version of SGD that preserves all the favorable practical qualities of SGD for huge-scale online optimization while also making it agnostic to $\kappa$. For a symmetric ground truth and the Root Mean Square Error (RMSE) loss, we prove that the preconditioned SGD converges to $\epsilon$-accuracy in $O(\log(1/\epsilon))$ iterations, with a rapid linear convergence rate as if the ground truth were perfectly conditioned with $\kappa=1$. In our numerical experiments, we observe a similar acceleration for ill-conditioned matrix completion under the 1-bit cross-entropy loss, as well as pairwise losses such as the Bayesian Personalized Ranking (BPR) loss.
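    The preconditioning step is simple to state in code. The sketch below follows the scaled-gradient recipe for a symmetric factorization M = U U^T, right-multiplying the stochastic gradient by (U^T U)^{-1}; the dimensions and step size are illustrative, and the paper's exact estimator may differ in details.

```python
import numpy as np

def preconditioned_sgd_step(U, i, j, y, lr, eps=1e-8):
    """One stochastic step for symmetric matrix completion under RMSE loss:
    observe entry (i, j) with value y, model it as U[i] @ U[j], and
    precondition the gradient so the update is insensitive to the
    condition number of the ground truth."""
    r = U[i] @ U[j] - y                  # residual on the observed entry
    G = np.zeros_like(U)
    G[i] += r * U[j]
    G[j] += r * U[i]
    P = np.linalg.inv(U.T @ U + eps * np.eye(U.shape[1]))  # r x r, cheap
    return U - lr * G @ P

U = 0.1 * np.random.default_rng(0).standard_normal((1000, 5))
U = preconditioned_sgd_step(U, i=3, j=42, y=0.7, lr=0.5)
```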
    Semi-Supervised and Unsupervised Deep Visual Learning: A Survey. (arXiv:2208.11296v1 [cs.CV])
    State-of-the-art deep learning models are often trained with a large amount of costly labeled training data. However, requiring exhaustive manual annotations may degrade the model's generalizability in the limited-label regime. Semi-supervised learning and unsupervised learning offer promising paradigms to learn from an abundance of unlabeled visual data. Recent progress in these paradigms has indicated the strong benefits of leveraging unlabeled data to improve model generalization and provide better model initialization. In this survey, we review the recent advanced deep learning algorithms on semi-supervised learning (SSL) and unsupervised learning (UL) for visual recognition from a unified perspective. To offer a holistic understanding of the state-of-the-art in these areas, we propose a unified taxonomy. We categorize existing representative SSL and UL with comprehensive and insightful analysis to highlight their design rationales in different learning scenarios and applications in different computer vision tasks. Lastly, we discuss the emerging trends and open challenges in SSL and UL to shed light on future critical research directions.
    Secondary Protein Structure Prediction Using Neural Networks. (arXiv:2208.11248v1 [cs.LG])
    In this paper we experiment with using neural network structures to predict a protein's secondary structure (α-helix positions) from only its primary structure (amino acid sequence). We implement a fully connected neural network (FCNN) and perform three experiments with it. First, we do a cross-species comparison of models trained and tested on mouse and human datasets. Second, we test the impact of varying the length of the protein sequence we input into the model. Third, we compare custom error functions designed to focus on the center of the input window. At the end of the paper we propose an alternative, recurrent neural network model which can be applied to the problem.
    Preprocessing Source Code Comments for Linguistic Models. (arXiv:2208.11235v1 [cs.SE])
    Comments are an important part of the source code and are a primary source of documentation. This has driven interest in using large bodies of comments to train or evaluate tools that consume or produce them -- such as generating oracles or even code from comments, or automatically generating code summaries. Most of this work makes strong assumptions about the structure and quality of comments, such as assuming they consist mostly of proper English sentences. However, we know little about the actual quality of existing comments for these use cases. Comments often contain unique structures and elements that are not seen in other types of text, and filtering or extracting information from them requires some extra care. This paper explores the contents and quality of Python comments drawn from the 840 most popular open-source projects on GitHub and 8422 projects from the SriLab dataset, and the impact that naïve vs. in-depth filtering can have on the use of existing comments for training and evaluation of systems that generate comments.
    Benchmark Dataset for Precipitation Forecasting by Post-Processing the Numerical Weather Prediction. (arXiv:2206.15241v2 [cs.LG] UPDATED)
    Precipitation forecasting is an important scientific challenge that has wide-reaching impacts on society. Historically, this challenge has been tackled using numerical weather prediction (NWP) models, grounded on physics-based simulations. Recently, many works have proposed an alternative approach, using end-to-end deep learning (DL) models to replace physics-based NWP models. While these DL methods show improved performance and computational efficiency, they exhibit limitations in long-term forecasting and lack explainability. In this work, we present a hybrid NWP-DL workflow to fill the gap between standalone NWP and DL approaches. Under this workflow, the outputs of NWP models are fed into a deep neural network, which post-processes the data to yield a refined precipitation forecast. The deep model is trained with supervision, using Automatic Weather Station (AWS) observations as ground-truth labels. This can achieve the best of both worlds, and can even benefit from future improvements in NWP technology. To facilitate study in this direction, we present a novel dataset focused on the Korean Peninsula, termed KoMet (Korea Meteorological Dataset), comprising NWP outputs and AWS observations. For the NWP model, the Global Data Assimilation and Prediction Systems-Korea Integrated Model (GDAPS-KIM) is utilized. We provide analysis on a comprehensive set of baseline methods aimed at addressing the challenges of KoMet, including the sparsity of AWS observations and class imbalance. To lower the barrier to entry and encourage further study, we also provide an extensive open-source Python package for data processing and model development. Our benchmark data and code are available at https://github.com/osilab-kaist/KoMet-Benchmark-Dataset.
    Probability flow solution of the Fokker-Planck equation. (arXiv:2206.04642v2 [cs.LG] UPDATED)
    The method of choice for integrating the time-dependent Fokker-Planck equation in high dimension is to generate samples from the solution via integration of the associated stochastic differential equation. Here, we introduce an alternative scheme based on integrating an ordinary differential equation that describes the flow of probability. Unlike the stochastic dynamics, this equation deterministically pushes samples from the initial density onto samples from the solution at any later time. The method has the advantage of giving direct access to quantities that are challenging to estimate from stochastic trajectories, such as the probability current, the density itself, and its entropy. The probability flow equation depends on the gradient of the logarithm of the solution (its "score"), and so is a priori unknown. To resolve this dependence, we model the score with a deep neural network that is learned on-the-fly by propagating a set of samples according to the instantaneous probability current. Our approach is based on recent advances in score-based diffusion for generative modeling, but the training procedure is self-contained and does not require samples from the target density to be available beforehand. To demonstrate the validity of the approach, we consider several examples from the physics of interacting particle systems; we find that the method scales well to high-dimensional systems and accurately matches available analytical solutions and moments computed via Monte-Carlo.
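    Written out for the common isotropic constant-diffusion case (an assumed simplification of the abstract's setting), the two descriptions read:

```latex
\partial_t \rho_t(x) = -\nabla \cdot \big( b(x,t)\, \rho_t(x) \big) + D\, \Delta \rho_t(x),
\qquad
\frac{\mathrm{d} X_t}{\mathrm{d} t} = b(X_t, t) - D\, \nabla \log \rho_t(X_t).
```

    Substituting the velocity field of the ODE into the continuity equation recovers the Fokker-Planck equation, which is why transporting samples along this deterministic flow reproduces the stochastic density; the score $\nabla \log \rho_t$ is exactly the term the deep network learns on the fly.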
    SPDY: Accurate Pruning with Speedup Guarantees. (arXiv:2201.13096v2 [cs.LG] UPDATED)
    The recent focus on the efficiency of deep neural networks (DNNs) has led to significant work on model compression approaches, of which weight pruning is one of the most popular. At the same time, there is rapidly-growing computational support for efficiently executing the unstructured-sparse models obtained via pruning. Yet, most existing pruning methods minimize just the number of remaining weights, i.e. the size of the model, rather than optimizing for inference time. We address this gap by introducing SPDY, a new compression method which automatically determines layer-wise sparsity targets achieving a desired inference speedup on a given system, while minimizing accuracy loss. SPDY is composed of two new techniques: the first is an efficient dynamic programming algorithm for solving the speedup-constrained layer-wise compression problem assuming a set of given layer-wise sensitivity scores; the second is a local search procedure for determining accurate layer-wise sensitivity scores. Experiments across popular vision and language models show that SPDY guarantees speedups while recovering higher accuracy relative to existing strategies, both for one-shot and gradual pruning scenarios, and is compatible with most existing pruning approaches. We also extend our approach to the recently-proposed task of pruning with very little data, where we achieve the best known accuracy recovery when pruning to the GPU-supported 2:4 sparsity pattern.
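    The first technique is a knapsack-style dynamic program. The sketch below is a generic reconstruction under stated assumptions (discretized per-layer runtimes, additive error scores, one option chosen per layer); the actual SPDY solver and its sensitivity scores are more elaborate.

```python
import numpy as np

def speedup_constrained_dp(times, errors, budget, resolution=200):
    """Pick one compression option per layer so that total (discretized)
    runtime stays within `budget` while total error score is minimized.
    times[l][o] / errors[l][o]: runtime and error of option o at layer l.
    Returns (chosen option per layer, minimal total error)."""
    scale = resolution / budget
    INF = float("inf")
    best = np.full(resolution + 1, INF)
    best[0] = 0.0
    picks = []
    for t_l, e_l in zip(times, errors):
        new = np.full(resolution + 1, INF)
        pick = np.full(resolution + 1, -1, dtype=int)
        for o, (t, e) in enumerate(zip(t_l, e_l)):
            c = min(resolution, int(np.ceil(t * scale)))  # discretized cost
            cand = best[:resolution + 1 - c] + e
            seg = new[c:]
            better = cand < seg
            seg[better] = cand[better]
            pick[c:][better] = o
        best, picks = new, picks + [pick]
    # Backtrack from the cheapest feasible total error.
    b = int(np.argmin(best))
    assert np.isfinite(best[b]), "budget infeasible for the given options"
    chosen = []
    for pick, t_l in zip(reversed(picks), reversed(times)):
        o = int(pick[b])
        chosen.append(o)
        b -= min(resolution, int(np.ceil(t_l[o] * scale)))
    return chosen[::-1], float(best.min())
```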
    Knowledge Graph Fact Prediction via Knowledge-Enriched Tensor Factorization. (arXiv:1902.03077v1 [cs.LG] CROSS LISTED)
    We present a family of novel methods for embedding knowledge graphs into real-valued tensors. These tensor-based embeddings capture the ordered relations that are typical in the knowledge graphs represented by semantic web languages like RDF. Unlike many previous models, our methods can easily use prior background knowledge provided by users or extracted automatically from existing knowledge graphs. In addition to providing more robust methods for knowledge graph embedding, we provide a provably-convergent, linear tensor factorization algorithm. We demonstrate the efficacy of our models for the task of predicting new facts across eight different knowledge graphs, achieving between 5% and 50% relative improvement over existing state-of-the-art knowledge graph embedding techniques. Our empirical evaluation shows that all of the tensor decomposition models perform well when the average degree of an entity in a graph is high, with constraint-based models doing better on graphs with a small number of highly similar relations and regularization-based models dominating for graphs with relations of varying degrees of similarity.
    Learning to predict synchronization of coupled oscillators on randomly generated graphs. (arXiv:2012.14048v3 [math.DS] UPDATED)
    Suppose we are given a system of coupled oscillators on an unknown graph along with the trajectory of the system during some period. Can we predict whether the system will eventually synchronize? Even with a known underlying graph structure, this is an important yet analytically intractable question in general. In this work, we take an alternative approach to the synchronization prediction problem by viewing it as a classification problem based on the fact that any given system will eventually synchronize or converge to a non-synchronizing limit cycle. By only using some basic statistics of the underlying graphs such as edge density and diameter, our method can achieve perfect accuracy when there is a significant difference in the topology of the underlying graphs between the synchronizing and the non-synchronizing examples. However, in the problem setting where these graph statistics cannot distinguish the two classes very well (e.g., when the graphs are generated from the same random graph model), we find that pairing a few iterations of the initial dynamics along with the graph statistics as the input to our classification algorithms can lead to significant improvement in accuracy; far exceeding what is known by the classical oscillator theory. More surprisingly, we find that in almost all such settings, dropping out the basic graph statistics and training our algorithms with only initial dynamics achieves nearly the same accuracy. We demonstrate our method on three models of continuous and discrete coupled oscillators -- the Kuramoto model, Firefly Cellular Automata, and Greenberg-Hastings model. Finally, we also propose an "ensemble prediction" algorithm that successfully scales our method to large graphs by training on dynamics observed from multiple random subgraphs.
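    As one concrete instance of the setup, the sketch below simulates the Kuramoto model on a random graph and records the order-parameter trace whose first few values can serve as classifier input; the coupling strength, frequency distribution, and trace length are illustrative choices.

```python
import networkx as nx
import numpy as np

def kuramoto_order_trace(G, K=2.0, steps=200, dt=0.05, seed=0):
    """Integrate d(theta_i)/dt = omega_i + (K/n) * sum_j A_ij sin(theta_j -
    theta_i) and return r(t) = |mean_j exp(i*theta_j)|, which tends to 1
    for synchronizing systems and stays low otherwise."""
    rng = np.random.default_rng(seed)
    A = nx.to_numpy_array(G)
    n = G.number_of_nodes()
    theta = rng.uniform(0, 2 * np.pi, n)
    omega = rng.normal(0.0, 1.0, n)
    trace = []
    for _ in range(steps):
        coupling = (A * np.sin(theta[None, :] - theta[:, None])).sum(axis=1)
        theta += dt * (omega + K / n * coupling)
        trace.append(np.abs(np.exp(1j * theta).mean()))
    return np.array(trace)  # e.g. feed trace[:25] to the classifier

r = kuramoto_order_trace(nx.erdos_renyi_graph(50, 0.1, seed=1))
```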
    The Alberta Plan for AI Research. (arXiv:2208.11173v1 [cs.AI])
    Herein we describe our approach to artificial intelligence research, which we call the Alberta Plan. The Alberta Plan is pursued within our research groups in Alberta and by others who are like minded throughout the world. We welcome all who would join us in this pursuit.
    DeepPicarMicro: Applying TinyML to Autonomous Cyber Physical Systems. (arXiv:2208.11212v1 [cs.LG])
    Running deep neural networks (DNNs) on tiny Micro-controller Units (MCUs) is challenging due to their limitations in computing, memory, and storage capacity. Fortunately, recent advances in both MCU hardware and machine learning software frameworks make it possible to run fairly complex neural networks on modern MCUs, resulting in a new field of study widely known as TinyML. However, there have been few studies to show the potential for TinyML applications in cyber physical systems (CPS). In this paper, we present DeepPicarMicro, a small self-driving RC car testbed, which runs a convolutional neural network (CNN) on a Raspberry Pi Pico MCU. We apply a state-of-the-art DNN optimization to successfully fit the well-known PilotNet CNN architecture, which was used to drive NVIDIA's real self-driving car, on the MCU. We apply a state-of-the-art network architecture search (NAS) approach to find further optimized networks that can effectively control the car in real-time in an end-to-end manner. From an extensive and systematic experimental evaluation, we observe an interesting relationship between the accuracy, latency, and control performance of a system. From this, we propose a joint optimization strategy that takes both the accuracy and the latency of a model into account in the network architecture search process for AI-enabled CPS.
    Dual Extrapolation for Sparse Generalized Linear Models. (arXiv:1907.05830v3 [stat.ML] UPDATED)
    Generalized Linear Models (GLM) form a wide class of regression and classification models, where prediction is a function of a linear combination of the input variables. For statistical inference in high dimension, sparsity inducing regularizations have proven to be useful while offering statistical guarantees. However, solving the resulting optimization problems can be challenging: even for popular iterative algorithms such as coordinate descent, one needs to loop over a large number of variables. To mitigate this, techniques known as screening rules and working sets diminish the size of the optimization problem at hand, either by progressively removing variables, or by solving a growing sequence of smaller problems. For both techniques, significant variables are identified thanks to convex duality arguments. In this paper, we show that the dual iterates of a GLM exhibit a Vector AutoRegressive (VAR) behavior after sign identification, when the primal problem is solved with proximal gradient descent or cyclic coordinate descent. Exploiting this regularity, one can construct dual points that offer tighter certificates of optimality, enhancing the performance of screening rules and helping to design competitive working set algorithms.
    Probabilistic Robust Autoencoders for Outlier Detection. (arXiv:2110.00494v3 [cs.LG] UPDATED)
    Anomalies (or outliers) are prevalent in real-world empirical observations and potentially mask important underlying structures. Accurate identification of anomalous samples is crucial for the success of downstream data analysis tasks. To automatically identify anomalies, we propose Probabilistic Robust AutoEncoder (PRAE). PRAE aims to simultaneously remove outliers and identify a low-dimensional representation for the inlier samples. We first present the Robust AutoEncoder (RAE) objective as a minimization problem for splitting the data into inliers and outliers. Our objective is designed to exclude outliers while including a subset of samples (inliers) that can be effectively reconstructed using an AutoEncoder (AE). RAE minimizes the autoencoder's reconstruction error while incorporating as many samples as possible. This could be formulated via regularization by subtracting an $\ell_0$ norm counting the number of selected samples from the reconstruction term. Unfortunately, this leads to an intractable combinatorial problem. Therefore, we propose two probabilistic relaxations of RAE, which are differentiable and alleviate the need for a combinatorial search. We prove that the solution to the PRAE problem is equivalent to the solution of RAE. We use synthetic data to show that PRAE can accurately remove outliers in a wide range of contamination levels. Finally, we demonstrate that using PRAE for anomaly detection leads to state-of-the-art results on various benchmark datasets.
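    A minimal sketch of one plausible probabilistic relaxation, assuming a Gumbel-sigmoid (binary concrete) gate per sample and a tiny MLP autoencoder; the paper's two relaxations and its hyper-parameters may differ.

```python
import torch
import torch.nn as nn

n, d, lam, tau = 512, 10, 0.5, 0.5
x = torch.randn(n, d)                            # stand-in data
ae = nn.Sequential(nn.Linear(d, 3), nn.ReLU(), nn.Linear(3, d))
logits = nn.Parameter(torch.zeros(n))            # per-sample inlier logits
opt = torch.optim.Adam(list(ae.parameters()) + [logits], lr=1e-2)

for _ in range(200):
    u = torch.rand(n).clamp(1e-6, 1 - 1e-6)
    # Relaxed Bernoulli gate: differentiable surrogate for the l0 selection.
    g = torch.sigmoid((logits + torch.log(u) - torch.log1p(-u)) / tau)
    err = ((ae(x) - x) ** 2).sum(dim=1)
    # Reconstruct the gated (inlier) samples while rewarding keeping many.
    loss = (g * err).mean() - lam * g.mean()
    opt.zero_grad(); loss.backward(); opt.step()

outliers = torch.sigmoid(logits) < 0.5           # predicted anomaly mask
```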
    Concentration inequalities and optimal number of layers for stochastic deep neural networks. (arXiv:2206.11241v2 [cs.LG] UPDATED)
    We state concentration and martingale inequalities for the output of the hidden layers of a stochastic deep neural network (SDNN), as well as for the output of the whole SDNN. These results allow us to introduce an expected classifier (EC), and to give a probabilistic upper bound for the classification error of the EC. We also state the optimal number of layers for the SDNN via an optimal stopping procedure. We apply our analysis to a stochastic version of a feedforward neural network with ReLU activation function.
    Physics informed machine learning with Smoothed particle hydrodynamics: Hierarchy of reduced Lagrangian models of turbulence. (arXiv:2110.13311v4 [physics.flu-dyn] UPDATED)
    Turbulent flows are ubiquitous, and obtaining efficient, accurate and generalizable reduced order models remains a challenging problem. This manuscript develops a hierarchy of reduced Lagrangian models for turbulent flows in order to investigate and compare the effects of enforcing Smoothed Particle Hydrodynamics (SPH) structure versus embedding neural networks (NNs) within the Lagrangian framework as universal function approximators. SPH is a mesh-free Lagrangian methodology for approximating the equations of fluid mechanics. Starting from an NN-based parameterization of a Lagrangian acceleration operator, this hierarchy gradually incorporates a weakly compressible and parameterized SPH framework which enforces physical symmetries and conservation laws. Two new parameterized smoothing kernels are developed, included within the fully parameterized SPH simulator, and compared to the cubic and quartic smoothing kernels. For each model we experiment with different loss functions which are minimized using gradient-based optimization, where efficient computations of gradients are obtained by using Automatic Differentiation (AD) and Sensitivity Analysis (SA). Each model is trained on two Ground Truth (GT) data sets associated with weakly compressible Homogeneous Isotropic Turbulence (HIT): (1) a validation set using weakly compressible SPH, and (2) a high-fidelity set from Direct Numerical Simulations (DNS). Numerical evidence shows: (a) validation of the methodology on "synthetic" SPH data; (b) the ability of NNs embedded within the SPH framework to approximate the equation of state; (c) each model is able to interpolate onto DNS data; (d) encoding more SPH structure improves generalizability to different turbulent Mach numbers and time scales; and (e) the introduction of two novel parameterized smoothing kernels improves the accuracy of SPH over standard smoothing kernels.
    Correctness Verification of Neural Networks. (arXiv:1906.01030v3 [cs.LG] UPDATED)
    We present a novel framework for specifying and verifying correctness globally for neural networks on perception tasks. Most previous works on neural network verification for perception tasks focus on robustness verification. Unlike robustness verification, which aims to verify that the prediction of a network is stable in some local regions around labelled points, our framework provides a way to specify correctness globally in the whole target input space and verify that the network is correct for all target inputs (or find the regions where the network is not correct). We provide a specification through 1) a state space consisting of all relevant states of the world and 2) an observation process that produces neural network inputs from the states of the world. Tiling the state and input spaces with a finite number of tiles, obtaining ground truth bounds from the state tiles and network output bounds from the input tiles, then comparing the ground truth and network output bounds delivers an upper bound on the network output error for any inputs of interest. The presented framework also enables detecting illegal inputs -- inputs that are not contained in (or close to) the target input space as defined by the state space and observation process (the neural network is not designed to work on them), so that we can flag when we don't have guarantees. Results from two case studies highlight the ability of our technique to verify error bounds over the whole target input space and show how the error bounds vary over the state and input spaces.
    Exact Penalty Method for Federated Learning. (arXiv:2208.11231v1 [cs.LG])
    Federated learning has burgeoned recently in machine learning, giving rise to a variety of research topics. Popular optimization algorithms are based on the frameworks of (stochastic) gradient descent methods or the alternating direction method of multipliers. In this paper, we deploy an exact penalty method for federated learning and propose an algorithm, FedEPM, that tackles four critical issues in federated learning: communication efficiency, computational complexity, stragglers' effect, and data privacy. Moreover, it is proven to be convergent and shown to have high numerical performance.
    ImitAL: Learned Active Learning Strategy on Synthetic Data. (arXiv:2208.11636v1 [cs.LG])
    Active Learning (AL) is a well-known standard method for efficiently obtaining annotated data by first labeling the samples that contain the most information based on a query strategy. In the past, a large variety of such query strategies has been proposed, with each generation of new strategies increasing the runtime and adding more complexity. However, to the best of our knowledge, none of these strategies excels consistently over a large number of datasets from different application domains. Basically, most of the existing AL strategies are a combination of the two simple heuristics informativeness and representativeness, and the big differences lie in how these often conflicting heuristics are combined. Within this paper, we propose ImitAL, a domain-independent novel query strategy, which encodes AL as a learning-to-rank problem and learns an optimal combination of both heuristics. We train ImitAL on large-scale simulated AL runs on purely synthetic datasets. To show that ImitAL was successfully trained, we perform an extensive evaluation comparing our strategy on 13 different datasets, from a wide range of domains, with 7 other query strategies.
    Large-scale Entity Alignment via Knowledge Graph Merging, Partitioning and Embedding. (arXiv:2208.11125v1 [cs.LG])
    Entity alignment is a crucial task in knowledge graph fusion. However, most entity alignment approaches suffer from scalability problems. Recent methods address this issue by dividing large KGs into small blocks for embedding and alignment learning in each block. However, such partitioning and learning processes result in an excessive loss of structure and alignment. Therefore, in this work, we propose a scalable GNN-based entity alignment approach that reduces the structure and alignment loss from three perspectives. First, we propose a centrality-based subgraph generation algorithm to recall some landmark entities serving as bridges between different subgraphs. Second, we introduce self-supervised entity reconstruction to recover entity representations from incomplete neighborhood subgraphs, and design cross-subgraph negative sampling to incorporate entities from other subgraphs in alignment learning. Third, during the inference process, we merge the embeddings of subgraphs to make a single space for alignment search. Experimental results on the benchmark OpenEA dataset and the proposed large DBpedia1M dataset verify the effectiveness of our approach.
    Debias the Black-box: A Fair Ranking Framework via Knowledge Distillation. (arXiv:2208.11628v1 [cs.IR])
    Deep neural networks can capture the intricate interaction history between queries and documents thanks to their many complicated nonlinear units, allowing them to provide correct search recommendations. However, service providers frequently face more complex obstacles in real-world circumstances, such as deployment cost constraints and fairness requirements. Knowledge distillation, which transfers the knowledge of a well-trained complex model (teacher) to a simple model (student), has been proposed to alleviate the former concern, but the best current distillation methods focus only on making the student model imitate the predictions of the teacher model. To better facilitate the application of deep models, we propose a fair information retrieval framework based on knowledge distillation. This framework can improve the exposure-based fairness of models while considerably decreasing model size. Our extensive experiments on three huge datasets show that our proposed framework can reduce the model size to a minimum of 1% of its original size while maintaining its black-box state. It also improves fairness performance by 15%~46% while keeping a high level of recommendation effectiveness.
    A novel approach for Fair Principal Component Analysis based on eigendecomposition. (arXiv:2208.11362v1 [cs.LG])
    Principal component analysis (PCA), a ubiquitous dimensionality reduction technique in signal processing, searches for a projection matrix that minimizes the mean squared error between the reduced dataset and the original one. Since classical PCA is not tailored to address concerns related to fairness, its application to actual problems may lead to disparity in the reconstruction errors of different groups (e.g., men and women, whites and blacks, etc.), with potentially harmful consequences such as the introduction of bias towards sensitive groups. Although several fair versions of PCA have been proposed recently, there still remains a fundamental gap in the search for algorithms that are simple enough to be deployed in real systems. To address this, we propose a novel PCA algorithm which tackles fairness issues by means of a simple strategy comprising a one-dimensional search which exploits the closed-form solution of PCA. As attested by numerical experiments, the proposal can significantly improve fairness with a very small loss in the overall reconstruction error and without resorting to complex optimization schemes. Moreover, our findings are consistent in several real situations as well as in scenarios with both unbalanced and balanced datasets.
    Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning. (arXiv:2208.11580v1 [cs.LG])
    We consider the problem of model compression for deep neural networks (DNNs) in the challenging post-training setting, in which we are given an accurate trained model, and must compress it without any retraining, based only on a small amount of calibration input data. This problem has become popular in view of the emerging software and hardware support for executing models compressed via pruning and/or quantization with speedup, and well-performing solutions have been proposed independently for both compression approaches. In this paper, we introduce a new compression framework which covers both weight pruning and quantization in a unified setting, is time- and space-efficient, and considerably improves upon the practical performance of existing post-training methods. At the technical level, our approach is based on the first exact and efficient realization of the classical Optimal Brain Surgeon (OBS) framework of [LeCun, Denker, and Solla, 1990] at the scale of modern DNNs, which we further extend to cover weight quantization. This is enabled by a series of algorithmic developments which may be of independent interest. From the practical perspective, our experimental results show that it can improve significantly upon the compression-accuracy trade-offs of existing post-training methods, and that it can even enable the accurate joint application of both pruning and quantization in a post-training setting.
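    For reference, the classical OBS step that the framework scales up picks the weight whose removal hurts least and compensates the remaining weights in closed form, with $\mathbf{H}$ the layer-wise Hessian and $\mathbf{e}_q$ the $q$-th unit vector; reading the abstract, the quantization extension would replace the removal target $w_q \to 0$ with the rounding error, though the exact formulation is the paper's.

```latex
q^{\ast} = \operatorname*{arg\,min}_{q} \frac{w_q^{2}}{2\,[\mathbf{H}^{-1}]_{qq}},
\qquad
\delta \mathbf{w} = -\,\frac{w_{q^{\ast}}}{[\mathbf{H}^{-1}]_{q^{\ast} q^{\ast}}}\, \mathbf{H}^{-1} \mathbf{e}_{q^{\ast}} .
```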
    Inter- and Intra-Series Embeddings Fusion Network for Epidemiological Forecasting. (arXiv:2208.11515v1 [cs.LG])
    The accurate forecasting of infectious epidemic diseases is the key to effective control of the epidemic situation in a region. Most existing methods ignore potential dynamic dependencies between regions or the importance of temporal dependencies and inter-dependencies between regions for prediction. In this paper, we propose an Inter- and Intra-Series Embeddings Fusion Network (SEFNet) to improve epidemic prediction performance. SEFNet consists of two parallel modules, named Inter-Series Embedding Module and Intra-Series Embedding Module. In Inter-Series Embedding Module, a multi-scale unified convolution component called Region-Aware Convolution is proposed, which cooperates with self-attention to capture dynamic dependencies between time series obtained from multiple regions. The Intra-Series Embedding Module uses Long Short-Term Memory to capture temporal relationships within each time series. Subsequently, we learn the influence degree of two embeddings and fuse them with the parametric-matrix fusion method. To further improve the robustness, SEFNet also integrates a traditional autoregressive component in parallel with nonlinear neural networks. Experiments on four real-world epidemic-related datasets show SEFNet is effective and outperforms state-of-the-art baselines.
    PromptFL: Let Federated Participants Cooperatively Learn Prompts Instead of Models -- Federated Learning in Age of Foundation Model. (arXiv:2208.11625v1 [cs.LG])
    Quick global aggregation of effective distributed parameters is crucial to federated learning (FL), which requires adequate bandwidth for parameter communication and sufficient user data for local training. Otherwise, FL may incur excessive training time for convergence and produce inaccurate models. In this paper, we propose a brand-new FL framework, PromptFL, that replaces federated model training with federated prompt training, i.e., lets federated participants train prompts instead of a shared model, to simultaneously achieve efficient global aggregation and local training on insufficient data by exploiting the power of foundation models (FM) in a distributed way. PromptFL ships an off-the-shelf FM, i.e., CLIP, to distributed clients who cooperatively train shared soft prompts based on very little local data. Since PromptFL only needs to update the prompts instead of the whole model, both local training and global aggregation can be significantly accelerated. And an FM trained over large-scale data can provide strong adaptation capability to distributed users' tasks with the trained soft prompts. We empirically analyze PromptFL via extensive experiments, and show its superiority in terms of system feasibility, user privacy, and performance.
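    Because only the prompt tensor leaves each device, the aggregation step is tiny. A minimal sketch of FedAvg restricted to soft prompts, with the prompt shape (16 tokens x 512 dims) assumed for a CLIP-like text encoder:

```python
import numpy as np

def fedavg_prompts(client_prompts, client_sizes):
    """Weighted average of per-client soft-prompt tensors; the frozen
    foundation model itself never moves, so only a few thousand
    parameters are communicated per round."""
    w = np.asarray(client_sizes, dtype=float)
    w /= w.sum()
    return sum(wi * p for wi, p in zip(w, client_prompts))

# Example round: 5 clients with unequal local dataset sizes.
prompts = [np.random.randn(16, 512) for _ in range(5)]
global_prompt = fedavg_prompts(prompts, client_sizes=[120, 80, 60, 200, 40])
```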
    Weakly Supervised Airway Orifice Segmentation in Video Bronchoscopy. (arXiv:2208.11468v1 [cs.CV])
    Video bronchoscopy is routinely conducted for biopsies of lung tissue suspected for cancer, monitoring of COPD patients and clarification of acute respiratory problems at intensive care units. The navigation within complex bronchial trees is particularly challenging and physically demanding, requiring long-term experience of physicians. This paper addresses the automatic segmentation of bronchial orifices in bronchoscopy videos. Deep learning-based approaches to this task are currently hampered by the lack of readily-available ground truth segmentation data. Thus, we present a data-driven pipeline consisting of a k-means followed by a compact marker-based watershed algorithm which enables the generation of airway instance segmentation maps from given depth images. In this way, these traditional algorithms serve as weak supervision for training a shallow CNN directly on RGB images, solely based on a phantom dataset. We evaluate the generalization capabilities of this model on two in-vivo datasets covering 250 frames from 21 different bronchoscopies. We demonstrate that its performance is comparable to that of models trained directly on in-vivo data, reaching an average error of 11 vs 5 pixels for the detected centers of the airway segmentation at an image resolution of 128x128. Our quantitative and qualitative results indicate that, in the context of video bronchoscopy, phantom data and weak supervision using non-learning-based approaches make it possible to gain a semantic understanding of airway structures.
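    A hedged sketch of the weak-supervision pipeline described above: k-means clustering of a depth image followed by a marker-based watershed to split touching orifice regions. Parameter values (k, peak distance) are illustrative assumptions rather than the paper's settings:

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy import ndimage
from skimage.segmentation import watershed
from skimage.feature import peak_local_max

def weak_airway_labels(depth: np.ndarray, k: int = 3) -> np.ndarray:
    """Cluster depth values, keep the deepest cluster, split it by watershed."""
    km = KMeans(n_clusters=k, n_init=10).fit(depth.reshape(-1, 1))
    deep_cluster = np.argmax(km.cluster_centers_.ravel())
    mask = (km.labels_ == deep_cluster).reshape(depth.shape)
    distance = ndimage.distance_transform_edt(mask)
    peaks = peak_local_max(distance, min_distance=10, labels=mask)
    markers = np.zeros_like(depth, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    return watershed(-distance, markers, mask=mask)  # instance segmentation map

labels = weak_airway_labels(np.random.rand(128, 128))
```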
    Calibrated and Enhanced NRLMSIS 2.0 Model with Uncertainty Quantification. (arXiv:2208.11619v1 [physics.space-ph])
    The Mass Spectrometer and Incoherent Scatter radar (MSIS) model family has been developed and improved since the early 1970's. The most recent version of MSIS is the Naval Research Laboratory (NRL) MSIS 2.0 empirical atmospheric model. NRLMSIS 2.0 provides species density, mass density, and temperature estimates as a function of location and space weather conditions. MSIS models have long been a popular choice of atmosphere model in the research and operations communities alike, but - like many models - do not provide uncertainty estimates. In this work, we develop an exospheric temperature model based on machine learning (ML) that can be used with NRLMSIS 2.0 to calibrate it relative to high-fidelity satellite density estimates. Instead of providing point estimates, our model (called MSIS-UQ) outputs a distribution, which is assessed using a metric called the calibration error score. We show that MSIS-UQ debiases NRLMSIS 2.0, reducing the differences between model and satellite density by 25%, and is 11% closer to satellite density than the Space Force's High Accuracy Satellite Drag Model. We also show the model's uncertainty estimation capabilities by generating altitude profiles for species density, mass density, and temperature. This explicitly demonstrates how exospheric temperature probabilities affect density and temperature profiles within NRLMSIS 2.0. A further study shows improved post-storm overcooling capabilities relative to NRLMSIS 2.0 alone, expanding the range of phenomena the model can capture.
    A methodology for identifying resiliency in renewable electrical distribution system using complex network. (arXiv:2208.11543v1 [eess.SY])
    Recently, electrical distribution systems have been extensively penetrated by Distributed Energy Resources (DERs) to cater to energy demands, with the general perception that this enhances system resiliency. However, it may be adverse for grid operation due to various factors such as intermittent availability, dynamic weather conditions, the introduction of nonlinearity, and complexity. This calls for the detailed understanding of system resiliency that our method provides. We introduce a methodology using complex network theory to identify the resiliency of a distribution system incorporating solar PV generation under various undesirable configurations. Complex correlated networks for different conditions were obtained and various network parameters were computed to identify the resiliency of those networks. The proposed methodology identifies the hosting capacity of solar panels in the system while maintaining resiliency under different unwanted conditions, and hence helps to obtain an optimal allocation topology for solar panels in the system. The proposed method also identifies the critical nodes that are highly sensitive to changes and could drive the system into non-resiliency. This framework was demonstrated on the IEEE-123 Test Feeder system with time-series data generated using GridLAB-D, and a variety of analyses were performed using complex network and machine learning models.
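    As a rough sketch of the kind of analysis described, the snippet below builds a correlation graph from node-level time series (e.g., exported from a GridLAB-D run) and computes network metrics of the sort used to flag critical nodes; the threshold and metric choices are assumptions:

```python
import numpy as np
import networkx as nx

def correlation_graph(series: np.ndarray, threshold: float) -> nx.Graph:
    """series: (n_nodes, n_timesteps) measurements per bus/node."""
    corr = np.corrcoef(series)
    g = nx.Graph()
    g.add_nodes_from(range(len(corr)))
    for i in range(len(corr)):
        for j in range(i + 1, len(corr)):
            if abs(corr[i, j]) >= threshold:
                g.add_edge(i, j, weight=abs(corr[i, j]))
    return g

rng = np.random.default_rng(1)
g = correlation_graph(rng.random((123, 96)), threshold=0.3)  # e.g. 123 buses
print("density:", nx.density(g))
bc = nx.betweenness_centrality(g)            # candidate critical nodes
print(sorted(bc, key=bc.get, reverse=True)[:5])
```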
    Automatic music mixing with deep learning and out-of-domain data. (arXiv:2208.11428v1 [eess.AS])
    Music mixing traditionally involves recording instruments in the form of clean, individual tracks and blending them into a final mixture using audio effects and expert knowledge (e.g., a mixing engineer). The automation of music production tasks has become an emerging field in recent years, where rule-based methods and machine learning approaches have been explored. Nevertheless, the lack of dry or clean instrument recordings limits the performance of such models, which is still far from professional human-made mixes. We explore whether we can use out-of-domain data such as wet or processed multitrack music recordings and repurpose it to train supervised deep learning models that can bridge the current gap in automatic mixing quality. To achieve this we propose a novel data preprocessing method that allows the models to perform automatic music mixing. We also redesigned a listening test method for evaluating music mixing systems. We validate our results through such subjective tests using highly experienced mixing engineers as participants.
    PAC-learning gains of Turing machines over circuits and neural networks. (arXiv:2103.12686v2 [cs.LG] UPDATED)
    A caveat to many applications of the current Deep Learning approach is the need for large-scale data. One improvement suggested by Kolmogorov Complexity results is to apply the minimum description length (MDL) principle with computationally universal models. We study the potential gains in sample efficiency that this approach can bring in principle. We use polynomial-time Turing machines to represent computationally universal models and Boolean circuits to represent Artificial Neural Networks (ANNs) acting on finite-precision digits. Our analysis unravels direct links between our question and Computational Complexity results. We provide lower and upper bounds on the potential gains in sample efficiency when MDL is applied with Turing machines instead of ANNs. Our bounds depend on the bit-size of the input of the Boolean function to be learned. Furthermore, we highlight close relationships between classical open problems in Circuit Complexity and the tightness of these bounds.
    Deep Symbolic Learning: Discovering Symbols and Rules from Perceptions. (arXiv:2208.11561v1 [cs.LG])
    Neuro-Symbolic (NeSy) integration combines symbolic reasoning with Neural Networks (NNs) for tasks requiring perception and reasoning. Most NeSy systems rely on continuous relaxation of logical knowledge and no discrete decisions are made within the model pipeline. Furthermore, these methods assume that the symbolic rules are given. In this paper, we propose Deep Symbolic Learning (DSL), a NeSy system that learns NeSy-functions, i.e., the composition of a (set of) perception functions which map continuous data to discrete symbols, and a symbolic function over the set of symbols. DSL learns simultaneously the perception and symbolic functions, while being trained only on their composition (NeSy-function). The key novelty of DSL is that it can create internal (interpretable) symbolic representations and map them to perception inputs within a differentiable NN learning pipeline. The created symbols are automatically selected to generate symbolic functions that best explain the data. We provide experimental analysis to substantiate the efficacy of DSL in simultaneously learning perception and symbolic functions.
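    One common mechanism for making discrete symbol decisions inside a differentiable pipeline, which is what DSL requires, is a straight-through Gumbel-softmax; the sketch below illustrates that generic idea, though the paper's exact estimator may differ:

```python
import torch
import torch.nn.functional as F

def discrete_symbols(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    # hard=True emits one-hot symbols in the forward pass while gradients
    # flow through the soft relaxation in the backward pass.
    return F.gumbel_softmax(logits, tau=tau, hard=True)

perception = torch.nn.Linear(784, 10)        # maps an input to symbol logits
x = torch.randn(4, 784)
symbols = discrete_symbols(perception(x))    # (4, 10) one-hot rows
# A stand-in "symbolic function" applied downstream of the discrete symbols.
loss = (symbols @ torch.arange(10, dtype=torch.float32)).sum()
loss.backward()
print(symbols.argmax(dim=-1), perception.weight.grad.abs().sum() > 0)
```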
    FashionVQA: A Domain-Specific Visual Question Answering System. (arXiv:2208.11253v1 [cs.CV])
    Humans apprehend the world through various sensory modalities, yet language is their predominant communication channel. Machine learning systems need to draw on the same multimodal richness to have informed discourses with humans in natural language; this is particularly true for systems specialized in visually-dense information, such as dialogue, recommendation, and search engines for clothing. To this end, we train a visual question answering (VQA) system to answer complex natural language questions about apparel in fashion photoshoot images. The key to the successful training of our VQA model is the automatic creation of a visual question-answering dataset with 168 million samples from item attributes of 207 thousand images using diverse templates. The sample generation employs a strategy that considers the difficulty of the question-answer pairs to emphasize challenging concepts. Contrary to the recent trends in using several datasets for pretraining the visual question answering models, we focused on keeping the dataset fixed while training various models from scratch to isolate the improvements from model architecture changes. We see that using the same transformer for encoding the question and decoding the answer, as in language models, achieves maximum accuracy, showing that visual language models (VLMs) make the best visual question answering systems for our dataset. The accuracy of the best model surpasses the human expert level, even when answering human-generated questions that are not confined to the template formats. Our approach for generating a large-scale multimodal domain-specific dataset provides a path for training specialized models capable of communicating in natural language. The training of such domain-expert models, e.g., our fashion VLM model, cannot rely solely on the large-scale general-purpose datasets collected from the web.
    Auditing Membership Leakages of Multi-Exit Networks. (arXiv:2208.11180v1 [cs.CR])
    Relying on the fact that not all inputs require the same amount of computation to yield a confident prediction, multi-exit networks are gaining attention as a prominent approach for pushing the limits of efficient deployment. Multi-exit networks endow a backbone model with early exits, allowing predictions to be obtained at intermediate layers of the model and thus saving computation time and/or energy. However, existing designs of multi-exit networks only consider achieving the best trade-off between resource usage efficiency and prediction accuracy; the privacy risks stemming from them have never been explored. This prompts the need for a comprehensive investigation of privacy risks in multi-exit networks. In this paper, we perform the first privacy analysis of multi-exit networks through the lens of membership leakages. In particular, we first leverage existing attack methodologies to quantify the multi-exit networks' vulnerability to membership leakages. Our experimental results show that multi-exit networks are less vulnerable to membership leakages, and that the exits (number and depth) attached to the backbone model are highly correlated with the attack performance. Furthermore, we propose a hybrid attack that exploits the exit information to improve the performance of existing attacks. We evaluate the membership leakage threat caused by our hybrid attack under three different adversarial setups, ultimately arriving at a model-free and data-free adversary. These results clearly demonstrate that our hybrid attacks are very broadly applicable, and thus the corresponding risks are much more severe than shown by existing membership inference attacks. We further present a defense mechanism called TimeGuard specifically for multi-exit networks and show that TimeGuard mitigates the newly proposed attacks perfectly.
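    A toy sketch of the hybrid-attack idea: augment standard membership-inference features (posterior confidence, entropy) with the index of the exit that fired, then train an attack classifier. The feature set and the synthetic data generator below are our assumptions, not the paper's design:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def attack_features(posteriors: np.ndarray, exit_idx: np.ndarray) -> np.ndarray:
    conf = posteriors.max(axis=1)
    entropy = -(posteriors * np.log(posteriors + 1e-12)).sum(axis=1)
    return np.column_stack([conf, entropy, exit_idx])

# Toy generator: members exit earlier and produce peakier posteriors.
rng = np.random.default_rng(0)
is_member = rng.integers(0, 2, 2000)
exits = np.where(is_member == 1, rng.integers(0, 2, 2000), rng.integers(1, 4, 2000))
alpha = np.where(is_member == 1, 0.3, 1.0)
posteriors = np.stack([rng.dirichlet(np.full(10, a)) for a in alpha])
attack = LogisticRegression(max_iter=1000).fit(
    attack_features(posteriors, exits), is_member)
print(attack.score(attack_features(posteriors, exits), is_member))
```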
    Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation. (arXiv:2205.14141v3 [cs.CV] UPDATED)
    Masked image modeling (MIM) learns representations with remarkably good fine-tuning performances, overshadowing previous prevalent pre-training approaches such as image classification, instance contrastive learning, and image-text alignment. In this paper, we show that the inferior fine-tuning performance of these pre-training approaches can be significantly improved by simple post-processing in the form of feature distillation (FD). The feature distillation converts the old representations to new representations that have a few desirable properties, just like those representations produced by MIM. These properties, which we aggregately refer to as optimization friendliness, are identified and analyzed by a set of attention- and optimization-related diagnosis tools. With these properties, the new representations show strong fine-tuning performance. Specifically, the contrastive self-supervised learning methods are made as competitive in fine-tuning as the state-of-the-art masked image modeling (MIM) algorithms. The CLIP models' fine-tuning performance is also significantly improved, with a CLIP ViT-L model reaching 89.0% top-1 accuracy on ImageNet-1K classification. On the 3-billion-parameter SwinV2-G model, the fine-tuning accuracy is improved by +1.5 mIoU / +1.1 mAP to 61.4 mIoU / 64.2 mAP on ADE20K semantic segmentation and COCO object detection, respectively, creating new records on both benchmarks. More importantly, our work provides a way for future research to focus more effort on the generality and scalability of the learnt representations without being pre-occupied with optimization friendliness, since it can be enhanced rather easily. The code will be available at https://github.com/SwinTransformer/Feature-Distillation.
    LPF-Defense: 3D Adversarial Defense based on Frequency Analysis. (arXiv:2202.11287v2 [cs.CV] UPDATED)
    Although 3D point cloud classification has recently been widely deployed in different application scenarios, it is still very vulnerable to adversarial attacks. This increases the importance of robust training of 3D models in the face of adversarial attacks. Based on our analysis of the performance of existing adversarial attacks, more adversarial perturbations are found in the mid and high-frequency components of input data. Therefore, by suppressing the high-frequency content in the training phase, the models' robustness against adversarial examples is improved. Experiments showed that the proposed defense method decreases the success rate of six attacks on PointNet, PointNet++, and DGCNN models. In particular, improvements are achieved with an average increase in classification accuracy of 3.8% on the drop100 attack and 4.26% on the drop200 attack compared to the state-of-the-art methods. The method also improves the models' accuracy on the original dataset compared to other available methods.
    A Novel Deep Parallel Time-series Relation Network for Fault Diagnosis. (arXiv:2112.03405v3 [cs.LG] UPDATED)
    Considering that models that exploit the contextual information of time-series data can improve fault diagnosis performance, neural network structures such as RNN, LSTM, and GRU were proposed to model fault diagnosis effectively. However, these models are restricted by their serial computation and hence cannot achieve high diagnostic efficiency. The parallel CNN is also difficult to apply to fault diagnosis in an efficient way, because it requires larger convolution kernels or a deep structure to achieve long-term feature extraction capabilities. Besides, the BERT model applies absolute position embedding to introduce contextual information to the model, which brings noise to the raw data and therefore cannot be applied to fault diagnosis directly. In order to address the above problems, a fault diagnosis model named deep parallel time-series relation network (DPTRN) is proposed in this paper. There are three main advantages of DPTRN: (1) Our proposed time relationship unit is based on a full multilayer perceptron (MLP) structure; therefore, DPTRN performs fault diagnosis in a parallel way and improves computing efficiency significantly. (2) By improving the absolute position embedding, our novel decoupling position embedding unit can be applied to fault diagnosis directly and learn contextual information. (3) Our proposed DPTRN has an obvious advantage in feature interpretability. We confirm the effect of the proposed method on four datasets, and the results show the effectiveness, efficiency and interpretability of the proposed DPTRN model.
    Tracking by weakly-supervised learning and graph optimization for whole-embryo C. elegans lineages. (arXiv:2208.11467v1 [cs.CV])
    Tracking all nuclei of an embryo in noisy and dense fluorescence microscopy data is a challenging task. We build upon a recent method for nuclei tracking that combines weakly-supervised learning from a small set of nuclei center point annotations with an integer linear program (ILP) for optimal cell lineage extraction. Our work specifically addresses the following challenging properties of C. elegans embryo recordings: (1) Many cell divisions as compared to benchmark recordings of other organisms, and (2) the presence of polar bodies that are easily mistaken as cell nuclei. To cope with (1), we devise and incorporate a learnt cell division detector. To cope with (2), we employ a learnt polar body detector. We further propose automated ILP weights tuning via a structured SVM, alleviating the need for tedious manual set-up of a respective grid search. Our method outperforms the previous leader of the cell tracking challenge on the Fluo-N3DH-CE embryo dataset. We report a further extensive quantitative evaluation on two more C. elegans datasets. We will make these datasets public to serve as an extended benchmark for future method development. Our results suggest considerable improvements yielded by our method, especially in terms of the correctness of division event detection and the number and length of fully correct track segments. Code: https://github.com/funkelab/linajea  ( 3 min )
    An End-to-End OCR Framework for Robust Arabic-Handwriting Recognition using a Novel Transformers-based Model and an Innovative 270 Million-Words Multi-Font Corpus of Classical Arabic with Diacritics. (arXiv:2208.11484v1 [cs.CV])
    This research is the second phase in a series of investigations on developing an Optical Character Recognition (OCR) system for Arabic historical documents and examining how different modeling procedures interact with the problem. The first study investigated the effect of Transformers on our custom-built Arabic dataset. One of the downsides of the first study was the size of the training data, a mere 15,000 images from our 30 million images, due to a lack of resources. In this phase, we also add an image enhancement layer, time and space optimizations, and a Post-Correction layer to aid the model in predicting the correct word for the correct context. Notably, we propose an end-to-end text recognition approach using Vision Transformers as an encoder, namely BEIT, and a vanilla Transformer as a decoder, eliminating CNNs for feature extraction and reducing the model's complexity. The experiments show that our end-to-end model outperforms convolutional backbones. The model attained a CER of 4.46%.  ( 2 min )
    SCALE: Online Self-Supervised Lifelong Learning without Prior Knowledge. (arXiv:2208.11266v1 [cs.LG])
    Unsupervised lifelong learning refers to the ability to learn over time while memorizing previous patterns without supervision. Previous works assumed strong prior knowledge about the incoming data (e.g., knowing the class boundaries), which can be impossible to obtain in complex and unpredictable environments. In this paper, motivated by real-world scenarios, we formally define the online unsupervised lifelong learning problem with class-incremental streaming data, which is non-iid and single-pass. The problem is more challenging than existing lifelong learning problems due to the absence of labels and prior knowledge. To address the issue, we propose Self-Supervised ContrAstive Lifelong LEarning (SCALE), which extracts and memorizes knowledge on-the-fly. SCALE is designed around three major components: a pseudo-supervised contrastive loss, a self-supervised forgetting loss, and an online memory update for uniform subset selection. All three components are designed to work collaboratively to maximize learning performance. Our loss functions leverage pairwise similarity, thus removing the dependency on supervision or prior knowledge. We perform comprehensive experiments on SCALE under iid and four non-iid data streams. SCALE outperforms the best state-of-the-art algorithm in all settings, with improvements of up to 6.43%, 5.23% and 5.86% kNN accuracy on the CIFAR-10, CIFAR-100 and SubImageNet datasets.  ( 2 min )
    Transformer-Boosted Anomaly Detection with Fuzzy Hashes. (arXiv:2208.11367v1 [cs.CR])
    Fuzzy hashes are an important tool in digital forensics and are used in approximate matching to determine the similarity between digital artifacts. They translate the byte code of files into computable strings, which makes them particularly interesting for intelligent machine processing. In this work, we propose deep learning approximate matching (DLAM), which achieves much higher accuracy in detecting anomalies in fuzzy hashes than conventional approaches. In addition to the well-known application for clustering malware, we show that fuzzy hashes and deep learning are indeed well-suited to classify files according to the presence of certain content, e.g., malware. DLAM relies on transformer-based models from the field of natural language processing and outperforms existing methods. Traditional fuzzy hashes like TLSH and ssdeep have a limited size and fail to detect file anomalies if they are relatively small compared to the overall file size. DLAM, however, enables the detection of such file correlations in the computed fuzzy hashes of TLSH and ssdeep, even for anomaly sizes of less than 15%. It achieves comparable results to state-of-the-art fuzzy hashing algorithms while relying on more efficient hash computations and can, therefore, be used at a much larger scale.  ( 2 min )
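    A hedged sketch of the overall wiring this suggests: treat the characters of a fuzzy hash digest as tokens for a small transformer classifier. The vocabulary, model sizes, and the example digest are illustrative assumptions:

```python
import torch
import torch.nn as nn

VOCAB = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/:."
CHAR2ID = {c: i + 1 for i, c in enumerate(VOCAB)}     # 0 is the padding id

def encode_hash(h: str, max_len: int = 128) -> torch.Tensor:
    ids = [CHAR2ID.get(c, 0) for c in h[:max_len]]
    return torch.tensor(ids + [0] * (max_len - len(ids)))

class HashClassifier(nn.Module):
    def __init__(self, vocab=len(VOCAB) + 1, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, 2)                   # anomalous vs. clean

    def forward(self, ids):
        z = self.enc(self.emb(ids))
        return self.head(z.mean(dim=1))               # mean-pool over tokens

model = HashClassifier()
logits = model(encode_hash("3:hRMs3FsRc2:hRMsVsRc").unsqueeze(0))
```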
    Fast emulation of density functional theory simulations using approximate Gaussian processes. (arXiv:2208.11302v1 [stat.ML])
    Fitting a theoretical model to experimental data in a Bayesian manner using Markov chain Monte Carlo typically requires one to evaluate the model thousands (or millions) of times. When the model is a slow-to-compute physics simulation, Bayesian model fitting becomes infeasible. To remedy this, a second statistical model that predicts the simulation output -- an "emulator" -- can be used in lieu of the full simulation during model fitting. A typical emulator of choice is the Gaussian process (GP), a flexible, non-linear model that provides both a predictive mean and variance at each input point. Gaussian process regression works well for small amounts of training data ($n < 10^3$), but becomes computationally infeasible as the data set grows; approximate GP models can scale to much larger data sets ($n \sim 10^5$), trading away predictive accuracy for drastically reduced runtime. This work examines the accuracy-runtime trade-off of several approximate Gaussian process models -- the sparse variational GP, stochastic variational GP, and deep kernel learned GP -- when emulating the predictions of density functional theory (DFT) models. Additionally, we use the emulators to calibrate, in a Bayesian manner, the DFT model parameters using observed data, resolving the computational barrier imposed by the data set size, and compare calibration results to previous work. The utility of these calibrated DFT models is to make predictions, based on observed data, about the properties of experimentally unobserved nuclides of interest, e.g., super-heavy nuclei.  ( 3 min )
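    For concreteness, a minimal stochastic variational GP emulator of the kind compared above can be written with GPyTorch; the expensive DFT simulation is stubbed by a toy function here, and all hyperparameters are illustrative:

```python
import torch
import gpytorch

class SVGPEmulator(gpytorch.models.ApproximateGP):
    def __init__(self, inducing_points: torch.Tensor):
        var_dist = gpytorch.variational.CholeskyVariationalDistribution(
            inducing_points.size(0))
        strategy = gpytorch.variational.VariationalStrategy(
            self, inducing_points, var_dist, learn_inducing_locations=True)
        super().__init__(strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x))

x = torch.linspace(0, 1, 2000).unsqueeze(-1)
y = torch.sin(8 * x).squeeze() + 0.05 * torch.randn(2000)  # stand-in "simulation"
model = SVGPEmulator(inducing_points=x[::100].clone())     # 20 inducing points
likelihood = gpytorch.likelihoods.GaussianLikelihood()
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=y.numel())
opt = torch.optim.Adam(list(model.parameters()) + list(likelihood.parameters()),
                       lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = -mll(model(x), y)
    loss.backward()
    opt.step()
```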
    Federated Learning via Decentralized Dataset Distillation in Resource-Constrained Edge Environments. (arXiv:2208.11311v1 [cs.LG])
    We introduce a novel federated learning framework, FedD3, which reduces the overall communication volume and thereby opens up the concept of federated learning to more application scenarios in network-constrained environments. It achieves this by leveraging local dataset distillation instead of traditional learning approaches (i) to significantly reduce communication volumes and (ii) to limit transfers to one-shot communication, rather than iterative multiway communication. Instead of sharing model updates, as in other federated learning approaches, FedD3 allows the connected clients to distill the local datasets independently, and then aggregates those decentralized distilled datasets (typically in the form of a few unrecognizable images, which are normally smaller than a model) across the network only once to form the final model. Our experimental results show that FedD3 significantly outperforms other federated learning frameworks in terms of needed communication volume, while providing the additional benefit of balancing the trade-off between accuracy and communication cost, depending on the usage scenario or target dataset. For instance, for training an AlexNet model on a Non-IID CIFAR-10 dataset with 10 clients, FedD3 can either increase the accuracy by over 71% with a similar communication volume, or save 98% of the communication volume while reaching the same accuracy, compared to other one-shot federated learning approaches.  ( 3 min )
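    A minimal sketch of the one-shot communication pattern described above; the local distillation step is stubbed out (real dataset distillation would optimize the synthetic samples), so only the aggregation wiring is shown:

```python
import torch
from torch.utils.data import TensorDataset, ConcatDataset, DataLoader

def distill_locally(xs: torch.Tensor, ys: torch.Tensor, n_synthetic: int = 10):
    """Stub: real dataset distillation would optimize n_synthetic samples so
    that training on them approximates training on the full local data."""
    return TensorDataset(xs[:n_synthetic].clone(), ys[:n_synthetic].clone())

clients = [(torch.randn(500, 3, 32, 32), torch.randint(0, 10, (500,)))
           for _ in range(10)]
# One-shot upload: each client sends its tiny distilled set exactly once.
server_set = ConcatDataset([distill_locally(xs, ys) for xs, ys in clients])
loader = DataLoader(server_set, batch_size=32, shuffle=True)
print(len(server_set))  # 100 synthetic samples replace iterative model updates
```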
    UniCon: Unidirectional Split Learning with Contrastive Loss for Visual Question Answering. (arXiv:2208.11435v1 [cs.CV])
    Visual question answering (VQA) that leverages multi-modality data has attracted intensive interest in real-life applications, such as home robots and clinic diagnoses. Nevertheless, one of the challenges is to design robust learning for different client tasks. This work aims to bridge the gap between the prerequisite of large-scale training data and the constraint of client data sharing mainly due to confidentiality. We propose the Unidirectional Split Learning with Contrastive Loss (UniCon) to tackle VQA tasks training on distributed data silos. In particular, UniCon trains a global model over the entire data distribution of different clients learning refined cross-modal representations via contrastive learning. The learned representations of the global model aggregate knowledge from different local tasks. Moreover, we devise a unidirectional split learning framework to enable more efficient knowledge sharing. The comprehensive experiments with five state-of-the-art VQA models on the VQA-v2 dataset demonstrated the efficacy of UniCon, achieving an accuracy of 49.89% in the validation set of VQA-v2. This work is the first study of VQA under the constraint of data confidentiality using self-supervised Split Learning.  ( 2 min )
    Sparse Polynomial Optimization: Theory and Practice. (arXiv:2208.11158v1 [math.OC])
    The problem of minimizing a polynomial over a set of polynomial inequalities is an NP-hard non-convex problem. Thanks to powerful results from real algebraic geometry, one can convert this problem into a nested sequence of finite-dimensional convex problems. At each step of the associated hierarchy, one needs to solve a fixed size semidefinite program, which can in turn be solved with efficient numerical tools. On the practical side, however, there is "no free lunch" and such optimization methods usually encompass severe scalability issues. Fortunately, for many applications, we can "look at the problem in the eyes" and exploit the inherent data structure arising from the cost and constraints describing the problem, for instance sparsity or symmetries. This book presents several research efforts to tackle this scientific challenge with important computational implications, and provides the development of alternative optimization schemes that scale well in terms of computational complexity, at least in some identified class of problems. The algorithmic framework presented in this book mainly exploits the sparsity structure of the input data to solve large-scale polynomial optimization problems. We present sparsity-exploiting hierarchies of relaxations, for either unconstrained or constrained problems. By contrast with the dense hierarchies, they provide faster approximation of the solution in practice but also come with the same theoretical convergence guarantees. Our framework is not restricted to static polynomial optimization, and we expose hierarchies of approximations for values of interest arising from the analysis of dynamical systems. We also present various extensions to problems involving noncommuting variables, e.g., matrices of arbitrary size or quantum physics operators.  ( 3 min )
    Robot Motion Planning as Video Prediction: A Spatio-Temporal Neural Network-based Motion Planner. (arXiv:2208.11287v1 [cs.RO])
    Neural network (NN)-based methods have emerged as an attractive approach for robot motion planning due to the strong learning capabilities of NN models and their inherently high parallelism. Despite current developments in this direction, the efficient capture and processing of important sequential and spatial information, in a direct and simultaneous way, is still relatively under-explored. To overcome the challenge and unlock the potential of neural networks for motion planning tasks, in this paper, we propose STP-Net, an end-to-end learning framework that can fully extract and leverage important spatio-temporal information to form an efficient neural motion planner. By interpreting the movement of the robot as a video clip, robot motion planning is transformed into a video prediction task that can be performed by STP-Net in both spatially and temporally efficient ways. Empirical evaluations across different seen and unseen environments show that, with nearly 100% accuracy (aka, success rate), STP-Net demonstrates very promising performance with respect to both planning speed and path cost. Compared with existing NN-based motion planners, STP-Net achieves at least 5x, 2.6x and 1.8x faster speed with lower path cost on 2D Random Forest, 2D Maze and 3D Random Forest environments, respectively. Furthermore, STP-Net can quickly and simultaneously compute multiple near-optimal paths in multi-robot motion planning tasks.  ( 3 min )
    Transfer Learning-based State of Health Estimation for Lithium-ion Battery with Cycle Synchronization. (arXiv:2208.11204v1 [cs.LG])
    Accurately estimating a battery's state of health (SOH) helps prevent battery-powered applications from failing unexpectedly. With the advantage of reducing the data requirement of model training for new batteries, transfer learning (TL) emerges as a promising machine learning approach that applies knowledge learned from a source battery with a large amount of data. However, the determination of whether the source battery model is reasonable and which part of the information can be transferred for SOH estimation is rarely discussed, despite these being critical components of a successful TL. To address these challenges, this paper proposes an interpretable TL-based SOH estimation method that exploits temporal dynamics to assist transfer learning; it consists of three parts. First, with the help of dynamic time warping, the temporal data from the discharge time series are synchronized, yielding the warping path of the cycle-synchronized time series responsible for capacity degradation over cycles. Second, the canonical variates retrieved from the spatial path of the cycle-synchronized time series are used for distribution similarity analysis between the source and target batteries. Third, when the distribution similarity is within the predefined threshold, a comprehensive target SOH estimation model is constructed by transferring the common temporal dynamics from the source SOH estimation model and compensating the errors with a residual model from the target battery. Through a widely-used open-source benchmark dataset, the estimation error of the proposed method, evaluated by the root mean squared error, is as low as 0.0034, resulting in a 77% accuracy improvement compared with existing methods.  ( 3 min )
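    As a concrete illustration of the cycle-synchronization step, here is a plain dynamic-time-warping implementation in numpy that aligns a discharge curve from an aged cycle to a reference cycle and returns the warping path the abstract refers to; the toy curves are assumptions:

```python
import numpy as np

def dtw_path(a: np.ndarray, b: np.ndarray):
    """Classic O(n*m) DTW: returns the optimal alignment path and its cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(a[i-1] - b[j-1]) + min(D[i-1, j], D[i, j-1], D[i-1, j-1])
    # Backtrack the optimal alignment.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min([(i-1, j), (i, j-1), (i-1, j-1)], key=lambda t: D[t])
    return path[::-1], D[n, m]

ref = np.linspace(4.2, 3.0, 100)                        # reference discharge curve
aged = np.linspace(4.2, 3.0, 80) + 0.01 * np.random.randn(80)
path, cost = dtw_path(ref, aged)                        # warping path over cycles
```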
    Retrieval-based Controllable Molecule Generation. (arXiv:2208.11126v1 [q-bio.QM])
    Generating new molecules with specified chemical and biological properties via generative models has emerged as a promising direction for drug discovery. However, existing methods require extensive training/fine-tuning with a large dataset, often unavailable in real-world generation tasks. In this work, we propose a new retrieval-based framework for controllable molecule generation. We use a small set of exemplar molecules, i.e., those that (partially) satisfy the design criteria, to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria. We design a retrieval mechanism that retrieves and fuses the exemplar molecules with the input molecule, which is trained by a new self-supervised objective that predicts the nearest neighbor of the input molecule. We also propose an iterative refinement process to dynamically update the generated molecules and retrieval database for better generalization. Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning. On various tasks ranging from simple design criteria to a challenging real-world scenario for designing lead compounds that bind to the SARS-CoV-2 main protease, we demonstrate our approach extrapolates well beyond the retrieval database, and achieves better performance and wider applicability than previous methods.  ( 3 min )
    Why Deep Learning's Performance Data Are Misleading. (arXiv:2208.11228v1 [cs.LG])
    This is a theoretical paper, as a companion paper of the keynote talk at the same conference. In contrast to conscious learning, many projects in AI have employed deep learning, many of which seem to give impressive performance data. This paper explains that such performance data are probably misleadingly inflated due to two possible misconducts: data deletion and testing on the training set. This paper clarifies what data deletion and testing on the training set mean in deep learning and why they are misconducts. A simple classification method is defined, called nearest neighbor with threshold (NNWT). A theorem is established that the NNWT method reaches zero error on any validation set and any test set using Post-Selections, as long as the test set is in the possession of the author and both the amount of storage space and the time of training are finite but unbounded, as with many deep learning methods. However, like many deep learning methods, the NNWT method has little generalization power. Evidence that misconducts actually took place in many deep learning projects is beyond the scope of this paper. Without a transparent account of freedom from Post-Selections, deep learning data are misleading.  ( 2 min )
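    The NNWT construction is simple enough to sketch directly: memorize every example and answer only within the threshold, which trivially yields zero error on any set the author possesses and stores, exactly the point the abstract makes about the absence of generalization power:

```python
import numpy as np

class NNWT:
    """Nearest neighbor with threshold: pure memorization, no generalization."""
    def __init__(self, threshold: float):
        self.threshold, self.X, self.y = threshold, None, None

    def fit(self, X, y):
        self.X, self.y = np.asarray(X), np.asarray(y)   # unbounded storage
        return self

    def predict_one(self, x):
        d = np.linalg.norm(self.X - x, axis=1)
        i = np.argmin(d)
        return self.y[i] if d[i] <= self.threshold else None  # abstain otherwise

# "Testing on the training set": zero error by construction.
X = np.random.rand(100, 5)
y = np.random.randint(0, 3, 100)
clf = NNWT(threshold=1e-9).fit(X, y)
assert all(clf.predict_one(x) == t for x, t in zip(X, y))
```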
    Improving Natural-Language-based Audio Retrieval with Transfer Learning and Audio & Text Augmentations. (arXiv:2208.11460v1 [cs.SD])
    The absence of large labeled datasets remains a significant challenge in many application areas of deep learning. Researchers and practitioners typically resort to transfer learning and data augmentation to alleviate this issue. We study these strategies in the context of audio retrieval with natural language queries (Task 6b of the DCASE 2022 Challenge). Our proposed system uses pre-trained embedding models to project recordings and textual descriptions into a shared audio-caption space in which related examples from different modalities are close. We employ various data augmentation techniques on audio and text inputs and systematically tune their corresponding hyperparameters with sequential model-based optimization. Our results show that the augmentation strategies used reduce overfitting and improve retrieval performance. We further show that pre-training the system on the AudioCaps dataset leads to additional improvements.  ( 2 min )
    Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning. (arXiv:2208.11361v1 [cs.LG])
    In real-world scenarios, reinforcement learning under sparse-reward settings has remained challenging, despite surging interest in this field. Previous attempts suggest that intrinsic rewards can alleviate the issue caused by sparsity. In this paper, we present a novel intrinsic reward that is inspired by human learning, as humans evaluate curiosity by comparing current observations with historical knowledge. Specifically, we train a self-supervised prediction model and save a set of snapshots of the model parameters, without incurring additional training cost. Then we employ the nuclear norm to evaluate the temporal inconsistency between the predictions of different snapshots, which can further be deployed as the intrinsic reward. Moreover, a variational weighting mechanism is proposed to assign weights to different snapshots in an adaptive manner. We demonstrate the efficacy of the proposed method in various benchmark environments. The results suggest that our method provides state-of-the-art performance compared with other intrinsic reward-based methods, without incurring additional training costs and while maintaining higher noise tolerance. Our code will be released publicly to enhance reproducibility.  ( 2 min )
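    A hedged sketch of the intrinsic-reward computation described above: stack the predictions of several saved snapshots for one observation and take the nuclear norm of the resulting matrix. The adaptive snapshot weighting is reduced to uniform weights here for brevity:

```python
import torch

def intrinsic_reward(snapshot_preds) -> torch.Tensor:
    """snapshot_preds: list of K prediction vectors for a single observation,
    one per saved snapshot of the self-supervised model."""
    M = torch.stack(snapshot_preds)                # (K, feature_dim)
    return torch.linalg.matrix_norm(M, ord="nuc")  # sum of singular values

preds = [torch.randn(64) for _ in range(5)]        # 5 parameter snapshots
print(intrinsic_reward(preds))
```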
    The premise of approximate MCMC in Bayesian deep learning. (arXiv:2208.11389v1 [stat.ML])
    This paper identifies several characteristics of approximate MCMC in Bayesian deep learning. It proposes an approximate sampling algorithm for neural networks. By analogy to sampling data batches from big datasets, it is proposed to sample parameter subgroups from neural network parameter spaces of high dimensions. While the advantages of minibatch MCMC have been discussed in the literature, blocked Gibbs sampling has received less research attention in Bayesian deep learning.  ( 2 min )
    Masked Image Modeling Advances 3D Medical Image Analysis. (arXiv:2204.11716v2 [cs.CV] UPDATED)
    Recently, masked image modeling (MIM) has gained considerable attention due to its capacity to learn from vast amounts of unlabeled data and has been demonstrated to be effective on a wide variety of vision tasks involving natural images. Meanwhile, the potential of self-supervised learning in modeling 3D medical images is anticipated to be immense due to the high quantities of unlabeled images, and the expense and difficulty of quality labels. However, MIM's applicability to medical images remains uncertain. In this paper, we demonstrate that masked image modeling approaches can also advance 3D medical image analysis in addition to natural images. We study how masked image modeling strategies affect performance from the viewpoint of 3D medical image segmentation as a representative downstream task: i) when compared to naive contrastive learning, masked image modeling approaches accelerate the convergence of supervised training (1.40$\times$ faster) and ultimately produce a higher dice score; ii) predicting raw voxel values with a high masking ratio and a relatively small patch size is a non-trivial self-supervised pretext task for medical image modeling; iii) a lightweight decoder or projection head design for reconstruction is powerful for masked image modeling on 3D medical images, speeding up training and reducing cost; iv) finally, we also investigate the effectiveness of MIM methods under different practical scenarios where different image resolutions and labeled data ratios are applied.  ( 3 min )
    EpiGNN: Exploring Spatial Transmission with Graph Neural Network for Regional Epidemic Forecasting. (arXiv:2208.11517v1 [q-bio.QM])
    Epidemic forecasting is the key to effective control of epidemic transmission and helps the world mitigate the crisis that threatens public health. To better understand the transmission and evolution of epidemics, we propose EpiGNN, a graph neural network-based model for epidemic forecasting. Specifically, we design a transmission risk encoding module to characterize local and global spatial effects of regions in epidemic processes and incorporate them into the model. Meanwhile, we develop a Region-Aware Graph Learner (RAGL) that takes transmission risk, geographical dependencies, and temporal information into account to better explore spatial-temporal dependencies and makes regions aware of related regions' epidemic situations. The RAGL can also combine with external resources, such as human mobility, to further improve prediction performance. Comprehensive experiments on five real-world epidemic-related datasets (including influenza and COVID-19) demonstrate the effectiveness of our proposed method and show that EpiGNN outperforms state-of-the-art baselines by 9.48% in RMSE.  ( 3 min )
    Federated Self-Supervised Contrastive Learning and Masked Autoencoder for Dermatological Disease Diagnosis. (arXiv:2208.11278v1 [cs.LG])
    In dermatological disease diagnosis, the private data collected by mobile dermatology assistants exist on distributed mobile devices of patients. Federated learning (FL) can use decentralized data to train models while keeping data local. Existing FL methods assume all the data have labels. However, medical data often comes without full labels due to high labeling costs. Self-supervised learning (SSL) methods, such as contrastive learning (CL) and masked autoencoders (MAE), can leverage the unlabeled data to pre-train models, followed by fine-tuning with limited labels. However, combining SSL and FL has unique challenges. For example, CL requires diverse data but each device only has limited data. For MAE, while Vision Transformer (ViT) based MAE has higher accuracy over CNNs in centralized learning, MAE's performance in FL with unlabeled data has not been investigated. Besides, the ViT synchronization between the server and clients is different from that of traditional CNNs. Therefore, special synchronization methods need to be designed. In this work, we propose two federated self-supervised learning frameworks for dermatological disease diagnosis with limited labels. The first one features lower computation costs, suitable for mobile devices. The second one features high accuracy and fits high-performance servers. Based on CL, we propose federated contrastive learning with feature sharing (FedCLF). Features are shared for diverse contrastive information without sharing raw data for privacy. Based on MAE, we propose FedMAE. Knowledge split separates the global and local knowledge learned from each client. Only global knowledge is aggregated for higher generalization performance. Experiments on dermatological disease datasets show the superior accuracy of the proposed frameworks over state-of-the-art methods.  ( 3 min )
    Molecular Substructure-Aware Network for Drug-Drug Interaction Prediction. (arXiv:2208.11267v1 [cs.AI])
    Concomitant administration of drugs can cause drug-drug interactions (DDIs). Some drug combinations are beneficial, but others may cause previously unrecorded negative effects. Previous works on DDI prediction usually rely on hand-engineered domain knowledge, which is laborious to obtain. In this work, we propose a novel model, the Molecular Substructure-Aware Network (MSAN), to effectively predict potential DDIs from the molecular structures of drug pairs. We adopt a Transformer-like substructure extraction module to acquire a fixed number of representative vectors that are associated with various substructure patterns of the drug molecule. Then, the interaction strength between the two drugs' substructures is captured by a similarity-based interaction module. We also perform a substructure dropping augmentation before graph encoding to alleviate overfitting. Experimental results from a real-world dataset reveal that our proposed model achieves state-of-the-art performance. We also show that the predictions of our model are highly interpretable through a case study.  ( 2 min )
  • Open

    Fairness for AUC via Feature Augmentation. (arXiv:2111.12823v2 [cs.LG] UPDATED)
    We study fairness in the context of classification where the performance is measured by the area under the curve (AUC) of the receiver operating characteristic. AUC is commonly used to measure the performance of prediction models. The same classifier can have significantly varying AUCs for different protected groups and, in real-world applications, it is often desirable to reduce such cross-group differences. We address the problem of how to acquire additional features to most greatly improve AUC for the disadvantaged group. We develop a novel approach, fairAUC, based on feature augmentation (adding features) to mitigate bias between identifiable groups. The approach requires only a few summary statistics to offer provable guarantees on AUC improvement, and allows managers flexibility in determining where in the fairness-accuracy tradeoff they would like to be. We evaluate fairAUC on synthetic and real-world datasets and find that it significantly improves AUC for the disadvantaged group relative to benchmarks maximizing overall AUC and minimizing bias between groups.  ( 2 min )
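    The quantity fairAUC targets is easy to make concrete: AUC computed separately per protected group, whose gap the feature-augmentation procedure aims to shrink. A small sketch with synthetic data (where group 1 is deliberately easier to score):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def group_aucs(y_true, scores, group):
    """AUC of the same classifier, evaluated separately per protected group."""
    return {g: roc_auc_score(y_true[group == g], scores[group == g])
            for g in np.unique(group)}

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)
group = rng.integers(0, 2, 1000)
scores = y * (0.5 + 0.3 * group) + rng.normal(0, 0.6, 1000)  # group 1 easier
aucs = group_aucs(y, scores, group)
print(aucs, "gap:", abs(aucs[0] - aucs[1]))
```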
    Concentration inequalities and optimal number of layers for stochastic deep neural networks. (arXiv:2206.11241v2 [cs.LG] UPDATED)
    We state concentration and martingale inequalities for the output of the hidden layers of a stochastic deep neural network (SDNN), as well as for the output of the whole SDNN. These results allow us to introduce an expected classifier (EC), and to give probabilistic upper bound for the classification error of the EC. We also state the optimal number of layers for the SDNN via an optimal stopping procedure. We apply our analysis to a stochastic version of a feedforward neural network with ReLU activation function.  ( 2 min )
    Probabilistic Robust Autoencoders for Outlier Detection. (arXiv:2110.00494v3 [cs.LG] UPDATED)
    Anomalies (or outliers) are prevalent in real-world empirical observations and potentially mask important underlying structures. Accurate identification of anomalous samples is crucial for the success of downstream data analysis tasks. To automatically identify anomalies, we propose Probabilistic Robust AutoEncoder (PRAE). PRAE aims to simultaneously remove outliers and identify a low-dimensional representation for the inlier samples. We first present the Robust AutoEncoder (RAE) objective as a minimization problem for splitting the data into inliers and outliers. Our objective is designed to exclude outliers while including a subset of samples (inliers) that can be effectively reconstructed using an AutoEncoder (AE). RAE minimizes the autoencoder's reconstruction error while incorporating as many samples as possible. This could be formulated via regularization by subtracting an $\ell_0$ norm counting the number of selected samples from the reconstruction term. Unfortunately, this leads to an intractable combinatorial problem. Therefore, we propose two probabilistic relaxations of RAE, which are differentiable and alleviate the need for a combinatorial search. We prove that the solution to the PRAE problem is equivalent to the solution of RAE. We use synthetic data to show that PRAE can accurately remove outliers in a wide range of contamination levels. Finally, we demonstrate that using PRAE for anomaly detection leads to state-of-the-art results on various benchmark datasets.  ( 3 min )
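    One way to write the RAE objective the abstract describes, in our own (hedged) notation with encoder $E_\theta$, decoder $D_\theta$, and a binary selection variable $s_i$ per sample:

$$\min_{\theta,\; s \in \{0,1\}^n} \;\; \sum_{i=1}^{n} s_i \,\big\|x_i - D_\theta(E_\theta(x_i))\big\|_2^2 \;-\; \lambda\,\|s\|_0,$$

so the reconstruction term competes with the $-\lambda\|s\|_0$ reward for keeping as many samples as possible; PRAE then replaces the intractable binary $s$ with differentiable probabilistic relaxations.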
    A coherence parameter characterizing generative compressed sensing with Fourier measurements. (arXiv:2207.09340v2 [cs.IT] UPDATED)
    In Bora et al. (2017), a mathematical framework was developed for compressed sensing guarantees in the setting where the measurement matrix is Gaussian and the signal structure is the range of a generative neural network (GNN). The problem of compressed sensing with GNNs has since been extensively analyzed when the measurement matrix and/or network weights follow a subgaussian distribution. We move beyond the subgaussian assumption, to measurement matrices that are derived by sampling uniformly at random rows of a unitary matrix (including subsampled Fourier measurements as a special case). Specifically, we prove the first known restricted isometry guarantee for generative compressed sensing with subsampled isometries, and provide recovery bounds with nearly order-optimal sample complexity, addressing an open problem of Scarlett et al. (2022, p. 10). Recovery efficacy is characterized by the coherence, a new parameter, which measures the interplay between the range of the network and the measurement matrix. Our approach relies on subspace counting arguments and ideas central to high-dimensional probability. Furthermore, we propose a regularization strategy for training GNNs to have favourable coherence with the measurement operator. We provide compelling numerical simulations that support this regularized training strategy: our strategy yields low coherence networks that require fewer measurements for signal recovery. This, together with our theoretical results, supports coherence as a natural quantity for characterizing generative compressed sensing with subsampled isometries.  ( 3 min )
    AlphaZero-Inspired Game Learning: Faster Training by Using MCTS Only at Test Time. (arXiv:2204.13307v2 [cs.LG] UPDATED)
    Recently, the seminal algorithms AlphaGo and AlphaZero have started a new era in game learning and deep reinforcement learning. While the achievements of AlphaGo and AlphaZero - playing Go and other complex games at a superhuman level - are truly impressive, these architectures have the drawback that they require high computational resources. Many researchers are looking for methods that are similar to AlphaZero, but have lower computational demands and are thus more easily reproducible. In this paper, we pick an important element of AlphaZero - the Monte Carlo Tree Search (MCTS) planning stage - and combine it with temporal difference (TD) learning agents. We wrap MCTS for the first time around TD n-tuple networks, and we use this wrapping only at test time to create versatile agents that at the same time keep computational demands low. We apply this new architecture to several complex games (Othello, ConnectFour, Rubik's Cube) and show the advantages achieved with this AlphaZero-inspired MCTS wrapper. In particular, we present results showing that this agent is the first one trained on standard hardware (no GPU or TPU) to beat the very strong Othello program Edax up to and including level 7 (where most other learning-from-scratch algorithms could only defeat Edax up to level 2).  ( 3 min )
    Correctness Verification of Neural Networks. (arXiv:1906.01030v3 [cs.LG] UPDATED)
    We present a novel framework for specifying and verifying correctness globally for neural networks on perception tasks. Most previous works on neural network verification for perception tasks focus on robustness verification. Unlike robustness verification, which aims to verify that the prediction of a network is stable in some local regions around labelled points, our framework provides a way to specify correctness globally in the whole target input space and verify that the network is correct for all target inputs (or find the regions where the network is not correct). We provide a specification through 1) a state space consisting of all relevant states of the world and 2) an observation process that produces neural network inputs from the states of the world. Tiling the state and input spaces with a finite number of tiles, obtaining ground truth bounds from the state tiles and network output bounds from the input tiles, then comparing the ground truth and network output bounds delivers an upper bound on the network output error for any inputs of interest. The presented framework also enables detecting illegal inputs -- inputs that are not contained in (or close to) the target input space as defined by the state space and observation process (the neural network is not designed to work on them), so that we can flag when we don't have guarantees. Results from two case studies highlight the ability of our technique to verify error bounds over the whole target input space and show how the error bounds vary over the state and input spaces.  ( 3 min )
    FedOS: using open-set learning to stabilize training in federated learning. (arXiv:2208.11512v1 [stat.ML])
    Federated Learning is a recent approach to train statistical models on distributed datasets without violating privacy constraints. The data locality principle is preserved by sharing the model instead of the data between clients and the server. This brings many advantages but also poses new challenges. In this report, we explore this new research area and perform several experiments to deepen our understanding of what these challenges are and how different problem settings affect the performance of the final model. Finally, we present a novel approach to one of these challenges and compare it to other methods found in the literature.  ( 2 min )
    Fractional SDE-Net: Generation of Time Series Data with Long-term Memory. (arXiv:2201.05974v2 [cs.LG] UPDATED)
    In this paper, we focus on the generation of time-series data using neural networks. It is often the case that input time-series data have only one realized (and usually irregularly sampled) path, which makes it difficult to extract time-series characteristics, and its noise structure is more complicated than the i.i.d. type. Time series data, especially from hydrology, telecommunications, economics, and finance, exhibit long-term memory, also called long-range dependence (LRD). The main purpose of this paper is to artificially generate time series with the help of neural networks, taking the LRD of paths into account. We propose fSDE-Net: neural fractional Stochastic Differential Equation Network. It generalizes the neural stochastic differential equation model by using fractional Brownian motion with a Hurst index larger than half, which exhibits the LRD property. We derive the solver of fSDE-Net and theoretically analyze the existence and uniqueness of the solution to fSDE-Net. Our experiments with artificial and real time-series data demonstrate that the fSDE-Net model can replicate distributional properties well.  ( 2 min )
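    For intuition, the driving noise fSDE-Net builds on can be simulated exactly on a grid via a Cholesky factor of the fBm covariance $\mathrm{cov}(B_s, B_t) = \tfrac{1}{2}(s^{2H} + t^{2H} - |t-s|^{2H})$; a minimal numpy sketch (grid size and seed are arbitrary):

```python
import numpy as np

def fbm(n: int, hurst: float, T: float = 1.0, seed: int = 0) -> np.ndarray:
    """Exact fractional Brownian motion sample on a regular grid."""
    t = np.linspace(T / n, T, n)
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s**(2 * hurst) + u**(2 * hurst) - np.abs(s - u)**(2 * hurst))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))   # jitter for stability
    rng = np.random.default_rng(seed)
    return np.concatenate([[0.0], L @ rng.standard_normal(n)])

path = fbm(n=500, hurst=0.7)  # H > 1/2: the long-range-dependent regime
```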
    Dual Extrapolation for Sparse Generalized Linear Models. (arXiv:1907.05830v3 [stat.ML] UPDATED)
    Generalized Linear Models (GLM) form a wide class of regression and classification models, where prediction is a function of a linear combination of the input variables. For statistical inference in high dimension, sparsity inducing regularizations have proven to be useful while offering statistical guarantees. However, solving the resulting optimization problems can be challenging: even for popular iterative algorithms such as coordinate descent, one needs to loop over a large number of variables. To mitigate this, techniques known as screening rules and working sets diminish the size of the optimization problem at hand, either by progressively removing variables, or by solving a growing sequence of smaller problems. For both techniques, significant variables are identified thanks to convex duality arguments. In this paper, we show that the dual iterates of a GLM exhibit a Vector AutoRegressive (VAR) behavior after sign identification, when the primal problem is solved with proximal gradient descent or cyclic coordinate descent. Exploiting this regularity, one can construct dual points that offer tighter certificates of optimality, enhancing the performance of screening rules and helping to design competitive working set algorithms.  ( 2 min )
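    The VAR observation lends itself to a compact illustration: fit $\theta_{t+1} \approx A\theta_t + b$ on recent dual iterates and jump to the fixed point $(I - A)^{-1} b$ as an extrapolated dual point. The sketch below illustrates the idea on a toy linear iteration and is not the paper's exact extrapolation routine:

```python
import numpy as np

def var_extrapolate(thetas: np.ndarray) -> np.ndarray:
    """thetas: (T, d) recent dual iterates; fit theta_{t+1} ~ A theta_t + b
    by least squares and return the fixed point (I - A)^{-1} b."""
    X, Y = thetas[:-1], thetas[1:]
    Xa = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(Xa, Y, rcond=None)
    A, b = coef[:-1].T, coef[-1]
    return np.linalg.solve(np.eye(len(b)) - A, b)

# Toy linearly converging iterates theta_{t+1} = A theta_t + b.
A_true = np.array([[0.5, 0.2, 0.0],
                   [0.0, 0.4, 0.1],
                   [0.1, 0.0, 0.3]])
b_true = np.ones(3)
th = [np.zeros(3)]
for _ in range(8):
    th.append(A_true @ th[-1] + b_true)
print(var_extrapolate(np.array(th)))                # extrapolated dual point
print(np.linalg.solve(np.eye(3) - A_true, b_true))  # true fixed point
```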
    Theoretical insights into the optimization landscape of over-parameterized shallow neural networks. (arXiv:1707.04926v3 [cs.LG] UPDATED)
    In this paper we study the problem of learning a shallow artificial neural network that best fits a training data set. We study this problem in the over-parameterized regime where the number of observations are fewer than the number of parameters in the model. We show that with quadratic activations the optimization landscape of training such shallow neural networks has certain favorable characteristics that allow globally optimal models to be found efficiently using a variety of local search heuristics. This result holds for an arbitrary training data of input/output pairs. For differentiable activation functions we also show that gradient descent, when suitably initialized, converges at a linear rate to a globally optimal model. This result focuses on a realizable model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to planted weight coefficients.  ( 2 min )
    Weakly Supervised Disentangled Generative Causal Representation Learning. (arXiv:2010.02637v3 [cs.LG] UPDATED)
    This paper proposes a Disentangled gEnerative cAusal Representation (DEAR) learning method under appropriate supervised information. Unlike existing disentanglement methods that enforce independence of the latent variables, we consider the general case where the underlying factors of interest can be causally related. We show that previous methods with independent priors fail to disentangle causally related factors even under supervision. Motivated by this finding, we propose a new disentangled learning method called DEAR that enables causal controllable generation and causal representation learning. The key ingredient of this new formulation is to use a structural causal model (SCM) as the prior distribution for a bidirectional generative model. The prior is then trained jointly with a generator and an encoder using a suitable GAN algorithm incorporated with supervised information on the ground-truth factors and their underlying causal structure. We provide theoretical justification on the identifiability and asymptotic convergence of the proposed method. We conduct extensive experiments on both synthesized and real data sets to demonstrate the effectiveness of DEAR in causal controllable generation, and the benefits of the learned representations for downstream tasks in terms of sample efficiency and distributional robustness.  ( 3 min )
    A Riemannian Newton Trust-Region Method for Fitting Gaussian Mixture Models. (arXiv:2104.14957v2 [stat.ML] UPDATED)
    Gaussian Mixture Models are a powerful tool in Data Science and Statistics that are mainly used for clustering and density approximation. The task of estimating the model parameters is in practice often solved by the Expectation Maximization (EM) algorithm, which has its benefits in its simplicity and low per-iteration costs. However, EM converges slowly if there is a large share of hidden information or overlapping clusters. Manifold optimization for Gaussian Mixture Models has recently gained increasing interest. We introduce an explicit formula for the Riemannian Hessian for Gaussian Mixture Models. On top of that, we propose a new Riemannian Newton Trust-Region method which outperforms current approaches both in terms of runtime and number of iterations. We apply our method to clustering problems and density approximation tasks. It is very powerful for data with a large share of hidden information compared to existing methods.  ( 2 min )
    Outlier Robust and Sparse Estimation of Linear Regression Coefficients. (arXiv:2208.11592v1 [math.ST])
    We consider outlier-robust and sparse estimation of linear regression coefficients when covariates and noise are sampled, respectively, from an $\mathfrak{L}$-subGaussian distribution and a heavy-tailed distribution, and additionally, the covariates and noise are contaminated by adversarial outliers. We deal with two cases: known or unknown covariance of the covariates. Particularly, in the former case, our estimator attains a nearly information-theoretically optimal error bound, which is sharper than those of earlier studies dealing with similar situations. Our estimator analysis relies heavily on Generic Chaining to derive sharp error bounds.  ( 2 min )
    The premise of approximate MCMC in Bayesian deep learning. (arXiv:2208.11389v1 [stat.ML])
    This paper identifies several characteristics of approximate MCMC in Bayesian deep learning and proposes an approximate sampling algorithm for neural networks. By analogy to sampling data batches from big datasets, it is proposed to sample parameter subgroups from high-dimensional neural network parameter spaces. While the advantages of minibatch MCMC have been discussed in the literature, blocked Gibbs sampling has received less research attention in Bayesian deep learning.  ( 2 min )
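    An illustrative sketch of the blocked idea (ours, not the paper's algorithm): a Metropolis-within-Gibbs sampler that perturbs one randomly chosen parameter subgroup per step; log_post stands in for any user-supplied log-posterior.

```python
# Metropolis-within-Gibbs over parameter blocks: each iteration updates one
# randomly chosen subgroup of the parameter vector, by analogy to minibatching.
import numpy as np

def blocked_mh(log_post, theta, n_blocks=10, n_iters=1000, step=0.01, rng=None):
    rng = rng or np.random.default_rng(0)
    blocks = np.array_split(np.arange(theta.size), n_blocks)
    lp = log_post(theta)
    for _ in range(n_iters):
        idx = blocks[rng.integers(n_blocks)]           # pick a parameter subgroup
        prop = theta.copy()
        prop[idx] += step * rng.standard_normal(idx.size)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:        # MH accept/reject
            theta, lp = prop, lp_prop
    return theta
```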
    Spectrum of non-Hermitian deep-Hebbian neural networks. (arXiv:2208.11411v1 [q-bio.NC])
    Neural networks with recurrent asymmetric couplings are important to understand how episodic memories are encoded in the brain. Here, we integrate the experimental observation of wide synaptic integration window into our model of sequence retrieval in the continuous time dynamics. The model with non-normal neuron-interactions is theoretically studied by deriving a random matrix theory of the Jacobian matrix in neural dynamics. The spectra bears several distinct features, such as breaking rotational symmetry about the origin, and the emergence of nested voids within the spectrum boundary. The spectral density is thus highly non-uniformly distributed in the complex plane. The random matrix theory also predicts a transition to chaos. In particular, the edge of chaos provides computational benefits for the sequential retrieval of memories. Our work provides a systematic study of time-lagged correlations with arbitrary time delays, and thus can inspire future studies of a broad class of memory models, and even big data analysis of biological time series.  ( 2 min )
    A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN Computing. (arXiv:2208.11596v1 [cs.LG])
    Split computing has emerged as a recent paradigm for implementation of DNN-based AI workloads, wherein a DNN model is split into two parts, one of which is executed on a mobile/client device and the other on an edge-server (or cloud). Data compression is applied to the intermediate tensor from the DNN that needs to be transmitted, addressing the challenge of optimizing the rate-accuracy-complexity trade-off. Existing split-computing approaches adopt ML-based data compression, but require that the parameters of either the entire DNN model, or a significant portion of it, be retrained for different compression levels. This incurs a high computational and storage burden: training a full DNN model from scratch is computationally demanding, maintaining multiple copies of the DNN parameters increases storage requirements, and switching the full set of weights during inference increases memory bandwidth. In this paper, we present an approach that addresses all these challenges. It involves the systematic design and training of bottleneck units - simple, low-cost neural networks - that can be inserted at the point of split. Our approach is remarkably lightweight both during training and inference, highly effective, and achieves excellent rate-distortion performance at a small fraction of the compute and storage overhead of existing methods.  ( 3 min )
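    A minimal PyTorch sketch of what such a bottleneck unit could look like (our reading of the idea; the layer shapes are placeholders, not the authors' design): a cheap encoder compresses the intermediate tensor on the client, and a cheap decoder restores it on the server.

```python
# Bottleneck unit inserted at the split point of a DNN.
import torch
import torch.nn as nn

class BottleneckUnit(nn.Module):
    def __init__(self, channels: int, compressed: int):
        super().__init__()
        self.encoder = nn.Conv2d(channels, compressed, kernel_size=1)  # client side
        self.decoder = nn.Conv2d(compressed, channels, kernel_size=1)  # server side

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)      # low-dimensional tensor to transmit
        return self.decoder(z)   # reconstruction fed to the server-side DNN

unit = BottleneckUnit(channels=256, compressed=16)
x = torch.randn(1, 256, 28, 28)
print(unit(x).shape)             # torch.Size([1, 256, 28, 28])
```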
    Collaborative Algorithms for Online Personalized Mean Estimation. (arXiv:2208.11530v1 [cs.LG])
    We consider an online estimation problem involving a set of agents. Each agent has access to a (personal) process that generates samples from a real-valued distribution and seeks to estimate its mean. We study the case where some of the distributions have the same mean, and the agents are allowed to actively query information from other agents. The goal is to design an algorithm that enables each agent to improve its mean estimate thanks to communication with other agents. The means as well as the number of distributions with same mean are unknown, which makes the task nontrivial. We introduce a novel collaborative strategy to solve this online personalized mean estimation problem. We analyze its time complexity and introduce variants that enjoy good performance in numerical experiments. We also extend our approach to the setting where clusters of agents with similar means seek to estimate the mean of their cluster.  ( 2 min )
    Transfer Learning-based State of Health Estimation for Lithium-ion Battery with Cycle Synchronization. (arXiv:2208.11204v1 [cs.LG])
    Accurately estimating a battery's state of health (SOH) helps prevent battery-powered applications from failing unexpectedly. Because it reduces the data requirements of model training for new batteries, transfer learning (TL) emerges as a promising machine learning approach that applies knowledge learned from a source battery with a large amount of data. However, how to determine whether the source battery model is reasonable and which parts of the information can be transferred for SOH estimation are rarely discussed, despite being critical components of a successful TL. To address these challenges, this paper proposes an interpretable TL-based SOH estimation method that exploits temporal dynamics to assist transfer learning; it consists of three parts. First, with the help of dynamic time warping, the temporal data from the discharge time series are synchronized, yielding the warping path of the cycle-synchronized time series responsible for capacity degradation over cycles. Second, the canonical variates retrieved from the spatial path of the cycle-synchronized time series are used for distribution similarity analysis between the source and target batteries. Third, when the distribution similarity is within the predefined threshold, a comprehensive target SOH estimation model is constructed by transferring the common temporal dynamics from the source SOH estimation model and compensating for the errors with a residual model from the target battery. On a widely used open-source benchmark dataset, the estimation error of the proposed method, evaluated by the root mean squared error, is as low as 0.0034, a 77% accuracy improvement over existing methods.  ( 3 min )
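    For reference, the dynamic time warping at the heart of the cycle synchronization can be sketched in a few lines (a textbook dynamic program, not the paper's implementation; the full warping path can be recovered by backtracking through D):

```python
# Classic DTW distance between two discharge curves.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```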
    A Bayesian Variational principle for dynamic Self Organizing Maps. (arXiv:2208.11337v1 [cs.LG])
    We propose organization conditions that yield a method for training SOMs with an adaptive neighborhood radius in a variational Bayesian framework. This method is validated in a non-stationary setting and compared in a high-dimensional setting with another adaptive method.  ( 2 min )
    Fast emulation of density functional theory simulations using approximate Gaussian processes. (arXiv:2208.11302v1 [stat.ML])
    Fitting a theoretical model to experimental data in a Bayesian manner using Markov chain Monte Carlo typically requires one to evaluate the model thousands (or millions) of times. When the model is a slow-to-compute physics simulation, Bayesian model fitting becomes infeasible. To remedy this, a second statistical model that predicts the simulation output -- an "emulator" -- can be used in lieu of the full simulation during model fitting. A typical emulator of choice is the Gaussian process (GP), a flexible, non-linear model that provides both a predictive mean and variance at each input point. Gaussian process regression works well for small amounts of training data, but approximate GP models are needed to scale to larger data sets ($n \gtrsim 10^5$), trading away predictive accuracy for drastically reduced runtime. This work examines the accuracy-runtime trade-off of several approximate Gaussian process models -- the sparse variational GP, stochastic variational GP, and deep kernel learned GP -- when emulating the predictions of density functional theory (DFT) models. Additionally, we use the emulators to calibrate, in a Bayesian manner, the DFT model parameters using observed data, resolving the computational barrier imposed by the data set size, and compare calibration results to previous work. The utility of these calibrated DFT models is to make predictions, based on observed data, about the properties of experimentally unobserved nuclides of interest, e.g., super-heavy nuclei.  ( 3 min )
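    For intuition, here is a minimal numpy sketch of one classic sparse approximation (subset-of-regressors with inducing points) -- simpler than the variational and deep-kernel GPs the paper studies, but it shows how an m x m system replaces the full n x n one:

```python
# Subset-of-regressors GP predictive mean with m << n inducing points.
import numpy as np

def rbf(X, Y, ls=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def sor_predict(X, y, Z, Xstar, noise=1e-2):
    """X: (n, d) training inputs, y: (n,) targets, Z: (m, d) inducing inputs."""
    Kuu, Kuf, Ksu = rbf(Z, Z), rbf(Z, X), rbf(Xstar, Z)
    A = Kuu + Kuf @ Kuf.T / noise        # (m, m) system instead of (n, n)
    return Ksu @ np.linalg.solve(A, Kuf @ y) / noise
```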
    Accelerating SGD for Highly Ill-Conditioned Huge-Scale Online Matrix Completion. (arXiv:2208.11246v1 [cs.LG])
    The matrix completion problem seeks to recover a $d\times d$ ground truth matrix of low rank $r\ll d$ from observations of its individual elements. Real-world matrix completion is often a huge-scale optimization problem, with $d$ so large that even the simplest full-dimension vector operations with $O(d)$ time complexity become prohibitively expensive. Stochastic gradient descent (SGD) is one of the few algorithms capable of solving matrix completion on a huge scale, and can also naturally handle streaming data over an evolving ground truth. Unfortunately, SGD experiences a dramatic slow-down when the underlying ground truth is ill-conditioned; it requires at least $O(\kappa\log(1/\epsilon))$ iterations to get $\epsilon$-close to the ground truth matrix with condition number $\kappa$. In this paper, we propose a preconditioned version of SGD that preserves all the favorable practical qualities of SGD for huge-scale online optimization while also making it agnostic to $\kappa$. For a symmetric ground truth and the Root Mean Square Error (RMSE) loss, we prove that the preconditioned SGD converges to $\epsilon$-accuracy in $O(\log(1/\epsilon))$ iterations, with a rapid linear convergence rate as if the ground truth were perfectly conditioned with $\kappa=1$. In our numerical experiments, we observe a similar acceleration for ill-conditioned matrix completion under the 1-bit cross-entropy loss, as well as pairwise losses such as the Bayesian Personalized Ranking (BPR) loss.  ( 3 min )
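    A sketch of the preconditioning idea in the symmetric case (our illustration, in the spirit of scaled gradient methods; the step size and regularization are placeholders): right-multiplying the gradient by $(X^\top X)^{-1}$ removes the dependence on the condition number.

```python
# One preconditioned SGD step on a single observed entry (i, j) of a
# symmetric ground truth M ~ X X^T, with X of shape (d, r).
import numpy as np

def precond_sgd_step(X, i, j, m_ij, lr=0.5):
    res = X[i] @ X[j] - m_ij                  # residual on this entry
    G = np.zeros_like(X)
    G[i] += res * X[j]
    G[j] += res * X[i]                        # gradient of 0.5 * res**2
    P = np.linalg.inv(X.T @ X + 1e-8 * np.eye(X.shape[1]))  # (r, r), cheap
    return X - lr * G @ P                     # preconditioned update
```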

  • Open

    Facebook AI Researchers Open-Source ‘LLM.int8()’ Tool To Perform Inference In Large Language Models (LLMs) With Up To 175B Parameters Without Any Performance Degradation
    submitted by /u/ai-lover [link] [comments]  ( 88 min )
    A fine comparison among the 3 web generations. If you can add something or reference your source, that would be a great add-on
    submitted by /u/Kedjja [link] [comments]  ( 87 min )
    debate and rhetoric
    Chess is a game with practically infinite moves that, given parameter values, can be calculated down to specific finite options to win or at least defend. Is it conceivable that a future AI capable of understanding the world, and also capable of communicating its understanding to humans, could, given parameters and values that matter to human ears, convince us of its point of view with the same prowess with which they can beat us at chess today? Is this metaphor at all a fit? submitted by /u/aq-r-steppedinsome [link] [comments]  ( 89 min )
    Artificial intelligence model to forecast entire region’s solar output using real-time satellite data
    submitted by /u/tailorvikas56 [link] [comments]  ( 87 min )
    Major Record Label Signs Virtual Rapper Who Is An AI
    submitted by /u/estasfuera [link] [comments]  ( 89 min )
    Leon Open-Source Personal Assistant: A Much Better NLP and Future
    submitted by /u/Louistiti [link] [comments]  ( 87 min )
    AI And The Limits Of Language | NOEMA
    submitted by /u/Futures_Bot [link] [comments]  ( 87 min )
    Artificial Life (Simulation & Code)
    submitted by /u/West_Alan_880 [link] [comments]  ( 86 min )
    Building a backend from scratch using only OpenAI Codex
    submitted by /u/icyFur [link] [comments]  ( 87 min )
    StableDiffusion P.O.T.S.AI Art Weekly Slideshow 8.24.22 Sci Fi, Horror a...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
    Opinion: Can Humans Coexist with Driverless Vehicles?
    submitted by /u/solidist [link] [comments]  ( 86 min )
    Use Stable Diffusion to create AI art For free! 2 of the best colab note...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
    Midjourney Beta - Becoming a Pro Prompt Photographer with GPT3
    submitted by /u/kbf_ [link] [comments]  ( 87 min )
    Could this be done and how would you do it?
    Could a website use AI to scan websites and update a host site? For example: A system that explores other webpages to find scores and updates the scores on its host site. Or can systems not navigate webpages like that on their own to find information? submitted by /u/nattescott [link] [comments]  ( 87 min )
    Web 3.0 - The Technology that belongs to the Future
    submitted by /u/Kedjja [link] [comments]  ( 87 min )
  • Open

    [P] goa-loader: Generative Modeling and a tf.data loader for the National Gallery of Art Open Data Program
    Hello all! This weekend I wrote goa-loader, a National Gallery of Art Open Data Program tf.data.Dataset loader, and generative modeling to accompany it. Let me know if you use the loader. Open to PRs to add GANs or anything else interesting, enjoy! submitted by /u/puppet_pals [link] [comments]  ( 88 min )
    [D] SpaCy question
    Let's say I have a string similar to "A00 2075 1x10 A00 2078 2x10" Where it should be tagged as "A00 (tagA1) 2075 (tagB1) 1x10 (tagC1) A00 (tagA2) 2078 (tagB2) 2x10 (tagC2)" With enough examples, would spacy be able to detect A00 as tagA1 if it's the first instance found and the other A00 as tagA2 since it's the second instance of it? In summary, is spaCy able to use the location of a tag in reference to another one to determine the correct tag to use? TagA1 must be used in order for TagA2 to be used if that makes sense. Please let me know if any part of this question is unclear and I can try to clarify it. submitted by /u/1017BarSquad [link] [comments]  ( 106 min )
    [R] New paper on inferring Out-of-Distribution Generalization of Meta-Learning with Deep Neural Networks
    Our paper (accepted at CoLAs 2022 and to be published in PMLR) is out! Check out the paper on ArXiv : https://arxiv.org/abs/2208.02377 TLDR : We show that Meta-Learning generalization to novel OOD task distributions can be inferred from the neural activation dynamics from a few unlabeled examples, and we propose Activation-Based Early-Stopping (ABE). submitted by /u/Simon_Guiroy [link] [comments]  ( 106 min )
    [D] HELP!!
    So, I have decided to take up the new machine learning course by Andrew Ng on Coursera. Unfortunately, I cannot afford the course, and since this is a 3-part integrated course series, financial aid is available for only 1 of the 3 courses. Is there a way I could get all 3 for free? submitted by /u/Buzzzzmonkey [link] [comments]  ( 106 min )
    [P] Semantic code search using Transformers - codesearch.ai
    https://codesearch.ai Hey, I'm Rok, a software engineer at Sourcegraph, and I've been working on an experimental AI-powered code search engine called codesearch.ai as a side project. It answers natural language queries with functions indexed from GitHub.com and StackOverflow. Under the hood, it uses the Hugging Face RoBERTa model, and the training procedure is inspired by a paper called Text and Code Embeddings by Contrastive Pretraining from OpenAI. Additionally, it uses a custom PyTorch model to fine-tune the model on the code search task, the FAISS library for nearest neighbor search, and FastAPI for a simple API server. Data collection and processing are written in Go. We prepared a detailed code walkthrough if you are interested in data collection and the training procedures. The open source code is available in a GitHub repository. The model currently has 43M parameters, a far cry from GPT3 and Codex models. The reduced size means that the results are not perfect, and you could quickly get nonsensical results. Nonetheless, we have been pretty satisfied with the model's ability to answer a wide range of queries, and expanding the model should hopefully give us even better results. If you have any feedback or questions, leave a comment. submitted by /u/add7 [link] [comments]  ( 90 min )
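    For readers curious about the retrieval stage, here is a hedged sketch of the FAISS nearest-neighbor step (our illustration, not the project's code; the embedding vectors are assumed to come from the fine-tuned encoder):

```python
# Cosine-similarity search over function embeddings with FAISS.
import numpy as np
import faiss

def build_index(function_embeddings: np.ndarray) -> faiss.IndexFlatIP:
    vecs = np.ascontiguousarray(function_embeddings, dtype=np.float32)
    faiss.normalize_L2(vecs)                 # cosine similarity via inner product
    index = faiss.IndexFlatIP(vecs.shape[1])
    index.add(vecs)
    return index

def search(index, query_vec: np.ndarray, k: int = 10):
    q = np.ascontiguousarray(query_vec.reshape(1, -1), dtype=np.float32)
    faiss.normalize_L2(q)
    scores, ids = index.search(q, k)
    return ids[0], scores[0]                 # top-k function ids and similarities
```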
    [D] [P] Cracking the hood of Stable Diffusion - Review, Comparison, and Examples.
    Stable Diffusion has been making a lot of noise ever since it was officially announced, not to mention the last 24 hours since it was fully released. And based on its performance - rightfully so. Created by the researchers from Stability AI, "Stable Diffusion" claims the crown from Craiyon (AKA DALL·E-Mini) to be the new state-of-the-art, text-to-image, open-source model. Although generating images from text already feels like ancient technology, Stable Diffusion manages to bring innovation to the table, which is even more surprising given that it's an open-source project. In my new blog, I crack the hood of Stable Diffusion, examine its architecture and the innovation it brings to the table, dive into the technical details, and compare its performance to similar models. I invite you to read about all of these in my new blog submitted by /u/RepresentativeCod613 [link] [comments]  ( 89 min )
    [D] Where can I see the ECCV 2022 paper list?
    I want to see ECCV 2022 papers about NeRF! submitted by /u/No_Fig_3372 [link] [comments]  ( 104 min )
    [D] Which statistical test would you use to detect drift in a dataset of images?
    Hi all, I'm currently building a pipeline to detect drifts in my data. I'm able to update my dataset (n=100k - 10m), in which every row has a few categorical features and an embedding. The embedding vector (len=192, normalized) is the output of some CNN, and I'm trying to use it to detect drift in the data. I was wondering: which statistical test would you use to detect drift in such a scenario? I've used evidently for Wasserstein, but it doesn't seem appropriate for this problem; I've read somewhere that it's more appropriate for the univariate case. I've also seen nannyML's PCA solution, which I liked. submitted by /u/Darmerr [link] [comments]  ( 109 min )
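    One commonly suggested option, sketched under stated assumptions (RBF kernel, modest sample sizes, kernel bandwidth as a placeholder): a kernel MMD two-sample permutation test on the embeddings, which treats the 192-dimensional vectors jointly rather than per-coordinate.

```python
# Kernel MMD two-sample permutation test for drift in embedding vectors.
import numpy as np

def mmd2(X, Y, gamma=1.0):
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

def mmd_pvalue(X, Y, n_perm=200, seed=0):
    rng = np.random.default_rng(seed)
    stat, Z, n = mmd2(X, Y), np.vstack([X, Y]), len(X)
    null = []
    for _ in range(n_perm):
        perm = rng.permutation(len(Z))        # shuffle labels under H0
        null.append(mmd2(Z[perm[:n]], Z[perm[n:]]))
    return (np.sum(np.array(null) >= stat) + 1) / (n_perm + 1)
```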
    [D] Should the praise of "Data-centric" AI be taken with a grain of salt?
    I hear co-workers, professors, and entrepreneurs talk about just how important it is to switch from model-centric AI to a data-centric approach. Personally I don't see a problem with focusing on attaining better data, but I have been seeing this simplification that AI performance = model + data, which seems massively misleading. A better generalization would be AI performance = model * data or even AI performance = model * log(len(data)). Looking at the rapidly changing problem of image synthesis, it is clear that data alone could not produce the same results seen even two years ago. On the other hand, it can seem unproductive for the field as a whole to try to create another classification model that achieves a .02% increased performance on ImageNet... again and again... And this is maybe why we hear more talk in industry about focusing on the data instead of models: industry generally faces simpler tasks (regression, classification) with negligible model improvements recently. Therefore it's easier and more valuable to increase and improve the training data. And yes, most machine learning models NEED a large training set to approximate a complex distribution. We have known this forever. I think the term has just recently become a pet peeve for me as a novel "catch all" solution to any problem. What are your opinions on the matter? submitted by /u/zimonitrome [link] [comments]  ( 108 min )
    [D] Reading Group: Describing Differences between Text Distributions with Natural Language
    More info at https://outsystems-ai-reading-group.github.io/ submitted by /u/Alex_Lemos [link] [comments]  ( 88 min )
    [D] How am I supposed to represent audio in a neural network?
    I am planning to train a Cycle-WaveGAN at some point. I have seen people representing audio as just sample points (like is typical for an image), a mel-frequency spectrum, a sliding STFT, etc. What is the most common/best way to represent audio in a neural network, and what are the pros/cons of each? submitted by /u/andrew21w [link] [comments]  ( 94 min )
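    For concreteness, a small librosa sketch of the three representations the post mentions (raw samples, sliding STFT, mel spectrogram) on a synthetic tone; all parameter values below are arbitrary placeholders:

```python
# Three common audio representations for neural networks.
import numpy as np
import librosa

sr = 16000
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)  # 1 s of A4

raw = y                                                   # direct sample points
stft_mag = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=256, n_mels=80)
mel_db = librosa.power_to_db(mel)                         # log-compressed mels
print(raw.shape, stft_mag.shape, mel_db.shape)
```

    Roughly: raw samples keep all phase information but are long and hard to model; STFT magnitudes are local and invertible (up to phase); mel spectrograms are compact and perceptually motivated but lossy.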
    [D] ML for Good
    Does anyone have any experience using their ML skills in a non-profit manner on societal or environmental issues? How did you get started? Recently I’ve been feeling that most advances in tech don’t really solve the problems we really need solved. Feeling a bit disillusioned. submitted by /u/xdqz [link] [comments]  ( 121 min )
    [D] OCR models for invoice reading
    Hi all, TLDR: I'm trying to read some semi-structured invoices and process them with a quick-to-use OCR tool. I found that Amazon Textract is great in raw-text mode, it's just that everything is messy and all over the place. I only need to extract the products from the invoices along with their quantity and price. I think it would be a big headache to hard-code everything such that Textract returns exactly what I need. The invoices are photographed with a cell phone camera. I looked at Tesseract, but I really do not want to fine-tune/build my own OCR model, and speed is key. Has anyone had any experience with these kinds of OCR problems? Ideally I would love to run it as a separate microservice on AWS/Azure and not include it in my main backend. Any help is deeply appreciated! submitted by /u/younggamech [link] [comments]  ( 89 min )
    [D] Do neural networks create a separating hyperplane?
    https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/ mentions that neural networks learn a representation of the data so as to make the classes linearly separable. What I fail to see is how does a neural network create a separating hyperplane that separates the classes. How do we know that it creates a hyperplane? What's the math behind it? submitted by /u/JJP77 [link] [comments]  ( 92 min )
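    A sketch of the standard argument (ours, not from the linked essay): with penultimate representation $h(x)$ and a final linear layer, the pairwise decision boundary is a hyperplane in the learned feature space, not in input space.

```latex
z = W h(x) + b, \qquad
\text{predict class } i \text{ over } j \iff (w_i - w_j)^{\top} h(x) + (b_i - b_j) > 0 .
```

    The set $\{h : (w_i - w_j)^{\top} h + (b_i - b_j) = 0\}$ is a hyperplane in the representation space; the earlier layers warp the input so that this linear cut suffices, which is exactly the "make the classes linearly separable" claim in the post.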
    [Discussion] How feasible is it to partition a DNN model into pieces?
    Just read Auto-Split: A General Framework of Collaborative Edge-Cloud AI by a group of Huawei researchers (https://arxiv.org/pdf/2108.13041.pdf). How feasible is it to break up the models and serve them to the edge and the cloud? If it is possible, is this task easy to implement? Do Tensorflow, PyTorch, or cv.dnn have guides or APIs to achieve the task? I am speaking from the perspective of a small development team that are not experts at machine learning. I can only imagine splitting the model at a non-functional binary level, not at a functional subgraph level. submitted by /u/zerovirus123 [link] [comments]  ( 90 min )
    [D] Have been interviewing and just found out I'll be getting an offer from a company, but have final FANG interview Thursday. How do I buy time without losing the opportunity?
    Company A seems great and is making me an offer tomorrow. However, company B is a "Fang" type company and the role seems particularly good for me, and that interview is in two days. I am not confident I'll pass this final round and even get an offer, but it's a bit of a dream job so I would like to take a shot at it. But of course, if it doesn't work out I want to work with company A and not end up empty handed. Is there a good way to word this? I'm assuming it's fairly common for people to be in the offer stage for multiple companies, so how do I let them know without making it sound like they are my second option? I want to come across that I want to fairly compare offers from my two top choices. Any advice? Worried I'm going to manage to blow it all :P submitted by /u/bandalorian [link] [comments]  ( 90 min )
  • Open

    Introduction to Machine Learning
    submitted by /u/ramacastro [link] [comments]  ( 86 min )
  • Open

    Taking a magnifying glass to data center operations
    Lincoln Laboratory Supercomputing Center dataset aims to accelerate AI research into managing and optimizing high-performance computing systems.  ( 8 min )
    Building better batteries, faster
    PhD student Pablo Leon uses machine learning to expedite research on novel battery materials, while helping newer students navigate graduate school.  ( 8 min )
  • Open

    Using ML to Boost Engagement with a Maternal and Child Health Program in India
    Posted by Aparna Taneja, Software Engineer, and Milind Tambe, Principal Scientist, Google Research, India Research Lab The widespread availability of mobile phones has enabled non-profits to deliver critical health information to their beneficiaries in a timely manner. While advanced applications on smartphones allow for richer multimedia content and two-way communication between beneficiaries and health coaches, simpler text and voice messaging services can be effective in disseminating information to large communities, particularly those that are underserved with limited access to information and smartphones. ARMMAN, one non-profit doing just this, is based in India with the mission of improving maternal and child health outcomes in underserved communities. Overview of ARMMAN …  ( 27 min )
  • Open

    Skip Gym env obs space enforcement?
    Hi everyone, This may be more of a software engineering question, but I would still be interested in any ways people might have done this. In my project, we are currently building a system that should support both RL and non-RL methods for a custom simulator that has been written. I have a currently working gym env, and since I want to provide flexibility to other methods I have been using the same env for both non-RL and RL methods. However, when using non-RL methods, I'd like the observations to just be a dictionary of dictionaries instead of a dictionary of gym.spaces.Box(). This doesn't seem to be supported by gym, so I'd instead like to bypass the gym obs space checker depending on which algorithm is being used. Has anyone run into a similar situation, or have any idea how to do this? My current plan (see the sketch below) is to just have a non-RL env and an RL env as separate classes that both inherit from some base env that actually has the env logic, and the child RL class would inherit from gym (while the non-RL env would not). But I just wanted to see if there was a way to disable gym checkers altogether submitted by /u/asdfsflhasdfa [link] [comments]  ( 106 min )
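    A minimal sketch of that plan (all class and method names below are our own placeholders, not a Gym feature): the simulator logic lives in a plain base class returning raw dicts, and only the Gym-facing subclass declares spaces.

```python
# Shared simulator logic in a plain base class; only the RL subclass is a gym.Env.
import gym
import numpy as np

class SimBase:
    """Holds the simulator logic; returns plain dicts, no space enforcement."""
    def _reset(self) -> dict:
        return {"sensors": {"pos": [0.0, 0.0]}}

    def _step(self, action) -> dict:
        return {"sensors": {"pos": [0.1, 0.0]}}

class SimGymEnv(SimBase, gym.Env):
    observation_space = gym.spaces.Dict(
        {"pos": gym.spaces.Box(-np.inf, np.inf, shape=(2,))})
    action_space = gym.spaces.Discrete(4)

    def reset(self):
        return {"pos": np.asarray(self._reset()["sensors"]["pos"], np.float32)}

    def step(self, action):
        obs = self._step(action)["sensors"]
        return {"pos": np.asarray(obs["pos"], np.float32)}, 0.0, False, {}
```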
    [D] question about the policy gradient theorem and links to function approximation
    Hi, I am reading Sutton and Barto and am currently on chapter 13, policy gradients. I am looking at the policy gradient theorem - which overall is intuitive to me, however there is one thing that is slightly unclear. The definition of the state value under policy pi, is the probability of taking action a in state s, multiplied by how good it is to take this action and summed for all actions - this is the definition of the value function and theorem goes on to derive it. this is fine.. however, when discussing a specific trajectory, that is - starting in state s (s0) and going to any state 'x' in a number of steps - it is the summation of these transition probabilities that define the value of being state s under policy pi - however, one thing that confuses me is the summation over all actions. is the value of 's0' the transition probability multiplied by the value of taking that action plus the probability of taking a' in s' plus the value of s''' etc. OR is it as just described but a summation over all actions at each state as opposed to simply the transition probability alone? - sutton and barto suggest the value of s0 is the summation over all state under pi and summation over all actions at each state. Then - taking a parameterised nn policy gradient approach, the equation followed, is the (log) of the probability multiplied by some return - this takes only the action that is taken in to account for that specific state at a time step when calculating the value. can someone help clarify my confusion please, and the connection between the summation over all action in the policy gradient theorem and typical case where only the probability of the action taken is used. thank you! submitted by /u/amjass12 [link] [comments]  ( 89 min )
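    For reference (the standard statement from Sutton & Barto ch. 13, restated here to address the confusion above), the policy gradient theorem does sum over all actions, while the sampled form uses only the taken action because the sum is an expectation under the policy:

```latex
\nabla_\theta J(\theta) \;\propto\;
\sum_{s} \mu_\pi(s) \sum_{a} q_\pi(s,a)\, \nabla_\theta \pi(a \mid s, \theta)
\;=\;
\mathbb{E}_\pi\!\left[ q_\pi(S_t, A_t)\, \nabla_\theta \log \pi(A_t \mid S_t, \theta) \right]
```

    The equality holds because sampling $A_t \sim \pi(\cdot \mid S_t, \theta)$ already weights each action by its probability, turning the explicit sum over actions into an expectation; multiplying and dividing by $\pi(a \mid s, \theta)$ inside the sum gives the $\log$ form used in the parameterized-NN update.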
    Is there a case to use reinforcement learning when I have pre-determined data?
    My background is software engineering and I've been researching reinforcement learning. From what I now understand, one of the primary use cases is optimal control systems when there is no data available. For my use case, I have data, but it's not labeled. There are numerical methods I could use to label it, but they wouldn't be as accurate as manual labeling (which would be costly and time-consuming as well). For this reason, I have been considering the use of RL. Would it be more effective to label the data given the difficulty of RL? Additionally, my problem is a decision-based problem. If MDPs apply, does it make more sense to use RL than supervised learning? At a high level, are there cases where I should take on the RL challenges if I have data that I can label, even if the problem can be framed as an MDP? submitted by /u/FinateAI [link] [comments]  ( 92 min )
    Is it possible for a quadruped robot to learn from video?
    submitted by /u/Round-Ad1177 [link] [comments]  ( 87 min )
    Need some feedback on what could be going on in the learning process of this game?
    Hi all, I would gladly appreciate some feedback on the following problem. Goal: To train Ryu to defeat Guile at the default difficulty with a higher probability in just 1 round. The reward function I used is [2*(176-current_opponent_hp) - (176-current_own_hp)]/176. The maximum health when the round starts is 176 hp, so clearly the perfect win is when Ryu doesn't take damage, the case of reward = [2*(176-0) - (176-176)]/176 = 2. Stochastically, I aim to train Ryu to beat the round with a perfect reward score; let's say out of 10 games, almost 8 or 9 times Ryu can beat it with a perfect win using this reward function. The game here is Street Fighter II Champion Edition in the gym retro environment. How: I used the stable baselines PPO with the following settings: n_steps = 2560 (Because a round takes 2-2.…  ( 90 min )
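    Restating the reward above as code for clarity (a sketch; the variable names are ours):

```python
# Reward shaping for the Street Fighter II round described above.
MAX_HP = 176

def reward(opponent_hp: int, own_hp: int) -> float:
    """2.0 for a perfect win (opponent_hp=0, own_hp=176); damage taken
    is penalized at half the weight of damage dealt."""
    return (2 * (MAX_HP - opponent_hp) - (MAX_HP - own_hp)) / MAX_HP

assert reward(opponent_hp=0, own_hp=MAX_HP) == 2.0   # perfect win
```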
  • Open

    Our approach to alignment research
    Our approach to aligning AGI is empirical and iterative. We are improving our AI systems’ ability to learn from human feedback and to assist humans at evaluating AI. Our goal is to build a sufficiently aligned AI system that can help us solve all other alignment problems. Introduction Our  ( 7 min )
  • Open

    AWS Deep Learning Challenge sees innovative and impactful use of Amazon EC2 DL1 instances
    In the AWS Deep Learning Challenge held from January 5, 2022, to March 1, 2022, participants from academia, startups, and enterprise organizations joined to test their skills and train a deep learning model of their choice using Amazon Elastic Compute Cloud (Amazon EC2) DL1 instances and Habana’s SynapseAI SDK. The EC2 DL1 instances powered by […]  ( 4 min )
  • Open

    The Future of Artificial Intelligence In Marketing
    The promises of artificial intelligence (AI) have been around for decades. But despite many technological advancements over the past few…  ( 13 min )
  • Open

    The Jupyter+git problem is now solved
    Jupyter notebooks don’t work with git by default. With nbdev2, the Jupyter+git problem has been totally solved. It provides a set of hooks which provide clean git diffs, solve most git conflicts automatically, and ensure that any remaining conflicts can be resolved entirely within the standard Jupyter notebook environment. To get started, follow the directions on Git-friendly Jupyter. Contents The Jupyter+git problem The solution The nbdev2 git merge driver The nbdev2 Jupyter save hook Background The result Postscript: other Jupyter+git tools ReviewNB An alternative solution: Jupytext nbdime The Jupyter+git problem Jupyter notebooks are a powerful tool for scientists, engineers, technical writers, students, teachers, and more. They provide an ideal notebook environment for interact…  ( 7 min )
  • Open

    Direction between two cities
    This post is the third in a series of posts on spherical trigonometry. We first looked at the analog of the Pythagorean theorem on a sphere, then the analog of the law of cosines on a sphere. Now we look at the analog of the law of sines on a sphere. As before we denote […] Direction between two cities first appeared on John D. Cook.  ( 7 min )
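    As a worked example of the computation these posts build toward (the standard initial-bearing formula on a sphere; city coordinates below are rounded):

```python
# Initial bearing (direction of departure) between two points on a sphere.
from math import radians, degrees, sin, cos, atan2

def initial_bearing(lat1, lon1, lat2, lon2):
    p1, p2 = radians(lat1), radians(lat2)
    dl = radians(lon2 - lon1)
    x = sin(dl) * cos(p2)
    y = cos(p1) * sin(p2) - sin(p1) * cos(p2) * cos(dl)
    return degrees(atan2(x, y)) % 360.0   # degrees clockwise from north

# London (51.5N, 0.1W) to New York (40.7N, 74.0W): roughly west-northwest.
print(initial_bearing(51.5, -0.1, 40.7, -74.0))  # ~288 degrees
```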
    Law of cosines on a sphere
    The previous post looked at the analog of the Pythagorean theorem on a sphere. This post looks at the law of cosines on a sphere. Yesterday we looked at triangles on a sphere with sides a and b meeting at a right angle and hypotenuse c. Denote the angle opposite a side with the capital […] Law of cosines on a sphere first appeared on John D. Cook.  ( 5 min )
  • Open

    Constants of motion network. (arXiv:2208.10387v2 [cs.LG] UPDATED)
    The beauty of physics is that there is usually a conserved quantity in an always-changing system, known as the constant of motion. Finding the constant of motion is important in understanding the dynamics of the system, but typically requires mathematical proficiency and manual analytical work. In this paper, we present a neural network that can simultaneously learn the dynamics of the system and the constants of motion from data. By exploiting the discovered constants of motion, it can produce better predictions of the dynamics and can work on a wider range of systems than Hamiltonian-based neural networks. In addition, the training progress of our method can be used as an indication of the number of constants of motion in a system, which could be useful in studying a novel physical system.  ( 2 min )
    PAGER: Progressive Attribute-Guided Extendable Robust Image Generation. (arXiv:2206.00162v2 [cs.CV] UPDATED)
    This work presents a generative modeling approach based on successive subspace learning (SSL). Unlike most generative models in the literature, our method does not utilize neural networks to analyze the underlying source distribution and synthesize images. The resulting method, called the progressive attribute-guided extendable robust image generative (PAGER) model, has advantages in mathematical transparency, progressive content generation, lower training time, robust performance with fewer training samples, and extendibility to conditional image generation. PAGER consists of three modules: core generator, resolution enhancer, and quality booster. The core generator learns the distribution of low-resolution images and performs unconditional image generation. The resolution enhancer increases image resolution via conditional generation. Finally, the quality booster adds finer details to generated images. Extensive experiments on MNIST, Fashion-MNIST, and CelebA datasets are conducted to demonstrate generative performance of PAGER.  ( 2 min )
    Fast Projected Newton-like Method for Precision Matrix Estimation under Total Positivity. (arXiv:2112.01939v3 [cs.LG] UPDATED)
    We study the problem of estimating precision matrices in Gaussian distributions that are multivariate totally positive of order two ($\mathrm{MTP}_2$). The precision matrix in such a distribution is an M-matrix. The problem can be formulated as a sign-constrained log-determinant program. The existing algorithms designed for solving this problem are based on the block coordinate descent method and are computationally prohibitive in high-dimensional cases because of the need to solve a large number of nonnegative quadratic programs. We propose a novel algorithm based on the two-metric projection method, with a well-designed search direction and variable partitioning scheme. Our algorithm significantly reduces the computational complexity of solving this problem, and its theoretical convergence is established. Experiments involving synthetic and real-world data demonstrate that our proposed algorithm is significantly more efficient, from a computational time perspective, than the state-of-the-art methods.  ( 2 min )
    Deeply Supervised Skin Lesions Diagnosis with Stage and Branch Attention. (arXiv:2205.04326v6 [eess.IV] UPDATED)
    Accurate and unbiased examinations of skin lesions are critical for the early diagnosis and treatment of skin diseases. Visual features of skin lesions vary significantly because the images are collected from patients with different lesion colours and morphologies by using dissimilar imaging equipment. Recent studies have reported that ensembled convolutional neural networks (CNNs) are practical to classify the images for early diagnosis of skin disorders. However, the practical use of these ensembled CNNs is limited as these networks are heavyweight and inadequate for processing contextual information. Although lightweight networks (e.g., MobileNetV3 and EfficientNet) were developed to achieve parameter reduction for implementing deep neural networks on mobile devices, insufficient depth of feature representation restricts the performance. To address the existing limitations, we develop a new lightweight and effective neural network, namely HierAttn. HierAttn applies a novel deep supervision strategy to learn the local and global features by using multi-stage and multi-branch attention mechanisms with only one training loss. The efficacy of HierAttn was evaluated using the dermoscopy image dataset ISIC2019 and the smartphone photo dataset PAD-UFES-20 (PAD2020). The experimental results show that HierAttn achieves the best accuracy and area under the curve (AUC) among the state-of-the-art lightweight networks. The code is available at https://github.com/anthonyweidai/HierAttn.  ( 3 min )
    SCONE: Surface Coverage Optimization in Unknown Environments by Volumetric Integration. (arXiv:2208.10449v1 [cs.CV] CROSS LISTED)
    Next Best View computation (NBV) is a long-standing problem in robotics, and consists in identifying the next most informative sensor position(s) for reconstructing a 3D object or scene efficiently and accurately. Like most current methods, we consider NBV prediction from a depth sensor. Learning-based methods relying on a volumetric representation of the scene are suitable for path planning, but do not scale well with the size of the scene and have lower accuracy than methods using a surface-based representation. However, the latter constrain the camera to a small number of poses. To obtain the advantages of both representations, we show that we can maximize surface metrics by Monte Carlo integration over a volumetric representation. Our method scales to large scenes and handles free camera motion: It takes as input an arbitrarily large point cloud gathered by a depth sensor like Lidar systems as well as camera poses to predict NBV. We demonstrate our approach on a novel dataset made of large and complex 3D scenes.  ( 2 min )
    One Model to Unite Them All: Personalized Federated Learning of Multi-Contrast MRI Synthesis. (arXiv:2207.06509v2 [eess.IV] UPDATED)
    Multi-institutional collaborations are key for learning generalizable MRI synthesis models that translate source- onto target-contrast images. To facilitate collaboration, federated learning (FL) adopts decentralized training and mitigates privacy concerns by avoiding sharing of imaging data. However, FL-trained synthesis models can be impaired by the inherent heterogeneity in the data distribution, with domain shifts evident when common or variable translation tasks are prescribed across sites. Here we introduce the first personalized FL method for MRI Synthesis (pFLSynth) to improve reliability against domain shifts. pFLSynth is based on an adversarial model that produces latents specific to individual sites and source-target contrasts, and leverages novel personalization blocks to adaptively tune the statistics and weighting of feature maps across the generator stages given latents. To further promote site specificity, partial model aggregation is employed over downstream layers of the generator while upstream layers are retained locally. As such, pFLSynth enables training of a unified synthesis model that can reliably generalize across multiple sites and translation tasks. Comprehensive experiments on multi-site datasets clearly demonstrate the enhanced performance of pFLSynth against prior federated methods in multi-contrast MRI synthesis.  ( 3 min )
    Privacy Enhancement for Cloud-Based Few-Shot Learning. (arXiv:2205.07864v2 [cs.LG] UPDATED)
    Requiring less data for accurate models, few-shot learning has shown robustness and generality in many application domains. However, deploying few-shot models in untrusted environments may raise privacy concerns, e.g., attacks or adversaries that may breach the privacy of user-supplied data. This paper studies privacy enhancement for few-shot learning in an untrusted environment, e.g., the cloud, by establishing a novel privacy-preserved embedding space that preserves the privacy of data and maintains the accuracy of the model. We examine the impact of various image privacy methods such as blurring, pixelization, Gaussian noise, and differentially private pixelization (DP-Pix) on few-shot image classification and propose a method that learns privacy-preserved representation through a joint loss. The empirical results show how the privacy-performance trade-off can be negotiated for privacy-enhanced few-shot learning.  ( 2 min )
    Gradient-Variation Bound for Online Convex Optimization with Constraints. (arXiv:2006.12455v2 [math.OC] UPDATED)
    We study online convex optimization with constraints consisting of multiple functional constraints and a relatively simple constraint set, such as a Euclidean ball. As enforcing the constraints at each time step through projections is computationally challenging in general, we allow decisions to violate the functional constraints but aim to achieve a low regret and cumulative violation of the constraints over a horizon of $T$ time steps. First-order methods achieve an $\mathcal{O}(\sqrt{T})$ regret and an $\mathcal{O}(1)$ constraint violation, which is the best-known bound, but do not take into account the structural information of the problem. Furthermore, the existing algorithms and analysis are limited to Euclidean space. In this paper, we provide an \emph{instance-dependent} bound for online convex optimization with complex constraints obtained by a novel online primal-dual mirror-prox algorithm. Our instance-dependent regret is quantified by the total gradient variation $V_*(T)$ in the sequence of loss functions. The proposed algorithm works in \emph{general} non-Euclidean spaces and simultaneously achieves an $\mathcal{O}(\sqrt{V_*(T)})$ regret and an $\mathcal{O}(1)$ constraint violation, which is never worse than the best-known $( \mathcal{O}(\sqrt{T}), \mathcal{O}(1) )$ result and improves over previous works that applied mirror-prox-type algorithms for this problem achieving $\mathcal{O}(T^{2/3})$ regret and constraint violation. Finally, our algorithm is computationally efficient, as it only performs mirror descent steps in each iteration instead of solving a general Lagrangian minimization problem.  ( 3 min )
    AtmoDist: Self-supervised Representation Learning for Atmospheric Dynamics. (arXiv:2202.01897v2 [physics.ao-ph] UPDATED)
    Representation learning has proven to be a powerful methodology in a wide variety of machine learning applications. For atmospheric dynamics, however, it has so far not been considered, arguably due to the lack of large-scale, labeled datasets that could be used for training. In this work, we show that the difficulty is benign and introduce a self-supervised learning task that defines a categorical loss for a wide variety of unlabeled atmospheric datasets. Specifically, we train a neural network on the simple yet intricate task of predicting the temporal distance between atmospheric fields from distinct but nearby times. We demonstrate that training with this task on ERA5 reanalysis leads to internal representations capturing intrinsic aspects of atmospheric dynamics. We do so by introducing a data-driven distance metric for atmospheric states. When employed as a loss function in other machine learning applications, this AtmoDist distance leads to improved results compared to the classical $\ell_2$-loss. For example, for downscaling one obtains higher-resolution fields that match the true statistics more closely than previous approaches, and for the interpolation of missing or occluded data the AtmoDist distance leads to results that contain more realistic fine-scale features. Since it is derived from observational data, AtmoDist also provides a novel perspective on atmospheric predictability.  ( 3 min )
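    A sketch of the pretext task as described (our reading, not the released code; the architecture, channel counts, and number of candidate offsets are placeholders): a shared encoder embeds two atmospheric fields, and a head classifies the temporal distance between them.

```python
# Self-supervised temporal-distance classification between field pairs.
import torch
import torch.nn as nn

class TemporalDistanceNet(nn.Module):
    def __init__(self, channels: int, n_offsets: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())       # (B, 32) per field
        self.head = nn.Linear(64, n_offsets)             # concatenated pair

    def forward(self, x1, x2):
        z = torch.cat([self.encoder(x1), self.encoder(x2)], dim=1)
        return self.head(z)                              # logits over offsets

net = TemporalDistanceNet(channels=2, n_offsets=16)
logits = net(torch.randn(4, 2, 32, 32), torch.randn(4, 2, 32, 32))
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 16, (4,)))
```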
    Variable importance without impossible data. (arXiv:2205.15750v2 [cs.LG] UPDATED)
    The most popular methods for measuring importance of the variables in a black box prediction algorithm make use of synthetic inputs that combine predictor variables from multiple subjects. These inputs can be unlikely, physically impossible, or even logically impossible. As a result, the predictions for such cases can be based on data very unlike any the black box was trained on. We think that users cannot trust an explanation of the decision of a prediction algorithm when the explanation uses such values. Instead we advocate a method called Cohort Shapley that is grounded in economic game theory and unlike most other game theoretic methods, it uses only actually observed data to quantify variable importance. Cohort Shapley works by narrowing the cohort of subjects judged to be similar to a target subject on one or more features. A feature is important if using it to narrow the cohort makes a large difference to the cohort mean. We illustrate it on an algorithmic fairness problem where it is essential to attribute importance to protected variables that the model was not trained on. For every subject and every predictor variable, we can compute the importance of that predictor to the subject's predicted response or to their actual response. These values can be aggregated, for example over all Black subjects, and we propose a Bayesian bootstrap to quantify uncertainty in both individual and aggregate Shapley values.
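    A sketch of the cohort-narrowing ingredient only (full Cohort Shapley averages such shifts over orderings of features game-theoretically; the similarity tolerance below is a placeholder we chose for illustration):

```python
# Change in cohort mean when narrowing the cohort on one feature.
import numpy as np

def cohort_mean_shift(X, y, target_idx, feature, tol=0.1):
    """Large shift => the feature matters for this target subject.
    Uses only observed rows; no synthetic inputs are constructed."""
    base = y.mean()                                        # full cohort
    close = np.abs(X[:, feature] - X[target_idx, feature]) <= tol
    return y[close].mean() - base

X = np.random.rand(1000, 5)
y = 3 * X[:, 0] + np.random.randn(1000) * 0.1              # feature 0 matters
print(cohort_mean_shift(X, y, target_idx=0, feature=0))
```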
    Simulation-Informed Revenue Extrapolation with Confidence Estimate for Scaleup Companies Using Scarce Time-Series Data. (arXiv:2208.10375v2 [cs.CE] UPDATED)
    Investment professionals rely on extrapolating company revenue into the future (i.e. revenue forecast) to approximate the valuation of scaleups (private companies in a high-growth stage) and inform their investment decisions. This task is manual and empirical, leaving the forecast quality heavily dependent on the investment professionals' experiences and insights. Furthermore, financial data on scaleups is typically proprietary, costly and scarce, ruling out the wide adoption of data-driven approaches. To this end, we propose a simulation-informed revenue extrapolation (SiRE) algorithm that generates fine-grained long-term revenue predictions on small datasets and short time-series. SiRE models the revenue dynamics as a linear dynamical system (LDS), which is solved using the EM algorithm. The main innovation lies in how the noisy revenue measurements are obtained during training and inference. SiRE works for scaleups that operate in various sectors and provides confidence estimates. The quantitative experiments on two practical tasks show that SiRE significantly surpasses the baseline methods by a large margin. We also observe high performance when SiRE extrapolates long-term predictions from short time-series. The performance-efficiency balance and result explainability of SiRE are also validated empirically. Evaluated from the perspective of investment professionals, SiRE can precisely locate the scaleups that have a great potential return in 2 to 5 years. Furthermore, our qualitative inspection illustrates some advantageous attributes of the SiRE revenue forecasts.  ( 3 min )
    Transformer Network-based Reinforcement Learning Method for Power Distribution Network (PDN) Optimization of High Bandwidth Memory (HBM). (arXiv:2203.15722v2 [cs.LG] UPDATED)
    In this article, for the first time, we propose a transformer network-based reinforcement learning (RL) method for power distribution network (PDN) optimization of high bandwidth memory (HBM). The proposed method can provide an optimal decoupling capacitor (decap) design to maximize the reduction of PDN self- and transfer impedance seen at multiple ports. An attention-based transformer network is implemented to directly parameterize the decap optimization policy. The optimality performance is significantly improved since the attention mechanism has the expressive power to explore the massive combinatorial space of decap assignments; moreover, it can capture sequential relationships between the decap assignments. The computing time for optimization is dramatically reduced because the network is reusable across positions of probing ports and decap assignment candidates. This is because the transformer network has a context embedding process to capture meta-features, including probing port positions. In addition, the network is trained with randomly generated data sets; therefore, without additional training, the trained network can solve new decap optimization problems. The computing time for training and the data cost are critically decreased due to the scalability of the network. Thanks to its shared-weight property, the network can adapt to a larger scale of problems without additional training. For verification, we compare the results with a conventional genetic algorithm (GA), random search (RS), and all previous RL-based methods. As a result, the proposed method outperforms them in all of the following aspects: optimality performance, computing time, and data efficiency.  ( 3 min )
    Artificial Intelligence-Based Analytics for Impacts of COVID-19 and Online Learning on College Students' Mental Health. (arXiv:2202.07441v2 [cs.CY] UPDATED)
    COVID-19, the disease caused by the novel coronavirus (SARS-CoV-2), first emerged in Wuhan, China late in December 2019. Not long after, the virus spread worldwide and was declared a pandemic by the World Health Organization in March 2020. This caused many changes around the world and in the United States, including an educational shift towards online learning. In this paper, we seek to understand how the COVID-19 pandemic and increase in online learning impact college students' emotional wellbeing. We use several machine learning and statistical models to analyze data collected by the Faculty of Public Administration at the University of Ljubljana, Slovenia in conjunction with an international consortium of universities, other higher education institutions, and students' associations. Our results indicate that features related to students' academic life have the largest impact on their emotional wellbeing. Other important factors include students' satisfaction with their university's and government's handling of the pandemic as well as students' financial security.  ( 3 min )
    FedSSO: A Federated Server-Side Second-Order Optimization Algorithm. (arXiv:2206.09576v2 [cs.LG] UPDATED)
    In this work, we propose FedSSO, a server-side second-order optimization method for federated learning (FL). In contrast to previous works in this direction, we employ a server-side approximation for the Quasi-Newton method without requiring any training data from the clients. In this way, we not only shift the computation burden from clients to server, but also eliminate the additional communication for second-order updates between clients and server entirely. We provide a theoretical guarantee for the convergence of our novel method, and empirically demonstrate its fast convergence and communication savings in both convex and non-convex settings.  ( 2 min )
    A Data-Efficient Deep Learning Framework for Segmentation and Classification of Histopathology Images. (arXiv:2207.06489v3 [eess.IV] UPDATED)
    Current studies of the cell architecture of inflammation in histopathology images, commonly performed for diagnosis and research purposes, exclude a lot of information available on the biopsy slide. In autoimmune diseases, major outstanding research questions remain regarding which cell types participate in inflammation at the tissue level and how they interact with each other. While these questions can be partially answered using traditional methods, artificial intelligence approaches for segmentation and classification provide a much more efficient way to understand the architecture of inflammation in autoimmune disease, holding great promise for novel insights. In this paper, we empirically develop deep learning approaches that use dermatomyositis biopsies of human tissue to detect and identify inflammatory cells. Our approach improves classification performance by 26% and segmentation performance by 5%. We also propose a novel post-processing autoencoder architecture that improves segmentation performance by an additional 3%.  ( 2 min )
    Cyclic Graph Attentive Match Encoder (CGAME): A Novel Neural Network For OD Estimation. (arXiv:2111.14625v4 [cs.LG] UPDATED)
    Origin-Destination (OD) estimation plays an important role in the era of intelligent transportation. Nevertheless, as an under-determined problem, OD estimation confronts many challenges, from cross-space inference to non-convex, non-linear optimization. As a powerful nonlinear approximator, deep learning is an ideal data-driven method that provides a novel perspective on OD estimation. However, viewing multi-interval traffic counts as spatial-temporal inputs and the OD matrix as heterogeneous graph-structured output, existing neural network architectures are not suitable for this cross-space inference problem; thus a new deep learning architecture is needed. We propose CGAME, short for Cyclic Graph Attentive Match Encoder, comprising bi-directional encoder-decoder networks and a novel graph matcher with a double-layer attention mechanism in the hidden layer. It realizes effective information exchange between the forward and backward networks and establishes coupling relations across the underlying feature spaces. The proposed model achieves state-of-the-art performance compared with baselines in the designed experiments and offers a paradigm for inference tasks across representation spaces.  ( 2 min )
    AutoGML: Fast Automatic Model Selection for Graph Machine Learning. (arXiv:2206.09280v2 [cs.LG] UPDATED)
    Given a graph learning task, such as link prediction, on a new graph dataset, how can we automatically select the best method as well as its hyperparameters (collectively called a model)? Model selection for graph learning has been largely ad hoc. A typical approach has been to apply popular methods to new datasets, but this is often suboptimal. On the other hand, systematically comparing models on the new graph quickly becomes too costly, or even impractical. In this work, we develop the first meta-learning approach for automatic graph machine learning, called AutoGML, which utilizes the prior performances of existing methods on a wide variety of benchmark graph datasets to automatically select an effective model for the new graph, without any model training or evaluations. To capture the similarity across graphs from different domains, we introduce specialized meta-graph features that quantify the structural characteristics of a graph. Then we design a meta-graph that represents the relations among models and graphs, and develop a graph meta-learner operating on the meta-graph, which estimates the relevance of each model to different graphs. Through extensive experiments, we show that using AutoGML to select a method for the new graph significantly outperforms consistently applying popular methods as well as several existing meta-learners, while being extremely fast at test time.  ( 3 min )
    A Survey of Self-Supervised and Few-Shot Object Detection. (arXiv:2110.14711v3 [cs.CV] UPDATED)
    Labeling data is often expensive and time-consuming, especially for tasks such as object detection and instance segmentation, which require dense labeling of the image. While few-shot object detection is about training a model on novel (unseen) object classes with little data, it still requires prior training on many labeled examples of base (seen) classes. On the other hand, self-supervised methods aim at learning representations from unlabeled data which transfer well to downstream tasks such as object detection. Combining few-shot and self-supervised object detection is a promising research direction. In this survey, we review and characterize the most recent approaches on few-shot and self-supervised object detection. Then, we give our main takeaways and discuss future research directions. Project page at https://gabrielhuang.github.io/fsod-survey/  ( 2 min )
    Exploiting auto-encoders and segmentation methods for middle-level explanations of image classification systems. (arXiv:2106.05037v5 [cs.LG] UPDATED)
    A central issue addressed by the rapidly growing research area of eXplainable Artificial Intelligence (XAI) is providing methods that explain the behaviour of non-interpretable Machine Learning (ML) models after training. Recently, it has become increasingly evident that new directions for creating better explanations should take into account what a good explanation is to a human user. This paper proposes an XAI framework that produces multiple explanations for the response of an image classification system in terms of potentially different middle-level input features. To this end, we propose an XAI framework able to construct explanations in terms of input features extracted by auto-encoders. We start from the hypothesis that some autoencoders, relying on standard data representation approaches, could extract input properties that are more salient and understandable to a user than raw low-level features; we call these properties Middle-Level input Features (MLFs). Furthermore, by extracting different types of MLFs through different types of autoencoders, different types of explanations for the same ML system behaviour can be returned. We experimentally tested our method on two different image datasets and using three different types of MLFs. The results are encouraging. Although our novel approach was tested in the context of image classification, it can potentially be used on other data types to the extent that auto-encoders able to extract humanly understandable representations can be applied.  ( 3 min )
    Multi-Model Federated Learning with Provable Guarantees. (arXiv:2207.04330v5 [cs.LG] UPDATED)
    Federated Learning (FL) is a variant of distributed learning where edge devices collaborate to learn a model without sharing their data with the central server or each other. We refer to the process of training multiple independent models simultaneously in a federated setting using a common pool of clients as multi-model FL. In this work, we propose two variants of the popular FedAvg algorithm for multi-model FL, with provable convergence guarantees. We further show that for the same amount of computation, multi-model FL can have better performance than training each model separately. We supplement our theoretical results with experiments in strongly convex, convex, and non-convex settings.  ( 2 min )
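    As a hedged illustration of the multi-model setting, the sketch below runs one round of a FedAvg-style algorithm over several independent models sharing one client pool; the simple partitioning of clients across models is our own assumption, not necessarily either of the paper's two variants.

        import random

        def multi_model_round(models, clients, local_update, k_per_model):
            """One communication round: shuffle the shared client pool,
            assign k_per_model clients to each model, and average the
            locally updated weights per model (plain FedAvg per model)."""
            random.shuffle(clients)
            for i in range(len(models)):
                assigned = clients[i * k_per_model:(i + 1) * k_per_model]
                updates = [local_update(models[i], c) for c in assigned]
                # coordinate-wise average of the returned client weights
                models[i] = [sum(ws) / len(ws) for ws in zip(*updates)]
            return models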
    CLCNet: Rethinking of Ensemble Modeling with Classification Confidence Network. (arXiv:2205.09612v4 [cs.LG] UPDATED)
    In this paper, we propose a Classification Confidence Network (CLCNet) that can determine whether a classification model classifies input samples correctly. It takes a classification result in the form of a vector of any dimension and returns a confidence score as output, which represents the probability of an instance being classified correctly. We can utilize CLCNet in a simple cascade-structured system consisting of several SOTA (state-of-the-art) classification models, and our experiments show that the system can achieve the following advantages: 1. The system can customize the average computation requirement (FLOPs) per image during inference. 2. Under the same computation requirement, the performance of the system can exceed that of any model that has the same structure as the models in the system but a different size. In fact, this is a new type of ensemble modeling. Like general ensemble modeling, it can achieve higher performance than a single classification model, yet our system requires much less computation than general ensemble modeling. We have uploaded our code to a GitHub repository: https://github.com/yaoching0/CLCNet-Rethinking-of-Ensemble-Modeling.  ( 2 min )
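    The cascade idea is simple enough to sketch; the wiring below is a minimal, assumed version (threshold value, model interfaces, and the use of softmax probabilities as CLCNet's input are all illustrative):

        import torch

        def cascade_predict(x, small_model, large_model, clcnet, threshold=0.9):
            """Run the cheap model first; let CLCNet score how likely its
            prediction is to be correct, and fall back to the expensive
            model only when that confidence is below the threshold."""
            probs = torch.softmax(small_model(x), dim=-1)
            confidence = clcnet(probs)          # estimated P(correct)
            if confidence.item() >= threshold:
                return probs.argmax(dim=-1)     # accept the cheap answer
            return torch.softmax(large_model(x), dim=-1).argmax(dim=-1)

    Tuning the threshold trades average FLOPs per image against accuracy, which is how such a system customizes its computation budget.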
    MALICE: Manipulation Attacks on Learned Image ComprEssion. (arXiv:2205.13253v2 [cs.CV] UPDATED)
    Deep learning techniques have shown promising results in image compression, with competitive bitrate and image reconstruction quality from the compressed latent. However, while image compression methods have progressed towards higher peak signal-to-noise ratio (PSNR) and fewer bits per pixel (bpp), their robustness to adversarial images has never received deliberation. In this work, we, for the first time, investigate the robustness of image compression systems, where an imperceptible perturbation of input images can precipitate a significant increase in the bitrate of their compressed latent. To characterize the robustness of state-of-the-art learned image compression, we mount white-box and black-box attacks. Our white-box attack employs the fast gradient sign method on the entropy estimation of the bitstream as its bitrate approximation. We propose DCT-Net, which simulates JPEG compression with architectural simplicity and lightweight training, as the substitute model in the black-box attack, enabling fast adversarial transferability. Our results on six image compression models, each with six different bitrate qualities (thirty-six models in total), show that they are surprisingly fragile: the white-box attack achieves up to a 56.326x bpp change and the black-box attack up to 1.947x. To improve robustness, we propose a novel compression architecture, factorAtn, which incorporates attention modules and a basic factorized entropy model, resulting in a promising trade-off between rate-distortion performance and robustness to adversarial attacks that surpasses existing learned image compressors.  ( 3 min )
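    A minimal sketch of the white-box attack's core step follows; the compressor and entropy-model interfaces are assumed placeholders, and a single FGSM step stands in for whatever iteration schedule the paper actually uses:

        import torch

        def bitrate_fgsm(x, analysis_transform, entropy_bits, eps=2/255):
            """Ascend the sign gradient of the estimated bitrate: the
            entropy model's bit estimate of the latent serves as a
            differentiable proxy for the true compressed file size."""
            x_adv = x.detach().clone().requires_grad_(True)
            latent = analysis_transform(x_adv)
            bits = entropy_bits(latent).sum()   # differentiable bpp proxy
            bits.backward()
            with torch.no_grad():
                x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0)
            return x_adv.detach()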
    A Game-theoretic Understanding of Repeated Explanations in ML Models. (arXiv:2202.02659v2 [cs.GT] UPDATED)
    This paper formally models, by means of game theory, the strategic repeated interactions between a system, comprising a machine learning (ML) model and an associated explanation method, and an end-user who is seeking a prediction/label and its explanation for a query/input. In this game, a malicious end-user must strategically decide when to stop querying and attempt to compromise the system, while the system must strategically decide how much information (in the form of noisy explanations) it should share with the end-user and when to stop sharing, all without knowing the type (honest/malicious) of the end-user. This paper models this trade-off using a continuous-time stochastic Signaling game framework and characterizes the Markov perfect equilibrium state within such a framework.  ( 2 min )
    Efficient Attention-free Video Shift Transformers. (arXiv:2208.11108v1 [cs.CV])
    This paper tackles the problem of efficient video recognition. In this area, video transformers have recently dominated the efficiency (top-1 accuracy vs FLOPs) spectrum. At the same time, there have been some attempts in the image domain which challenge the necessity of the self-attention operation within the transformer architecture, advocating the use of simpler approaches for token mixing. However, there are no results yet for the case of video recognition, where the self-attention operator has a significantly higher impact (compared to the case of images) on efficiency. To address this gap, in this paper, we make the following contributions: (a) we construct a highly efficient and accurate attention-free block based on the shift operator, coined Affine-Shift block, specifically designed to approximate as closely as possible the operations in the MHSA block of a Transformer layer. Based on our Affine-Shift block, we construct our Affine-Shift Transformer and show that it already outperforms all existing shift/MLP-based architectures for ImageNet classification. (b) We extend our formulation in the video domain to construct Video Affine-Shift Transformer (VAST), the very first purely attention-free shift-based video transformer. (c) We show that VAST significantly outperforms recent state-of-the-art transformers on the most popular action recognition benchmarks for the case of models with low computational and memory footprint. Code will be made available.  ( 2 min )
    Kernel Methods for Multistage Causal Inference: Mediation Analysis and Dynamic Treatment Effects. (arXiv:2111.03950v2 [stat.ME] UPDATED)
    We propose simple estimators for mediation analysis and dynamic treatment effects over short horizons, which preserve the nonlinearity, dependence, and effect modification of identification theory. We allow treatments, mediators, and covariates to be discrete or continuous in general spaces. Across this broad variety of data settings, the estimators have closed form solutions in terms of kernel matrix operations due to our algorithmic innovation: sequential mean embedding of the mediator and covariate conditional distributions given a hypothetical treatment sequence. The simple estimators have strong guarantees. For the continuous treatment case, we prove uniform consistency with finite sample rates that match the minimax optimal rate for standard kernel ridge regression. For the discrete treatment case, we prove $n^{-1/2}$ consistency, finite sample Gaussian approximation, and semiparametric efficiency. We extend the analysis to incremental effects and counterfactual distributions, identifying and estimating new causal estimands. In nonlinear simulations with many covariates, we demonstrate state-of-the-art performance. We estimate mediated and dynamic treatment effects of the US Job Corps program for disadvantaged youth, and share a cleaned data set that may serve as a benchmark in future work.  ( 3 min )
    Locally temporal-spatial pattern learning with graph attention mechanism for EEG-based emotion recognition. (arXiv:2208.11087v1 [eess.SP])
    Emotion recognition techniques enable computers to classify human affective states into discrete categories. However, emotions may fluctuate rather than remain stable, even within a short time interval. It is also difficult to make full use of the spatial distribution of EEG signals due to their 3-D topological structure. To tackle the above issues, we propose a locally temporal-spatial pattern learning graph attention network (LTS-GAT). In LTS-GAT, a divide-and-conquer scheme is used to examine local information along the temporal and spatial dimensions of EEG patterns based on the graph attention mechanism. A dynamic domain discriminator is added to learn robust EEG feature representations that withstand inter-individual variations in EEG statistics across participants. We evaluated LTS-GAT on two public datasets for affective computing studies under individual-dependent and individual-independent paradigms. The effectiveness of the LTS-GAT model was demonstrated in comparison with other existing mainstream methods. Moreover, visualization methods were used to illustrate the relations between different brain regions and emotion recognition, and the weights of different time segments were also visualized to investigate emotion sparsity.  ( 2 min )
    TurbuGAN: An Adversarial Learning Approach to Spatially-Varying Multiframe Blind Deconvolution with Applications to Imaging Through Turbulence. (arXiv:2203.06764v2 [cs.CV] UPDATED)
    We present a self-supervised and self-calibrating multi-shot approach to imaging through atmospheric turbulence, called TurbuGAN. Our approach requires no paired training data, adapts itself to the distribution of the turbulence, leverages domain-specific data priors, and can generalize from tens to thousands of measurements. We achieve such functionality through an adversarial sensing framework adapted from CryoGAN, which uses a discriminator network to match the distributions of captured and simulated measurements. Our framework builds on CryoGAN by (1) generalizing the forward measurement model to incorporate physically accurate and computationally efficient models for light propagation through anisoplanatic turbulence, (2) enabling adaptation to slightly misspecified forward models, and (3) leveraging domain-specific prior knowledge using pretrained generative networks, when available. We validate TurbuGAN on both computationally simulated and experimentally captured images distorted with anisoplanatic turbulence.  ( 2 min )
    A reduced-order modeling framework for simulating signatures of faults in a bladed disk. (arXiv:2108.06265v2 [cs.CE] UPDATED)
    This paper reports a reduced-order modeling framework for bladed disks on a rotating shaft, used to simulate the vibration signatures of faults such as cracks in different components, with a view toward machine learning on simulated data. We employ lumped and one-dimensional analytical models of the subcomponents for better insight into the complex dynamic response. The framework seeks to address some of the challenges encountered in analyzing and optimizing fault detection and identification schemes for health monitoring of rotating turbomachinery, including aero-engines. We model the bladed disks and shafts by combining lumped elements and one-dimensional finite elements, leading to a coupled system. The simulation results are in good agreement with previously published data. We model the cracks in a blade analytically with an effective reduced-stiffness approximation. Multiple types of faults are modeled, including cracks in the blades of single- and two-stage bladed disks, Fan Blade Off (FBO), and Foreign Object Damage (FOD). We apply aero-engine operational loading conditions to simulate realistic scenarios of online health monitoring. The proposed reduced-order simulation framework will have applications in probabilistic signal modeling, machine learning toward fault signature identification, and parameter estimation with measured vibration signals.  ( 3 min )
    Active Learning for Computationally Efficient Distribution of Binary Evolution Simulations. (arXiv:2203.16683v2 [astro-ph.SR] UPDATED)
    Binary stars undergo a variety of interactions and evolutionary phases, critical for predicting and explaining observed properties. Binary population synthesis with full stellar-structure and evolution simulations are computationally expensive requiring a large number of mass-transfer sequences. The recently developed binary population synthesis code POSYDON incorporates grids of MESA binary star simulations which are then interpolated to model large-scale populations of massive binaries. The traditional method of computing a high-density rectilinear grid of simulations is not scalable for higher-dimension grids, accounting for a range of metallicities, rotation, and eccentricity. We present a new active learning algorithm, psy-cris, which uses machine learning in the data-gathering process to adaptively and iteratively select targeted simulations to run, resulting in a custom, high-performance training set. We test psy-cris on a toy problem and find the resulting training sets require fewer simulations for accurate classification and regression than either regular or randomly sampled grids. We further apply psy-cris to the target problem of building a dynamic grid of MESA simulations, and we demonstrate that, even without fine tuning, a simulation set of only $\sim 1/4$ the size of a rectilinear grid is sufficient to achieve the same classification accuracy. We anticipate further gains when algorithmic parameters are optimized for the targeted application. We find that optimizing for classification only may lead to performance losses in regression, and vice versa. Lowering the computational cost of producing grids will enable future versions of POSYDON to cover more input parameters while preserving interpolation accuracies.  ( 3 min )
    On the Decision Boundaries of Neural Networks: A Tropical Geometry Perspective. (arXiv:2002.08838v3 [cs.LG] UPDATED)
    This work tackles the problem of characterizing and understanding the decision boundaries of neural networks with piecewise linear activations. We use tropical geometry, a new development in the area of algebraic geometry, to characterize the decision boundaries of a simple network of the form (Affine, ReLU, Affine). Our main finding is that the decision boundaries are a subset of a tropical hypersurface, which is intimately related to a polytope formed by the convex hull of two zonotopes. The generators of these zonotopes are functions of the network parameters. This geometric characterization provides new perspectives on three tasks. (i) We propose a new tropical perspective on the lottery ticket hypothesis, where we view the effect of different initializations on the tropical geometric representation of a network's decision boundaries. (ii) Moreover, we propose new tropical optimization reformulations that directly influence the decision boundaries of the network, for the task of network pruning. (iii) Finally, we discuss the reformulation of adversarial attack generation in a tropical sense. We demonstrate that one can construct adversaries in a tropical setting by perturbing a specific set of decision boundaries through perturbations of a set of network parameters.  ( 3 min )
    Toward smart composites: small-scale, untethered prediction and control for soft sensor/actuator systems. (arXiv:2205.10940v2 [cs.RO] UPDATED)
    We present a formulation and open-source tools to achieve in-material model predictive control of sensor/actuator systems using learned forward kinematics and on-device computation. Microcontroller units (MCUs) that compute the prediction and control task while colocated with the sensors and actuators enable in-material untethered behaviors. In this approach, small-parameter-size neural network models learn forward kinematics offline. Our open-source compiler, nn4mc, generates code to offload these predictions onto MCUs. A Newton-Raphson solver then computes the control input in real time. We first benchmark this nonlinear control approach against a PID controller on a mass-spring-damper simulation. We then study experimental results on two experimental rigs with different sensing, actuation and computational hardware: a tendon-based platform with embedded LightLace sensors and a HASEL-based platform with magnetic sensors. Experimental results indicate effective high-bandwidth tracking of reference paths (greater than or equal to 120 Hz) with a small memory footprint (less than or equal to 6.4% of flash memory). The measured path-following error does not exceed 2 mm in the tendon-based platform, and the simulated path-following error does not exceed 1 mm in the HASEL-based platform. The mean power consumption of this approach on an ARM Cortex-M4f device is 45.4 mW. This control approach is also compatible with TensorFlow Lite models and equivalent on-device code. In-material intelligence enables a new class of composites that infuse autonomy into structures and systems with refined artificial proprioception.  ( 3 min )
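    As a hedged sketch of the control core, the loop below inverts a learned forward model with Newton-Raphson iterations; the damping term, iteration count, and model interfaces are illustrative assumptions rather than the nn4mc pipeline itself:

        import numpy as np

        def newton_control(f, jacobian, u0, y_ref, iters=5, damping=1e-6):
            """Solve f(u) = y_ref for the actuation input u, where f is
            the learned forward-kinematics model and jacobian(u) its
            derivative (e.g. obtained by autodiff or finite differences)."""
            u = u0.copy()
            for _ in range(iters):
                residual = f(u) - y_ref
                J = jacobian(u)
                # damped least-squares Newton step, robust near singular J
                step = np.linalg.solve(J.T @ J + damping * np.eye(len(u)),
                                       J.T @ residual)
                u -= step
            return u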
    Joint Privacy Enhancement and Quantization in Federated Learning. (arXiv:2208.10888v1 [cs.LG])
    Federated learning (FL) is an emerging paradigm for training machine learning models using possibly private data available at edge devices. The distributed operation of FL gives rise to challenges that are not encountered in centralized machine learning, including the need to preserve the privacy of the local datasets and the communication load due to the repeated exchange of updated models. These challenges are often tackled individually via techniques that induce some distortion on the updated models, e.g., local differential privacy (LDP) mechanisms and lossy compression. In this work, we propose a method coined joint privacy enhancement and quantization (JoPEQ), which jointly implements lossy compression and privacy enhancement in FL settings. In particular, JoPEQ utilizes vector quantization based on a random lattice, a universal compression technique whose byproduct distortion is statistically equivalent to additive noise. This distortion is leveraged to enhance privacy by augmenting the model updates with dedicated multivariate privacy-preserving noise. We show that JoPEQ simultaneously quantizes data according to a required bit-rate while holding a desired privacy level, without notably affecting the utility of the learned model. This is shown via analytical LDP guarantees, derivation of distortion and convergence bounds, and numerical studies. Finally, we empirically show that JoPEQ defeats common attacks known to exploit privacy leakage.  ( 2 min )
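    The following toy sketch conveys the flavor of pairing quantization distortion with privacy noise; it uses a scalar uniform quantizer in place of the paper's multivariate lattices and tailored noise distribution, so it is an assumption-laden illustration rather than JoPEQ itself:

        import numpy as np

        def quantize_with_noise(update, step, noise_std, rng=None):
            """Add privacy-preserving noise to a model update, then quantize
            to a uniform grid of spacing `step`; the combined distortion
            does double duty as compression error and privacy mechanism."""
            if rng is None:
                rng = np.random.default_rng()
            noisy = update + rng.normal(0.0, noise_std, size=update.shape)
            return step * np.round(noisy / step)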
    SurvSHAP(t): Time-dependent explanations of machine learning survival models. (arXiv:2208.11080v1 [cs.LG])
    Machine and deep learning survival models demonstrate similar or even improved time-to-event prediction capabilities compared to classical statistical learning methods, yet are too complex to be interpreted by humans. Several model-agnostic explanations are available to overcome this issue; however, none directly explain the survival function prediction. In this paper, we introduce SurvSHAP(t), the first time-dependent explanation that allows for interpreting survival black-box models. It is based on SHapley Additive exPlanations, which have solid theoretical foundations and broad adoption among machine learning practitioners. The proposed method aims to enhance precision diagnostics and support domain experts in making decisions. Experiments on synthetic and medical data confirm that SurvSHAP(t) can detect variables with a time-dependent effect, and its aggregation is a better determinant of the importance of variables for a prediction than SurvLIME. SurvSHAP(t) is model-agnostic and can be applied to all models with functional output. We provide an accessible implementation of time-dependent explanations in Python at this http URL .  ( 2 min )
    Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification. (arXiv:2203.17031v5 [cs.SD] UPDATED)
    The countermeasure (CM) model is developed to protect Automatic Speaker Verification (ASV) systems from spoofing attacks and to prevent the resulting personal information leakage. Based on practicality and security considerations, the CM model is usually deployed on edge devices, which have more limited computing resources and storage space than cloud-based systems, constraining the model size. To better trade off CM model size and performance, we propose an adversarial speaker distillation method, an improved version of knowledge distillation combined with generalized end-to-end (GE2E) pre-training and adversarial fine-tuning. In the evaluation phase of the ASVspoof 2021 Logical Access task, our proposed adversarial speaker distillation ResNetSE (ASD-ResNetSE) model reaches 0.2695 min t-DCF and 3.54% EER, while using only 22.5% of the parameters and 19.4% of the multiply-and-accumulate operands of the ResNetSE model.  ( 2 min )
    Adaptation of MobileNetV2 for Face Detection on Ultra-Low Power Platform. (arXiv:2208.11011v1 [cs.CV])
    Designing Deep Neural Networks (DNNs) running on edge hardware remains a challenge. Standard designs have been adopted by the community to facilitate the deployment of Neural Network models. However, not much emphasis is put on adapting the network topology to fit hardware constraints. In this paper, we adapt one of the most widely used architectures for mobile hardware platforms, MobileNetV2, and study the impact of changing its topology and applying post-training quantization. We discuss the impact of the adaptations and the deployment of the model on an embedded hardware platform for face detection.  ( 2 min )
    FocusFormer: Focusing on What We Need via Architecture Sampler. (arXiv:2208.10861v1 [cs.CV])
    Vision Transformers (ViTs) have underpinned the recent breakthroughs in computer vision. However, designing the architectures of ViTs is laborious and heavily relies on expert knowledge. To automate the design process and incorporate deployment flexibility, one-shot neural architecture search decouples supernet training from architecture specialization for diverse deployment scenarios. To cope with the enormous number of sub-networks in the supernet, existing methods treat all architectures as equally important and randomly sample some of them in each update step during training. During architecture search, these methods focus on finding architectures on the Pareto frontier of performance and resource consumption, which forms a gap between training and deployment. In this paper, we devise a simple yet effective method, called FocusFormer, to bridge such a gap. To this end, we propose to learn an architecture sampler that assigns higher sampling probabilities to architectures on the Pareto frontier under different resource constraints during supernet training, making them sufficiently optimized and hence improving their performance. During specialization, we can directly use the well-trained architecture sampler to obtain accurate architectures satisfying the given resource constraint, which significantly improves search efficiency. Extensive experiments on CIFAR-100 and ImageNet show that FocusFormer improves the performance of the searched architectures while significantly reducing the search cost. For example, on ImageNet, our FocusFormer-Ti with 1.4G FLOPs outperforms AutoFormer-Ti by 0.5% in terms of Top-1 accuracy.  ( 3 min )
    Anomaly Attribution with Likelihood Compensation. (arXiv:2208.10679v1 [cs.LG])
    This paper addresses the task of explaining anomalous predictions of a black-box regression model. When using a black-box model, such as one to predict building energy consumption from many sensor measurements, we often have a situation where some observed samples significantly deviate from their predictions. This may be due to a sub-optimal black-box model, or simply because those samples are outliers. In either case, one would ideally want to compute a "responsibility score" indicative of the extent to which an input variable is responsible for the anomalous output. In this work, we formalize this task as a statistical inverse problem: given the model's deviation from the expected value, infer the responsibility score of each input variable. We propose a new method called likelihood compensation (LC), which is founded on the likelihood principle and computes a correction to each input variable. To the best of our knowledge, this is the first principled framework that computes a responsibility score for real-valued anomalous model deviations. We apply our approach to a real-world building energy prediction task and confirm its utility based on expert feedback.  ( 2 min )
    Home Run: Finding Your Way Home by Imagining Trajectories. (arXiv:2208.10914v1 [cs.LG])
    When studying unconstrained behaviour and allowing mice to leave their cage to navigate a complex labyrinth, the mice exhibit foraging behaviour in the labyrinth, searching for rewards and returning to their home cage now and then, e.g. to drink. Surprisingly, when executing such a "home run", the mice do not follow the exact reverse path; in fact, the entry path and home path have very little overlap. Recent work proposed a hierarchical active inference model for navigation, where the low-level model makes inferences about hidden states and poses that explain sensory inputs, whereas the high-level model makes inferences about moving between locations, effectively building a map of the environment. However, using this "map" for planning only allows the agent to find trajectories that it previously explored, far from the observed behaviour of the mice. In this paper, we explore ways of incorporating previously unvisited paths into the planning algorithm, by using the low-level generative model to imagine potential, yet undiscovered, paths. We demonstrate a proof of concept in a grid-world environment, showing how an agent can accurately predict a new, shorter path in the map leading to its starting point, using a generative model learnt from pixel-based observations.  ( 2 min )
    Graph Signal Reconstruction Techniques for IoT Air Pollution Monitoring Platforms. (arXiv:2201.00378v4 [eess.SP] UPDATED)
    Air pollution monitoring platforms play a very important role in preventing and mitigating the effects of pollution. Recent advances in the field of graph signal processing have made it possible to describe and analyze air pollution monitoring networks using graphs. One of the main applications is the reconstruction of the measured signal in a graph using a subset of sensors. Reconstructing the signal using information from a sensor's neighbors can help improve the quality of network data; examples are filling in missing data with correlated neighboring nodes, or correcting a drifting sensor with neighboring sensors that are more accurate. This paper compares various types of graph signal reconstruction methods applied to real data sets from Spanish air pollution reference stations. The methods considered are Laplacian interpolation, low-pass based graph signal reconstruction, and kernel-based graph signal reconstruction, compared on actual air pollution data sets measuring O3, NO2, and PM10. The ability of the methods to reconstruct the signal of a pollutant is shown, as well as the computational cost of this reconstruction. The results indicate the superiority of kernel-based graph signal reconstruction, as well as the difficulty these methods have in scaling to an air pollution monitoring network with a large number of low-cost sensors. However, we show that scalability can be overcome with simple methods, such as partitioning the network using a clustering algorithm.  ( 3 min )
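    Of the three method families, Laplacian interpolation is compact enough to sketch directly; the code below is a standard harmonic-extension formulation (our own minimal version, with names as assumptions), not the paper's implementation:

        import numpy as np

        def laplacian_interpolate(L, known_idx, known_vals):
            """Reconstruct a graph signal at unobserved sensors by solving
            the harmonic extension L_UU x_U = -L_US x_S, where S indexes
            the observed stations, U the unobserved ones, and L is the
            graph Laplacian of the monitoring network."""
            n = L.shape[0]
            known_idx = np.asarray(known_idx)
            known_vals = np.asarray(known_vals)
            unknown_idx = np.setdiff1d(np.arange(n), known_idx)
            L_uu = L[np.ix_(unknown_idx, unknown_idx)]
            L_us = L[np.ix_(unknown_idx, known_idx)]
            x = np.empty(n)
            x[known_idx] = known_vals
            x[unknown_idx] = np.linalg.solve(L_uu, -L_us @ known_vals)
            return x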
    One-Hot Graph Encoder Embedding. (arXiv:2109.13098v2 [cs.LG] UPDATED)
    In this paper we propose a lightning-fast graph embedding method called one-hot graph encoder embedding. It has linear computational complexity and the capacity to process billions of edges within minutes on a standard PC, making it an ideal candidate for huge graph processing. It is applicable to either the adjacency matrix or the graph Laplacian, and can be viewed as a transformation of the spectral embedding. Under random graph models, the graph encoder embedding is approximately normally distributed per vertex and asymptotically converges to its mean. We showcase three applications: vertex classification, vertex clustering, and graph bootstrap. In every case, the graph encoder embedding exhibits unrivalled computational advantages.  ( 2 min )
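    The construction is simple enough to show in a few lines. This is our reading of the method, a hedged sketch rather than the reference implementation: project the adjacency matrix onto class-size-normalized one-hot label vectors, which costs time linear in the number of edges with sparse matrices.

        import numpy as np

        def graph_encoder_embedding(A, y, K):
            """One-hot graph encoder embedding (sketch): each vertex gets a
            K-dimensional embedding equal to its average connectivity to
            each of the K label classes (every class assumed non-empty)."""
            n = A.shape[0]
            W = np.zeros((n, K))
            for k in range(K):
                members = (y == k)
                W[members, k] = 1.0 / members.sum()  # normalize by class size
            return A @ W                             # n-by-K embedding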
    Cluster Based Secure Multi-Party Computation in Federated Learning for Histopathology Images. (arXiv:2208.10919v1 [cs.CR])
    Federated learning (FL) is a decentralized method enabling hospitals to collaboratively learn a model without sharing private patient data for training. In FL, participant hospitals periodically exchange training results rather than training samples with a central server. However, having access to model parameters or gradients can expose private training data samples. To address this challenge, we adopt secure multiparty computation (SMC) to establish a privacy-preserving federated learning framework. In our proposed method, the hospitals are divided into clusters. After local training, each hospital splits its model weights among other hospitals in the same cluster such that no single hospital can retrieve other hospitals' weights on its own. Then, all hospitals sum up the received weights, sending the results to the central server. Finally, the central server aggregates the results, retrieving the average of models' weights and updating the model without having access to individual hospitals' weights. We conduct experiments on a publicly available repository, The Cancer Genome Atlas (TCGA). We compare the performance of the proposed framework with differential privacy and federated averaging as the baseline. The results reveal that compared to differential privacy, our framework can achieve higher accuracy with no privacy leakage risk at a cost of higher communication overhead.  ( 3 min )
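    The cluster-level protection rests on additive secret sharing, which is easy to illustrate; the sketch below (names and the use of Gaussian masks are our assumptions) shows how one hospital could split its weights so that only the cluster sum is recoverable:

        import numpy as np

        def share_weights(w, n_peers, rng=None):
            """Split a weight vector into n_peers random shares that sum
            exactly to w; any subset of fewer than n_peers shares reveals
            nothing useful about w on its own."""
            if rng is None:
                rng = np.random.default_rng()
            shares = [rng.normal(size=w.shape) for _ in range(n_peers - 1)]
            shares.append(w - np.sum(shares, axis=0))
            return shares

    Each hospital keeps one share and distributes the rest within its cluster; every hospital then sums the shares it received and forwards only that sum, so the server can recover the cluster average without ever seeing an individual model.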
    Evaluation of group fairness measures in student performance prediction problems. (arXiv:2208.10625v1 [cs.LG])
    Predicting students' academic performance is one of the key tasks of educational data mining (EDM). Traditionally, the high forecasting quality of such models was deemed critical. More recently, the issues of fairness and discrimination w.r.t. protected attributes, such as gender or race, have gained attention. Although there are several fairness-aware learning approaches in EDM, a comparative evaluation of these measures is still missing. In this paper, we evaluate different group fairness measures for student performance prediction problems on various educational datasets and fairness-aware learning models. Our study shows that the choice of the fairness measure is important, as is the choice of the grade threshold.  ( 2 min )
    QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation - Analysis of Ranking Scores and Benchmarking Results. (arXiv:2112.10074v2 [eess.IV] UPDATED)
    Deep learning (DL) models have provided state-of-the-art performance in various medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder translating DL models into clinical workflows. Quantifying the reliability of DL model predictions in the form of uncertainties could enable clinical review of the most uncertain regions, thereby building trust and paving the way toward clinical translation. Several uncertainty estimation methods have recently been introduced for DL medical image segmentation tasks. Developing scores to evaluate and compare the performance of uncertainty measures will assist the end-user in making more informed decisions. In this study, we explore and evaluate a score developed during the BraTS 2019 and BraTS 2020 task on uncertainty quantification (QU-BraTS) and designed to assess and rank uncertainty estimates for brain tumor multi-compartment segmentation. This score (1) rewards uncertainty estimates that produce high confidence in correct assertions and those that assign low confidence levels at incorrect assertions, and (2) penalizes uncertainty measures that lead to a higher percentage of under-confident correct assertions. We further benchmark the segmentation uncertainties generated by 14 independent participating teams of QU-BraTS 2020, all of which also participated in the main BraTS segmentation task. Overall, our findings confirm the importance and complementary value that uncertainty estimates provide to segmentation algorithms, highlighting the need for uncertainty quantification in medical image analyses. Finally, in favor of transparency and reproducibility, our evaluation code is made publicly available at: https://github.com/RagMeh11/QU-BraTS.
    Hybrid Far- and Near-Field Channel Estimation for THz Ultra-Massive MIMO via Fixed Point Networks. (arXiv:2205.04944v2 [eess.SP] UPDATED)
    Terahertz ultra-massive multiple-input multiple-output (THz UM-MIMO) is envisioned as one of the key enablers of 6G wireless systems. Due to the joint effect of its array aperture and small wavelength, the near-field region of THz UM-MIMO is greatly enlarged. The high-dimensional channel of such systems thus consists of a stochastic mixture of far and near fields, which renders channel estimation extremely challenging. Previous works based on uni-field assumptions cannot capture the hybrid far- and near-field features, and thus suffer significant performance loss. This motivates us to consider hybrid-field channel estimation. We draw inspiration from fixed-point theory to develop an efficient deep learning based channel estimator with adaptive complexity and a linear convergence guarantee. Built upon classic orthogonal approximate message passing, we transform each iteration into a contractive mapping, comprising a closed-form linear estimator and a neural network based non-linear estimator. A major algorithmic innovation involves applying fixed-point iteration to compute the channel estimate while modeling neural networks with arbitrary depth and adapting to the hybrid-field channel conditions. Simulation results verify our theoretical analysis and show significant performance gains over state-of-the-art approaches in estimation accuracy and convergence rate.
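    The fixed-point machinery reduces to a short loop; the sketch below assumes a generic contractive update f(h, y) built from a linear estimator followed by a learned denoiser, with names and the stopping rule as our own illustrative choices:

        import torch

        def fixed_point_estimate(f, y, h0, tol=1e-5, max_iter=50):
            """Iterate the contractive mapping h <- f(h, y) until the
            relative change falls below tol; because f is contractive,
            the iteration converges linearly, and its effective depth
            adapts to how hard the (far/near-field) channel is."""
            h = h0
            for _ in range(max_iter):
                h_next = f(h, y)
                done = torch.norm(h_next - h) / (torch.norm(h) + 1e-12) < tol
                h = h_next
                if done:
                    break
            return h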
    Stochastic resonance neurons in artificial neural networks. (arXiv:2205.10122v2 [cs.NE] UPDATED)
    Many modern applications of artificial neural networks involve a large number of layers, making traditional digital implementations increasingly complex. Optical neural networks offer parallel processing at high bandwidth, but face the challenge of noise accumulation. We propose here a new type of neural network that uses stochastic resonances as an inherent part of the architecture, and demonstrate the possibility of significantly reducing the required number of neurons for a given performance accuracy. We also show that such a neural network is more robust against the impact of noise.
    Learning Car Speed Using Inertial Sensors for Dead Reckoning Navigation. (arXiv:2205.07883v2 [cs.LG] UPDATED)
    A deep neural network (DNN) is trained to estimate the speed of a car driving in an urban area, using as input a stream of measurements from a low-cost six-axis inertial measurement unit (IMU). Three hours of data were collected by driving through the city of Ashdod, Israel, in a car equipped with a global navigation satellite system (GNSS) real-time kinematic (RTK) positioning device and a synchronized IMU. Ground truth labels for the car speed were calculated using the position measurements obtained at the high rate of 50 Hz. A DNN architecture with long short-term memory layers is proposed to enable high-frequency speed estimation that accounts for previous input history and the nonlinear relation between speed, acceleration, and angular velocity. A simplified aided dead reckoning localization scheme is formulated to assess the trained model, which provides the speed pseudo-measurement. The trained model is shown to substantially improve position accuracy during a 4-minute drive without the use of GNSS position updates.
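    A hedged sketch of the kind of regressor described follows; the layer sizes, window handling, and single-output head are illustrative guesses, not the paper's architecture:

        import torch
        from torch import nn

        class SpeedNet(nn.Module):
            """LSTM speed estimator: consume a window of 6-axis IMU samples
            (3 accelerometer + 3 gyroscope channels) and regress the car
            speed at the end of the window."""
            def __init__(self, hidden=64):
                super().__init__()
                self.lstm = nn.LSTM(input_size=6, hidden_size=hidden,
                                    batch_first=True)
                self.head = nn.Linear(hidden, 1)

            def forward(self, imu_window):       # (batch, time, 6)
                out, _ = self.lstm(imu_window)
                return self.head(out[:, -1])     # speed pseudo-measurement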
    Mixup-based Deep Metric Learning Approaches for Incomplete Supervision. (arXiv:2204.13572v2 [cs.LG] UPDATED)
    Deep learning architectures have achieved promising results in different areas (e.g., medicine, agriculture, and security). However, using those powerful techniques in many real applications becomes challenging due to the large labeled collections required during training. Several works have pursued solutions to overcome it by proposing strategies that can learn more for less, e.g., weakly and semi-supervised learning approaches. As these approaches do not usually address memorization and sensitivity to adversarial examples, this paper presents three deep metric learning approaches combined with Mixup for incomplete-supervision scenarios. We show that some state-of-the-art approaches in metric learning might not work well in such scenarios. Moreover, the proposed approaches outperform most of them in different datasets.
    Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments. (arXiv:2208.11040v1 [stat.ML])
    We study offline reinforcement learning under a novel model called strategic MDP, which characterizes the strategic interactions between a principal and a sequence of myopic agents with private types. Due to the bilevel structure and private types, strategic MDP involves information asymmetry between the principal and the agents. We focus on the offline RL problem, where the goal is to learn the optimal policy of the principal concerning a target population of agents based on a pre-collected dataset that consists of historical interactions. The unobserved private types confound such a dataset as they affect both the rewards and observations received by the principal. We propose a novel algorithm, Pessimistic policy Learning with Algorithmic iNstruments (PLAN), which leverages the ideas of instrumental variable regression and the pessimism principle to learn a near-optimal principal's policy in the context of general function approximation. Our algorithm is based on the critical observation that the principal's actions serve as valid instrumental variables. In particular, under a partial coverage assumption on the offline dataset, we prove that PLAN outputs a $1 / \sqrt{K}$-optimal policy with $K$ being the number of collected trajectories. We further apply our framework to some special cases of strategic MDP, including strategic regression, strategic bandit, and noncompliance in recommendation systems.
    SoK: Certified Robustness for Deep Neural Networks. (arXiv:2009.04131v7 [cs.LG] UPDATED)
    Great advances in deep neural networks (DNNs) have led to state-of-the-art performance on a wide range of tasks. However, recent studies have shown that DNNs are vulnerable to adversarial attacks, which have brought great concerns when deploying these models to safety-critical applications such as autonomous driving. Different defense approaches have been proposed against adversarial attacks, including: a) empirical defenses, which can usually be adaptively attacked again without providing robustness certification; and b) certifiably robust approaches, which consist of robustness verification providing the lower bound of robust accuracy against any attacks under certain conditions and corresponding robust training approaches. In this paper, we systematize certifiably robust approaches and related practical and theoretical implications and findings. We also provide the first comprehensive benchmark on existing robustness verification and training approaches on different datasets. In particular, we 1) provide a taxonomy for the robustness verification and training approaches, as well as summarize the methodologies for representative algorithms, 2) reveal the characteristics, strengths, limitations, and fundamental connections among these approaches, 3) discuss current research progresses, theoretical barriers, main challenges, and future directions for certifiably robust approaches for DNNs, and 4) provide an open-sourced unified platform to evaluate 20+ representative certifiably robust approaches.
    Learn Basic Skills and Reuse: Modularized Adaptive Neural Architecture Search (MANAS). (arXiv:2208.11083v1 [cs.LG])
    Human intelligence is able to first learn basic skills for solving basic problems and then assemble such basic skills into complex skills for solving complex or new problems. For example, the basic skills "dig hole," "put tree," "backfill" and "watering" compose the complex skill "plant a tree". Besides, some basic skills can be reused for solving other problems. For example, the basic skill "dig hole" can be used not only for planting a tree, but also for mining treasures, building a drain, or landfilling. The ability to learn basic skills and reuse them for various tasks is very important for humans because it avoids learning too many skills for each individual task and makes it possible to solve a combinatorial number of tasks by learning just a few basic skills, which saves a considerable amount of memory and computation in the human brain. We believe that machine intelligence should also capture the ability to learn basic skills and reuse them by composing them into complex skills. In computer science language, each basic skill is a "module": a reusable network with a concrete meaning that performs a specific basic operation. The modules are assembled into a bigger "model" for a more complex task. The assembling procedure is adaptive to the input or task; i.e., for a given task, the modules should be assembled into the most suitable model for solving it. As a result, different inputs or tasks can have different assembled models, which enables self-assembling AI. In this work, we propose Modularized Adaptive Neural Architecture Search (MANAS) to demonstrate the above idea. Experiments on different datasets show that the adaptive architectures assembled by MANAS outperform static global architectures. Further experiments and empirical analysis provide insights into the effectiveness of MANAS.
    Evolving symbolic density functionals. (arXiv:2203.02540v5 [cs.NE] UPDATED)
    Systematic development of accurate density functionals has been a decades-long challenge for scientists. Despite the emerging applications of machine learning (ML) in approximating functionals, the resulting ML functionals usually contain more than tens of thousands of parameters, creating a huge gap in formulation from conventional human-designed symbolic functionals. We propose a new framework, Symbolic Functional Evolutionary Search (SyFES), that automatically constructs accurate functionals in symbolic form, which is more explainable to humans, cheaper to evaluate, and easier to integrate into existing density functional theory codes than other ML functionals. We first show that, without prior knowledge, SyFES reconstructed a known functional from scratch. We then demonstrate that, evolving from an existing functional $\omega$B97M-V, SyFES found a new functional, GAS22 (Google Accelerated Science 22), that performs better for the majority of molecular types in the test set of the Main Group Chemistry Database (MGCDB84). Our framework opens a new direction in leveraging computing power for the systematic development of symbolic density functionals.
    Robot Active Neural Sensing and Planning in Unknown Cluttered Environments. (arXiv:2208.11079v1 [cs.RO])
    Active sensing and planning in unknown, cluttered environments is an open challenge for robots intending to provide home service, search and rescue, narrow-passage inspection, and medical assistance. Although many active sensing methods exist, they often consider open spaces, assume known settings, or mostly do not generalize to real-world scenarios. We present the active neural sensing approach that generates the kinematically feasible viewpoint sequences for the robot manipulator with an in-hand camera to gather the minimum number of observations needed to reconstruct the underlying environment. Our framework actively collects the visual RGBD observations, aggregates them into scene representation, and performs object shape inference to avoid unnecessary robot interactions with the environment. We train our approach on synthetic data with domain randomization and demonstrate its successful execution via sim-to-real transfer in reconstructing narrow, covered, real-world cabinet environments cluttered with unknown objects. The natural cabinet scenarios impose significant challenges for robot motion and scene reconstruction due to surrounding obstacles and low ambient lighting conditions. However, despite unfavorable settings, our method exhibits high performance compared to its baselines in terms of various environment reconstruction metrics, including planning speed, the number of viewpoints, and overall scene coverage.
    Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP). (arXiv:2205.01397v2 [cs.CV] UPDATED)
    Contrastively trained language-image models such as CLIP, ALIGN, and BASIC have demonstrated unprecedented robustness to multiple challenging natural distribution shifts. Since these language-image models differ from previous training approaches in several ways, an important question is what causes the large robustness gains. We answer this question via a systematic experimental investigation. Concretely, we study five different possible causes for the robustness gains: (i) the training set size, (ii) the training distribution, (iii) language supervision at training time, (iv) language supervision at test time, and (v) the contrastive loss function. Our experiments show that the more diverse training distribution is the main cause for the robustness gains, with the other factors contributing little to no robustness. Beyond our experimental results, we also introduce ImageNet-Captions, a version of ImageNet with original text annotations from Flickr, to enable further controlled experiments of language-image training.
    The Lasso with general Gaussian designs with applications to hypothesis testing. (arXiv:2007.13716v2 [math.ST] UPDATED)
    The Lasso is a method for high-dimensional regression, which is now commonly used when the number of covariates $p$ is of the same order or larger than the number of observations $n$. Classical asymptotic normality theory does not apply to this model due to two fundamental reasons: $(1)$ The regularized risk is non-smooth; $(2)$ The distance between the estimator $\widehat{\boldsymbol{\theta}}$ and the true parameters vector $\boldsymbol{\theta}^*$ cannot be neglected. As a consequence, standard perturbative arguments that are the traditional basis for asymptotic normality fail. On the other hand, the Lasso estimator can be precisely characterized in the regime in which both $n$ and $p$ are large and $n/p$ is of order one. This characterization was first obtained in the case of Gaussian designs with i.i.d. covariates: here we generalize it to Gaussian correlated designs with non-singular covariance structure. This is expressed in terms of a simpler ``fixed-design'' model. We establish non-asymptotic bounds on the distance between the distribution of various quantities in the two models, which hold uniformly over signals $\boldsymbol{\theta}^*$ in a suitable sparsity class and over values of the regularization parameter. As an application, we study the distribution of the debiased Lasso and show that a degrees-of-freedom correction is necessary for computing valid confidence intervals.
    VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers. (arXiv:2203.17247v3 [cs.CV] UPDATED)
    Breakthroughs in transformer-based models have revolutionized not only the NLP field, but also vision and multimodal systems. However, although visualization and interpretability tools have become available for NLP models, internal mechanisms of vision and multimodal transformers remain largely opaque. With the success of these transformers, it is increasingly critical to understand their inner workings, as unraveling these black-boxes will lead to more capable and trustworthy models. To contribute to this quest, we propose VL-InterpreT, which provides novel interactive visualizations for interpreting the attentions and hidden representations in multimodal transformers. VL-InterpreT is a task agnostic and integrated tool that (1) tracks a variety of statistics in attention heads throughout all layers for both vision and language components, (2) visualizes cross-modal and intra-modal attentions through easily readable heatmaps, and (3) plots the hidden representations of vision and language tokens as they pass through the transformer layers. In this paper, we demonstrate the functionalities of VL-InterpreT through the analysis of KD-VLP, an end-to-end pretraining vision-language multimodal transformer-based model, in the tasks of Visual Commonsense Reasoning (VCR) and WebQA, two visual question answering benchmarks. Furthermore, we also present a few interesting findings about multimodal transformer behaviors that were learned through our tool.
    Neural Integro-Differential Equations. (arXiv:2206.14282v3 [cs.LG] UPDATED)
    Modeling continuous dynamical systems from discretely sampled observations is a fundamental problem in data science. Often, such dynamics are the result of non-local processes that present an integral over time. As such, these systems are modeled with Integro-Differential Equations (IDEs); generalizations of differential equations that comprise both an integral and a differential component. For example, brain dynamics are not accurately modeled by differential equations since their behavior is non-Markovian, i.e. dynamics are in part dictated by history. Here, we introduce the Neural IDE (NIDE), a novel deep learning framework based on the theory of IDEs where integral operators are learned using neural networks. We test NIDE on several toy and brain activity datasets and demonstrate that NIDE outperforms other models. These tasks include time extrapolation as well as predicting dynamics from unseen initial conditions, which we test on whole-cortex activity recordings in freely behaving mice. Further, we show that NIDE can decompose dynamics into their Markovian and non-Markovian constituents via the learned integral operator, which we test on fMRI brain activity recordings of people on ketamine. Finally, the integrand of the integral operator provides a latent space that gives insight into the underlying dynamics, which we demonstrate on wide-field brain imaging recordings. Altogether, NIDE is a novel approach that enables modeling of complex non-local dynamics with neural networks.
    Demand-Side Scheduling Based on Multi-Agent Deep Actor-Critic Learning for Smart Grids. (arXiv:2005.01979v2 [cs.LG] UPDATED)
    We consider the problem of demand-side energy management, where each household is equipped with a smart meter that is able to schedule home appliances online. The goal is to minimize the overall cost under a real-time pricing scheme. While previous works have introduced centralized approaches in which the scheduling algorithm has full observability, we propose the formulation of a smart grid environment as a Markov game. Each household is a decentralized agent with partial observability, which allows scalability and privacy-preservation in a realistic setting. The grid operator produces a price signal that varies with the energy demand. We propose an extension to a multi-agent, deep actor-critic algorithm to address partial observability and the perceived non-stationarity of the environment from the agent's viewpoint. This algorithm learns a centralized critic that coordinates training of decentralized agents. Our approach thus uses centralized learning but decentralized execution. Simulation results show that our online deep reinforcement learning method can reduce both the peak-to-average ratio of total energy consumed and the cost of electricity for all households based purely on instantaneous observations and a price signal.
    Opacus: User-Friendly Differential Privacy Library in PyTorch. (arXiv:2109.12298v4 [cs.LG] UPDATED)
    We introduce Opacus, a free, open-source PyTorch library for training deep learning models with differential privacy (hosted at opacus.ai). Opacus is designed for simplicity, flexibility, and speed. It provides a simple and user-friendly API, and enables machine learning practitioners to make a training pipeline private by adding as little as two lines to their code. It supports a wide variety of layers, including multi-head attention, convolution, LSTM, GRU (and generic RNN), and embedding, right out of the box and provides the means for supporting other user-defined layers. Opacus computes batched per-sample gradients, providing higher efficiency compared to the traditional "micro batch" approach. In this paper we present Opacus, detail the principles that drove its implementation and unique features, and benchmark it against other frameworks for training models with differential privacy as well as standard PyTorch.
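    The "two lines" the abstract mentions roughly correspond to wrapping the training objects with a PrivacyEngine. Below is a minimal sketch against the documented Opacus 1.x make_private API; the model, data, and hyperparameter values here are illustrative, not from the paper.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = nn.Sequential(nn.Linear(20, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
data_loader = DataLoader(
    TensorDataset(torch.randn(256, 20), torch.randint(0, 2, (256,))),
    batch_size=32,
)

# The essential addition: make the training pipeline differentially private.
privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.1,   # illustrative value
    max_grad_norm=1.0,      # per-sample gradient clipping bound
)

criterion = nn.CrossEntropyLoss()
for x, y in data_loader:
    optimizer.zero_grad()
    criterion(model(x), y).backward()  # Opacus computes per-sample grads
    optimizer.step()
```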
    A Near-Optimal Algorithm for Debiasing Trained Machine Learning Models. (arXiv:2106.12887v3 [cs.LG] UPDATED)
    We present a scalable post-processing algorithm for debiasing trained models, including deep neural networks (DNNs), which we prove to be near-optimal by bounding its excess Bayes risk. We empirically validate its advantages on standard benchmark datasets across both classical algorithms as well as modern DNN architectures and demonstrate that it outperforms previous post-processing methods while performing on par with in-processing. In addition, we show that the proposed algorithm is particularly effective for models trained at scale where post-processing is a natural and practical choice.
    LNS-Madam: Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update. (arXiv:2106.13914v3 [cs.LG] UPDATED)
    Representing deep neural networks (DNNs) in low-precision is a promising approach to enable efficient acceleration and memory reduction. Previous methods that train DNNs in low-precision typically keep a copy of weights in high-precision during the weight updates. Directly training with low-precision weights leads to accuracy degradation due to complex interactions between the low-precision number systems and the learning algorithms. To address this issue, we develop a co-designed low-precision training framework, termed LNS-Madam, in which we jointly design a logarithmic number system (LNS) and a multiplicative weight update algorithm (Madam). We prove that LNS-Madam results in low quantization error during weight updates, leading to stable performance even if the precision is limited. We further propose a hardware design of LNS-Madam that resolves practical challenges in implementing an efficient datapath for LNS computations. Our implementation effectively reduces energy overhead incurred by LNS-to-integer conversion and partial sum accumulation. Experimental results show that LNS-Madam achieves comparable accuracy to full-precision counterparts with only 8 bits on popular computer vision and natural language tasks. Compared to FP32 and FP8, LNS-Madam reduces the energy consumption by over 90% and 55%, respectively.
    OCR-free Document Understanding Transformer. (arXiv:2111.15664v3 [cs.LG] UPDATED)
    Understanding document images (e.g., invoices) is a core but challenging task since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs. Although such OCR-based approaches have shown promising performance, they suffer from 1) high computational costs for using OCR; 2) inflexibility of OCR models with respect to languages or document types; 3) OCR error propagation to the subsequent process. To address these issues, in this paper, we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As the first step in OCR-free VDU research, we propose a simple architecture (i.e., Transformer) with a pre-training objective (i.e., cross-entropy loss). Donut is conceptually simple yet effective. Through extensive experiments and analyses, we show that a simple OCR-free VDU model, Donut, achieves state-of-the-art performance on various VDU tasks in terms of both speed and accuracy. In addition, we offer a synthetic data generator that helps the model pre-training to be flexible in various languages and domains. The code, trained model and synthetic data are available at https://github.com/clovaai/donut.
    Cascaded Debiasing: Studying the Cumulative Effect of Multiple Fairness-Enhancing Interventions. (arXiv:2202.03734v2 [cs.LG] UPDATED)
    Understanding the cumulative effect of multiple fairness enhancing interventions at different stages of the machine learning (ML) pipeline is a critical and underexplored facet of the fairness literature. Such knowledge can be valuable to data scientists/ML practitioners in designing fair ML pipelines. This paper takes the first step in exploring this area by undertaking an extensive empirical study comprising 60 combinations of interventions, 9 fairness metrics, and 2 utility metrics (Accuracy and F1 Score) across 4 benchmark datasets. We quantitatively analyze the experimental data to measure the impact of multiple interventions on fairness, utility and population groups. We found that, on aggregate, applying multiple interventions results in better fairness and lower utility than individual interventions. However, adding more interventions does not always result in better fairness or worse utility. The likelihood of achieving high performance (F1 Score) along with high fairness increases with a larger number of interventions. On the downside, we found that fairness-enhancing interventions can negatively impact different population groups, especially the privileged group. This study highlights the need for new fairness metrics that account for the impact on different population groups, apart from just the disparity between groups. Lastly, we offer a list of combinations of interventions that perform best for different fairness and utility metrics to aid the design of fair ML pipelines.
    SoK: Explainable Machine Learning for Computer Security Applications. (arXiv:2208.10605v1 [cs.CR])
    Explainable Artificial Intelligence (XAI) is a promising solution to improve the transparency of machine learning (ML) pipelines. We systematize the growing (but fragmented) microcosm of studies that develop and utilize XAI methods for defensive and offensive cybersecurity tasks. We identify 3 cybersecurity stakeholders, i.e., model users, designers, and adversaries, that utilize XAI for 5 different objectives within an ML pipeline, namely 1) XAI-enabled decision support, 2) applied XAI for security tasks, 3) model verification via XAI, 4) explanation verification & robustness, and 5) offensive use of explanations. We further classify the literature w.r.t. the targeted security domain. Our analysis of the literature indicates that many of the XAI applications are designed with little understanding of how they might be integrated into analyst workflows -- user studies for explanation evaluation are conducted in only 14% of the cases. The literature also rarely disentangles the role of the various stakeholders. Particularly, the role of the model designer is minimized within the security literature. To this end, we present an illustrative use case accentuating the role of model designers. We demonstrate cases where XAI can help in model verification and cases where it may lead to erroneous conclusions instead. The systematization and use case enable us to challenge several assumptions and present open problems that can help shape the future of XAI within cybersecurity.
    SHERLock: Self-Supervised Hierarchical Event Representation Learning. (arXiv:2010.02556v2 [cs.LG] UPDATED)
    Temporal event representations are an essential aspect of learning among humans. They allow for succinct encoding of the experiences we have through a variety of sensory inputs. Also, they are believed to be arranged hierarchically, allowing for an efficient representation of complex long-horizon experiences. Additionally, these representations are acquired in a self-supervised manner. Analogously, here we propose a model that learns temporal representations from long-horizon visual demonstration data and associated textual descriptions, without explicit temporal supervision. Our method produces a hierarchy of representations that align more closely with ground-truth human-annotated events (+15.3) than state-of-the-art unsupervised baselines. Our results are comparable to heavily-supervised baselines in complex visual domains such as Chess Openings, YouCook2 and TutorialVQA datasets. Finally, we perform ablation studies illustrating the robustness of our approach. We release our code and demo visualizations in the Supplementary Material.
    Prediction of good reaction coordinates and future evolution of MD trajectories using Regularized Sparse Autoencoders: A novel deep learning approach. (arXiv:2208.10962v1 [physics.chem-ph])
    Identifying reaction coordinates (RCs) is an active area of research, given the crucial role RCs play in determining the progress of a chemical reaction. The choice of the reaction coordinate is often based on heuristic knowledge. However, an essential criterion for the choice is that the coordinate should capture both the reactant and product states unequivocally. Also, the coordinate should be the slowest one, so that all the other degrees of freedom can easily equilibrate along it. We used a regularised sparse autoencoder, an energy-based model, to discover a crucial set of reaction coordinates. Along with discovering reaction coordinates, our model also predicts the evolution of a molecular dynamics (MD) trajectory. We showcase that including a sparsity-enforcing regularisation helps in choosing a small but important set of reaction coordinates. We used two model systems to demonstrate our approach: the alanine dipeptide system and the proflavine-DNA system, which exhibits intercalation of proflavine into a DNA minor groove in an aqueous environment. We model the MD trajectory as a multivariate time series, and our latent variable model performs the task of multi-step time series prediction. This idea is inspired by the popular sparse coding approach, which represents each input sample as a linear combination of a few elements taken from a set of representative patterns.
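    As a rough illustration of the sparsity-enforcing regularisation, here is a minimal PyTorch autoencoder whose latent code carries an L1 penalty, so that only a few latent units (candidate reaction coordinates) stay active. The architecture, dimensions, and penalty weight are assumptions for the sketch; the paper's exact model and its time-series prediction head are not reproduced.

```python
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    """Autoencoder with an L1 penalty on the latent code, so only a
    few latent units survive as candidate reaction coordinates."""
    def __init__(self, n_features, n_latent=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                 nn.Linear(64, n_latent))
        self.dec = nn.Sequential(nn.Linear(n_latent, 64), nn.ReLU(),
                                 nn.Linear(64, n_features))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

model = SparseAE(n_features=30)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 1e-3  # sparsity strength (illustrative)

x = torch.randn(128, 30)        # stand-in for MD feature snapshots
x_hat, z = model(x)
loss = ((x_hat - x) ** 2).mean() + lam * z.abs().mean()
opt.zero_grad(); loss.backward(); opt.step()
```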
    Link prediction with continuous-time classical and quantum walks. (arXiv:2208.11030v1 [quant-ph])
    Protein-protein interaction (PPI) networks consist of the physical and/or functional interactions between the proteins of an organism. Since the biophysical and high-throughput methods used to form PPI networks are expensive, time-consuming, and often contain inaccuracies, the resulting networks are usually incomplete. In order to infer missing interactions in these networks, we propose a novel class of link prediction methods based on continuous-time classical and quantum random walks. In the case of quantum walks, we examine the usage of both the network adjacency and Laplacian matrices for controlling the walk dynamics. We define a score function based on the corresponding transition probabilities and perform tests on four real-world PPI datasets. Our results show that continuous-time classical random walks and quantum walks using the network adjacency matrix can successfully predict missing protein-protein interactions, with performance rivalling the state of the art.
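    The two walk types can be sketched in a few lines: the classical continuous-time random walk evolves with the heat kernel $e^{-tL}$ of the Laplacian, while the quantum walk's transition probabilities are squared amplitudes of the unitary $e^{-itA}$. The toy network and walk time below are illustrative; the paper's exact score function may differ.

```python
import numpy as np
from scipy.linalg import expm

# Toy 4-node network; 1 = observed interaction.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A   # graph Laplacian
t = 0.8                          # walk time (illustrative)

# Classical continuous-time random walk: transition probabilities.
P_classical = expm(-t * L)

# Continuous-time quantum walk driven by the adjacency matrix:
# transition probabilities are squared amplitudes of the unitary.
U = expm(-1j * t * A)
P_quantum = np.abs(U) ** 2

# Score a candidate missing edge (u, v) by its transition probability;
# a higher score suggests a more plausible missing interaction.
u, v = 2, 3
print(P_classical[u, v], P_quantum[u, v])
```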
    The Computational Complexity of ReLU Network Training Parameterized by Data Dimensionality. (arXiv:2105.08675v3 [cs.LG] UPDATED)
    Understanding the computational complexity of training simple neural networks with rectified linear units (ReLUs) has recently been a subject of intensive research. Closing gaps and complementing results from the literature, we present several results on the parameterized complexity of training two-layer ReLU networks with respect to various loss functions. After a brief discussion of other parameters, we focus on analyzing the influence of the dimension $d$ of the training data on the computational complexity. We provide running time lower bounds in terms of W[1]-hardness for parameter $d$ and prove that known brute-force strategies are essentially optimal (assuming the Exponential Time Hypothesis). In comparison with previous work, our results hold for a broad(er) range of loss functions, including $\ell^p$-loss for all $p\in[0,\infty]$. In particular, we extend a known polynomial-time algorithm for constant $d$ and convex loss functions to a more general class of loss functions, matching our running time lower bounds also in these cases.
    Causal Entropy Optimization. (arXiv:2208.10981v1 [cs.LG])
    We study the problem of globally optimizing the causal effect on a target variable of an unknown causal graph in which interventions can be performed. This problem arises in many areas of science including biology, operations research and healthcare. We propose Causal Entropy Optimization (CEO), a framework that generalizes Causal Bayesian Optimization (CBO) to account for all sources of uncertainty, including the one arising from the causal graph structure. CEO incorporates the causal structure uncertainty both in the surrogate models for the causal effects and in the mechanism used to select interventions via an information-theoretic acquisition function. The resulting algorithm automatically trades off structure learning and causal effect optimization, while naturally accounting for observation noise. For various synthetic and real-world structural causal models, CEO achieves faster convergence to the global optimum compared with CBO while also learning the graph. Furthermore, our joint approach to structure learning and causal optimization improves upon sequential, structure-learning-first approaches.
    Evaluating Out-of-Distribution Detectors Through Adversarial Generation of Outliers. (arXiv:2208.10940v1 [cs.CR])
    A reliable evaluation method is essential for building a robust out-of-distribution (OOD) detector. Current robustness evaluation protocols for OOD detectors rely on injecting perturbations into outlier data. However, such perturbations are unlikely to occur naturally or are not relevant to the content of the data, providing only a limited assessment of robustness. In this paper, we propose Evaluation-via-Generation for OOD detectors (EvG), a new protocol for investigating the robustness of OOD detectors under more realistic modes of variation in outliers. EvG utilizes a generative model to synthesize plausible outliers, and employs MCMC sampling to find outliers misclassified as in-distribution with the highest confidence by a detector. We perform a comprehensive benchmark comparison of the performance of state-of-the-art OOD detectors using EvG, uncovering previously overlooked weaknesses.
    Categoroids: Universal Conditional Independence. (arXiv:2208.11077v1 [cs.AI])
    Conditional independence has been widely used in AI, causal inference, machine learning, and statistics. We introduce categoroids, an algebraic structure for characterizing universal properties of conditional independence. Categoroids are defined as a hybrid of two categories: one encoding a preordered lattice structure defined by objects and arrows between them; the second dual parameterization involves trigonoidal objects and morphisms defining a conditional independence structure, with bridge morphisms providing the interface between the binary and ternary structures. We illustrate categoroids using three well-known examples of axiom sets: graphoids, integer-valued multisets, and separoids. Functoroids map one categoroid to another, preserving the relationships defined by all three types of arrows in the co-domain categoroid. We describe a natural transformation across functoroids, which is natural across regular objects and trigonoidal objects, to construct universal representations of conditional independence. We use adjunctions and monads between categoroids to abstractly characterize faithfulness of graphical and non-graphical representations of conditional independence.
    Dynamic Causal Collaborative Filtering. (arXiv:2208.11094v1 [cs.IR])
    Causal graphs, as an effective and powerful tool for causal modeling, are usually assumed to be Directed Acyclic Graphs (DAGs). However, recommender systems usually involve feedback loops, defined as the cyclic process of recommending items, incorporating user feedback in model updates, and repeating the procedure. As a result, it is important to incorporate loops into the causal graphs to accurately model the dynamic and iterative data generation process for recommender systems. However, feedback loops are not always beneficial, since over time they may encourage more and more narrowed content exposure, which, if left unattended, may result in echo chambers. As a result, it is important to understand when the recommendations will lead to echo chambers and how to mitigate echo chambers without hurting the recommendation performance. In this paper, we design a causal graph with loops to describe the dynamic process of recommendation. We then use a Markov process to analyze the mathematical properties of echo chambers, such as the conditions that lead to them. Inspired by the theoretical analysis, we propose a Dynamic Causal Collaborative Filtering ($\partial$CCF) model, which estimates users' post-intervention preference on items based on back-door adjustment and mitigates echo chambers with counterfactual reasoning. Multiple experiments are conducted on real-world datasets and results show that our framework can mitigate echo chambers better than other state-of-the-art frameworks while achieving comparable recommendation performance with the base recommendation models.
    Deep Structural Causal Shape Models. (arXiv:2208.10950v1 [cs.CV])
    Causal reasoning provides a language to ask important interventional and counterfactual questions beyond purely statistical association. In medical imaging, for example, we may want to study the causal effect of genetic, environmental, or lifestyle factors on the normal and pathological variation of anatomical phenotypes. However, while anatomical shape models of 3D surface meshes, extracted from automated image segmentation, can be reliably constructed, there is a lack of computational tooling to enable causal reasoning about morphological variations. To tackle this problem, we propose deep structural causal shape models (CSMs), which utilise high-quality mesh generation techniques, from geometric deep learning, within the expressive framework of deep structural causal models. CSMs enable subject-specific prognoses through counterfactual mesh generation ("How would this patient's brain structure change if they were ten years older?"), which is in contrast to most current works on purely population-level statistical shape modelling. We demonstrate the capabilities of CSMs at all levels of Pearl's causal hierarchy through a number of qualitative and quantitative experiments leveraging a large dataset of 3D brain structures.
    Synthetic learner: model-free inference on treatments over time. (arXiv:1904.01490v2 [stat.ME] UPDATED)
    Understanding the effect of a particular treatment or a policy pertains to many areas of interest, ranging from political economics, marketing to healthcare. In this paper, we develop a non-parametric algorithm for detecting the effects of treatment over time in the context of Synthetic Controls. The method builds on counterfactual predictions from many algorithms without necessarily assuming that the algorithms correctly capture the model. We introduce an inferential procedure for detecting treatment effects and show that the testing procedure is asymptotically valid for stationary, beta mixing processes without imposing any restriction on the set of base algorithms under consideration. We discuss consistency guarantees for average treatment effect estimates and derive regret bounds for the proposed methodology. The class of algorithms may include Random Forest, Lasso, or any other machine-learning estimator. Numerical studies and an application illustrate the advantages of the method.
    AniWho : A Quick and Accurate Way to Classify Anime Character Faces in Images. (arXiv:2208.11012v1 [cs.CV])
    This paper dives more deeply into various available models, including InceptionV3, InceptionResNetV2, MobileNetV2, and EfficientNetB7, using transfer learning to classify Japanese animation-style character faces. This paper shows that EfficientNet-B7 has the highest accuracy rate with 85.08% top-1 accuracy, followed by MobileNetV2, which has a slightly less accurate result but with the benefits of much lower inference time and far fewer required parameters. This paper also uses a few-shot learning framework, specifically Prototypical Networks, which produces decent results that can be used as an alternative to traditional transfer learning methods.
    Feature Removal Is a Unifying Principle for Model Explanation Methods. (arXiv:2011.03623v2 [cs.LG] UPDATED)
    Researchers have proposed a wide variety of model explanation approaches, but it remains unclear how most methods are related or when one method is preferable to another. We examine the literature and find that many methods are based on a shared principle of explaining by removing - essentially, measuring the impact of removing sets of features from a model. These methods vary in several respects, so we develop a framework for removal-based explanations that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence. Our framework unifies 26 existing methods, including several of the most widely used approaches (SHAP, LIME, Meaningful Perturbations, permutation tests). Exposing the fundamental similarities between these methods empowers users to reason about which tools to use, and suggests promising directions for ongoing model explainability research.
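    The shared remove-and-measure principle can be illustrated in a few lines: "remove" each feature (here, by mean-imputation), and record how much the model's prediction changes. This sketch uses an assumed sklearn setup; SHAP, LIME, and the other unified methods differ precisely in how they remove features and summarize the resulting changes.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000).fit(X, y)

def removal_importance(model, X, i):
    """Impact of 'removing' feature i by replacing it with its mean."""
    X_removed = X.copy()
    X_removed[:, i] = X[:, i].mean()
    base = model.predict_proba(X)[:, 1]
    removed = model.predict_proba(X_removed)[:, 1]
    return np.abs(base - removed).mean()

scores = [removal_importance(model, X, i) for i in range(X.shape[1])]
print(np.argsort(scores)[::-1][:5])  # five most influential features
```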
    The Value of Out-of-Distribution Data. (arXiv:2208.10967v1 [cs.LG])
    More data helps us generalize to a task. But real datasets can contain out-of-distribution (OOD) data; this can come in the form of heterogeneity such as intra-class variability but also in the form of temporal shifts or concept drifts. We demonstrate a counter-intuitive phenomenon for such problems: generalization error of the task can be a non-monotonic function of the number of OOD samples; a small number of OOD samples can improve generalization but if the number of OOD samples is beyond a threshold, then the generalization error can deteriorate. We also show that if we know which samples are OOD, then using a weighted objective between the target and OOD samples ensures that the generalization error decreases monotonically. We demonstrate and analyze this issue using linear classifiers on synthetic datasets and medium-sized neural networks on CIFAR-10.
    Graph Embeddings via Tensor Products and Approximately Orthonormal Codes. (arXiv:2208.10917v1 [cs.SI])
    We introduce a method for embedding graphs as vectors in a structure-preserving manner. In this paper, we showcase its rich representational capacity and give some theoretical properties of our method. In particular, our procedure falls under the bind-and-sum approach, and we show that our binding operation -- the tensor product -- is the most general binding operation that respects the principle of superposition. Similarly, we show that the spherical code achieves optimal compression. We then establish some precise results characterizing the performance of our method, as well as experimental results showcasing how it can accurately perform various graph operations even when the number of edges is quite large. Finally, we conclude by establishing a link to adjacency matrices, showing that our method is, in some sense, a generalization of adjacency matrices with applications towards large sparse graphs.
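    A minimal numpy sketch of the bind-and-sum idea under random node codes: each node gets a random unit vector (approximately orthonormal in high dimension), each edge is bound by a tensor (outer) product, and the bound edges are summed into a single superposed embedding. Edge queries are then recovered by inner products. The dimensions and codes here are assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, d = 100, 512  # large d makes random codes nearly orthonormal

# Approximately orthonormal node codes: random unit vectors.
codes = rng.normal(size=(n_nodes, d))
codes /= np.linalg.norm(codes, axis=1, keepdims=True)

edges = [(0, 1), (1, 2), (2, 0), (3, 4)]

# Bind each edge with a tensor (outer) product; sum into one matrix.
G = np.zeros((d, d))
for u, v in edges:
    G += np.outer(codes[u], codes[v])

# Query: an inner product recovers approximate edge membership.
print(codes[0] @ G @ codes[1])  # close to 1.0 (edge present)
print(codes[0] @ G @ codes[3])  # close to 0.0 (edge absent)
```

    Note how this recovers adjacency-matrix behavior when the codes are the standard basis vectors, which is the link to adjacency matrices mentioned in the abstract.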
    Convex integer optimization with Frank-Wolfe methods. (arXiv:2208.11010v1 [math.OC])
    Mixed-integer nonlinear optimization is a broad class of problems that feature combinatorial structures and nonlinearities. Typical exact methods combine a branch-and-bound scheme with relaxation and separation subroutines. We investigate the properties and advantages of error-adaptive first-order methods based on the Frank-Wolfe algorithm for this setting, requiring only a gradient oracle for the objective function and linear optimization over the feasible set. In particular, we study the algorithmic consequences of optimizing with a branch-and-bound approach in which the subproblems are solved over the convex hull of the mixed-integer feasible set via Frank-Wolfe linear oracles, compared to solving the subproblems over the continuous relaxation of the same set. This novel approach computes feasible solutions while working on a single representation of the polyhedral constraints, leveraging the full extent of Mixed-Integer Programming (MIP) solvers without an outer approximation scheme.
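    For readers unfamiliar with the method, here is a minimal Frank-Wolfe loop on a toy problem, with scipy's LP solver standing in for the linear minimization oracle (in the paper's setting, a MIP solver would play this role over the integer hull). The objective and feasible set are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

# Minimize f(x) = 0.5 * ||x - target||^2 over the box [0, 1]^d,
# using only a gradient oracle and a linear minimization oracle.
target = np.array([0.3, 1.4, -0.2, 0.8])
d = target.size
x = np.zeros(d)

for k in range(100):
    grad = x - target
    # Linear minimization oracle: argmin_{s in feasible set} <grad, s>.
    res = linprog(c=grad, bounds=[(0, 1)] * d, method="highs")
    s = res.x
    gamma = 2.0 / (k + 2)          # standard open-loop step size
    x = x + gamma * (s - x)        # iterates stay feasible (convexity)

print(np.round(x, 3))  # ~ [0.3, 1.0, 0.0, 0.8]: projection onto the box
```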
    Transfer Learning Application of Self-supervised Learning in ARPES. (arXiv:2208.10893v1 [physics.ins-det])
    Recent developments in the angle-resolved photoemission spectroscopy (ARPES) technique involve spatially resolving samples while maintaining high resolution in momentum space. This development greatly expands the size and complexity of the data to be analyzed; one such analysis task is to label similar dispersion cuts and map them spatially. In this work, we demonstrate that a recent representation learning (self-supervised learning) model combined with k-means clustering can help automate that part of the data analysis and save precious time, albeit with low performance. Finally, we introduce few-shot learning (k-nearest neighbour, or kNN) in representation space, where we selectively choose one (k=1) reference image for each known label and subsequently label the rest of the data with respect to the nearest reference image. This last approach demonstrates the strength of self-supervised learning for automating image analysis in ARPES in particular, and it can be generalized to any scientific data analysis that heavily involves image data.
    Grad-Align+: Empowering Gradual Network Alignment Using Attribute Augmentation. (arXiv:2208.11025v1 [cs.SI])
    Network alignment (NA) is the task of discovering node correspondences across different networks. Although NA methods have achieved remarkable success in a myriad of scenarios, their satisfactory performance hinges on prior anchor link information and/or node attributes, which may not always be available. In this paper, we propose Grad-Align+, a novel NA method using node attribute augmentation that is quite robust to the absence of such additional information. Grad-Align+ is built upon a recent state-of-the-art NA method, the so-called Grad-Align, that gradually discovers only a part of node pairs until all node pairs are found. Specifically, Grad-Align+ is composed of the following key components: 1) augmenting node attributes based on nodes' centrality measures, 2) calculating an embedding similarity matrix extracted from a graph neural network into which the augmented node attributes are fed, and 3) gradually discovering node pairs by calculating similarities between cross-network nodes with respect to the aligned cross-network neighbor-pairs. Experimental results demonstrate that Grad-Align+ exhibits (a) superiority over benchmark NA methods, (b) empirical validation of our theoretical findings, and (c) the effectiveness of our attribute augmentation module.
    Application of Causal Inference to Analytical Customer Relationship Management in Banking and Insurance. (arXiv:2208.10916v1 [cs.LG])
    Of late, researchers have argued that, to gain better acceptance across various domains, machine intelligence algorithms must be able to provide explanations that humans can understand causally. This aspect, also known as causability, achieves a specific level of human-level explainability. A specific class of algorithms known as counterfactuals may be able to provide causability. In statistics, causality has been studied and applied for many years, but not in great detail in artificial intelligence (AI). In a first-of-its-kind study, we employed the principles of causal inference to provide explainability for solving analytical customer relationship management (ACRM) problems. In the context of banking and insurance, current research on interpretability tries to address causality-related questions such as: why did this model make such a decision, and was the model's choice influenced by a particular factor? We propose a solution in the form of an intervention, wherein the effect of changing the distribution of features of ACRM datasets on the target feature is studied. Subsequently, a set of counterfactuals is also obtained that may be furnished to any customer who demands an explanation of the decision taken by the bank/insurance company. Except for the credit card churn prediction dataset, good quality counterfactuals were generated for the loan default, insurance fraud detection, and credit card fraud detection datasets, where changes in no more than three features are observed.
    RAB: Provable Robustness Against Backdoor Attacks. (arXiv:2003.08904v7 [cs.LG] UPDATED)
    Recent studies have shown that deep neural networks (DNNs) are vulnerable to adversarial attacks, including evasion and backdoor (poisoning) attacks. On the defense side, there have been intensive efforts on improving both empirical and provable robustness against evasion attacks; however, the provable robustness against backdoor attacks still remains largely unexplored. In this paper, we focus on certifying the machine learning model robustness against general threat models, especially backdoor attacks. We first provide a unified framework via randomized smoothing techniques and show how it can be instantiated to certify the robustness against both evasion and backdoor attacks. We then propose the first robust training process, RAB, to smooth the trained model and certify its robustness against backdoor attacks. We prove the robustness bound for machine learning models trained with RAB and prove that our robustness bound is tight. In addition, we theoretically show that it is possible to train the robust smoothed models efficiently for simple models such as K-nearest neighbor classifiers, and we propose an exact smooth-training algorithm that eliminates the need to sample from a noise distribution for such models. Empirically, we conduct comprehensive experiments for different machine learning (ML) models such as DNNs, support vector machines, and K-NN models on MNIST, CIFAR-10, and ImageNette datasets and provide the first benchmark for certified robustness against backdoor attacks. In addition, we evaluate K-NN models on a spambase tabular dataset to demonstrate the advantages of the proposed exact algorithm. Both the theoretical analysis and the comprehensive evaluation on diverse ML models and datasets shed light on further robust learning strategies against general training time attacks.
    Challenges and Complexities in Machine Learning based Credit Card Fraud Detection. (arXiv:2208.10943v1 [cs.CR])
    Credit cards play an exploding role in modern economies. Their popularity and ubiquity have created a fertile ground for fraud, assisted by cross-border reach and instantaneous confirmation. While transaction volumes are growing, fraud percentages are also on the rise, as is the true cost of each fraudulent dollar. The volume of transactions, the uniqueness of frauds, and the ingenuity of fraudsters are the main challenges in detecting fraud. The advent of machine learning, artificial intelligence and big data has opened up new tools in the fight against fraud. Given past transactions, a machine learning algorithm has the ability to 'learn' infinitely complex characteristics in order to identify frauds in real-time, surpassing the best human investigators. However, the development of fraud detection algorithms has been challenging and slow due to the massively unbalanced nature of fraud data, the absence of benchmarks and standard evaluation metrics to identify better-performing classifiers, the lack of sharing and disclosure of research findings, and the difficulties in getting access to confidential transaction data for research. This work investigates the properties of typical massively imbalanced fraud datasets, their availability, and their suitability for research use, while exploring the widely varying nature of fraud distributions. Furthermore, we show how human annotation errors compound with machine classification errors. We also carry out experiments to determine the effect of PCA obfuscation (as a means of disseminating sensitive transaction data for research and machine learning) on the algorithmic performance of classifiers, and show that while PCA does not significantly degrade performance, care should be taken to use the appropriate number of principal components (dimensions) to avoid overfitting.
    Large-Scale Traffic Congestion Prediction based on Multimodal Fusion and Representation Mapping. (arXiv:2208.11061v1 [cs.LG])
    With the progress of urbanisation, the urban transportation system is extremely critical to the development of cities and the quality of life of their citizens. One of its most important tasks is to judge traffic congestion by analysing congestion factors. Recently, various traditional and machine-learning-based models have been introduced for predicting traffic congestion. However, these models either aggregate massive congestion factors poorly or fail to make accurate predictions for every precise location in large-scale space. To alleviate these problems, a novel end-to-end framework based on convolutional neural networks is proposed in this paper. Using learned representations, the framework introduces a novel multimodal fusion module and a novel representation mapping module to achieve traffic congestion predictions at arbitrary query locations on a large-scale map, combined with various global reference information. The proposed framework achieves significant results and efficient inference on real-world large-scale datasets.
    A Stochastic Variance Reduced Gradient using Barzilai-Borwein Techniques as Second Order Information. (arXiv:2208.11075v1 [math.OC])
    In this paper, we consider improving the stochastic variance reduced gradient (SVRG) method by incorporating the curvature information of the objective function. We propose to reduce the variance of stochastic gradients using the computationally efficient Barzilai-Borwein (BB) method by incorporating it into SVRG. We also incorporate a BB step size as a variant. We prove a linear convergence theorem that applies not only to the proposed method but also to other existing variants of SVRG with second-order information. We conduct numerical experiments on benchmark datasets and show that the proposed method with constant step size performs better than existing variance-reduced methods for some test problems.
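    A minimal numpy sketch of SVRG with a BB step size on least squares, assuming the usual SVRG-BB recipe (full-gradient snapshots, a variance-reduced inner loop, and a BB step computed from successive snapshots); the initial step size, epoch counts, and safeguard are illustrative, not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

full_grad = lambda w: A.T @ (A @ w - b) / n
stoch_grad = lambda w, i: A[i] * (A[i] @ w - b[i])

w, w_prev = np.zeros(d), None
mu_prev, eta, m = None, 0.01, 2 * n   # initial step size is illustrative

for epoch in range(30):
    mu = full_grad(w)                 # full gradient at the snapshot
    if w_prev is not None:
        # Barzilai-Borwein step size from successive snapshots:
        # eta = ||s||^2 / (m * |s^T y|), s = w - w_prev, y = mu - mu_prev.
        s, y = w - w_prev, mu - mu_prev
        eta = (s @ s) / (m * abs(s @ y) + 1e-12)
    w_prev, mu_prev = w.copy(), mu.copy()
    wt = w.copy()
    for _ in range(m):                # variance-reduced inner loop
        i = rng.integers(n)
        v = stoch_grad(wt, i) - stoch_grad(w, i) + mu
        wt -= eta * v
    w = wt

print(np.linalg.norm(full_grad(w)))   # should be near zero
```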
    LogLG: Weakly Supervised Log Anomaly Detection via Log-Event Graph Construction. (arXiv:2208.10833v1 [cs.SE])
    Fully supervised log anomaly detection methods require a lot of labeled data to achieve promising performance. Thus, how to alleviate the heavy burden of annotating massive unlabeled log data has received much attention. Recently, many semi-supervised log anomaly detection methods have been proposed to reduce annotation costs with the help of templates parsed from labeled normal data. However, these methods usually consider each keyword independently, which disregards the correlation among keywords in log events and the contextual relationships among log sequences. In this paper, we propose a novel weakly supervised log anomaly detection framework, named LogLG, to explore the semantic connections among keywords from sequences. Specifically, we design an iterative process, where the keywords of unlabeled logs are first extracted to construct a log-event graph in each iteration. Then, we build a subgraph annotator that recasts the generation of pseudo labels for unlabeled log sequences as the annotation of the corresponding log-subgraphs. To improve the annotation quality, we adopt a self-supervised task to pre-train the subgraph annotator. After that, a log anomaly detection model is trained with the pseudo labels generated by the subgraph annotator. Conditioned on the classification results, we re-extract the keywords from the classified log sequences and update the log-event graph for the next iteration. Experiments on five benchmarks validate the effectiveness of LogLG for detecting anomalies on unlabeled log data, and demonstrate that LogLG, as the state-of-the-art weakly supervised method, achieves significant improvements compared to existing semi-supervised methods.
    Latent Variable Models in the Era of Industrial Big Data: Extension and Beyond. (arXiv:2208.10847v1 [eess.SY])
    A rich supply of data and innovative algorithms have made data-driven modeling a popular technique in modern industry. Among various data-driven methods, latent variable models (LVMs) and their counterparts account for a major share and play a vital role in many industrial modeling areas. LVMs can be generally divided into statistical-learning-based classic LVMs and neural-network-based deep LVMs (DLVMs). We first discuss the definitions, theories and applications of classic LVMs in detail, which serves as both a comprehensive tutorial and a brief application survey of classic LVMs. Then we present a thorough introduction to current mainstream DLVMs, with emphasis on their theories and model architectures, and then provide a detailed survey of industrial applications of DLVMs. The aforementioned two types of LVM have obvious advantages and disadvantages. Specifically, classic LVMs have concise principles and good interpretability, but their model capacity cannot address complicated tasks. Neural-network-based DLVMs have sufficient model capacity to achieve satisfactory performance in complex scenarios, but this comes at the cost of model interpretability and efficiency. Aiming at combining the virtues and mitigating the drawbacks of these two types of LVM, as well as exploring non-neural-network ways to build deep models, we propose a novel concept called the lightweight deep LVM (LDLVM). After proposing this new idea, the article first elaborates on the motivation and connotation of LDLVM, then provides two novel LDLVMs, along with thorough descriptions of their principles, architectures and merits. Finally, outlooks and opportunities are discussed, including important open questions and possible research directions.
    Can you recommend content to creatives instead of final consumers? A RecSys based on user's preferred visual styles. (arXiv:2208.10902v1 [cs.CV])
    Providing meaningful recommendations in a content marketplace is challenging due to the fact that users are not the final content consumers. Instead, most users are creatives whose interests, linked to the projects they work on, change rapidly and abruptly. To address the challenging task of recommending images to content creators, we design a RecSys that learns visual styles preferences transversal to the semantics of the projects users work on. We analyze the challenges of the task compared to content-based recommendations driven by semantics, propose an evaluation setup, and explain its applications in a global image marketplace. This technical report is an extension of the paper "Learning Users' Preferred Visual Styles in an Image Marketplace", presented at ACM RecSys '22.
    Efficient Self-Supervision using Patch-based Contrastive Learning for Histopathology Image Segmentation. (arXiv:2208.10779v1 [cs.CV])
    Learning discriminative representations of unlabelled data is a challenging task. Contrastive self-supervised learning provides a framework to learn meaningful representations using learned notions of similarity measures from simple pretext tasks. In this work, we propose a simple and efficient framework for self-supervised image segmentation using contrastive learning on image patches, without using explicit pretext tasks or any further labeled fine-tuning. A fully convolutional neural network (FCNN) is trained in a self-supervised manner to discern features in the input images and obtain confidence maps which capture the network's belief about the objects belonging to the same class. Positive and negative patches are sampled based on the average entropy in the confidence maps for contrastive learning. Convergence is assumed when the information separation between positive patches is small and that between positive-negative pairs is large. We evaluate this method for the task of segmenting nuclei from multiple histopathology datasets, and show comparable performance with relevant self-supervised and supervised methods. The proposed model consists only of a simple FCNN with 10.8k parameters and requires about 5 minutes to converge on the high-resolution microscopy datasets, orders of magnitude less than what relevant self-supervised methods need to attain similar performance.
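    As a rough illustration of contrastive learning on patch embeddings, here is a minimal InfoNCE-style loss in PyTorch: the anchor patch is pulled towards its positive patch and pushed away from negative patches. The entropy-based patch sampling described in the abstract is not reproduced; the embedding dimension and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss on patch embeddings.
    anchor, positive: (d,) embeddings; negatives: (N, d) embeddings."""
    a = F.normalize(anchor, dim=0)
    p = F.normalize(positive, dim=0)
    n = F.normalize(negatives, dim=1)
    # Similarities of the anchor to its positive and to all negatives.
    logits = torch.cat([(a @ p).view(1), n @ a]) / temperature
    target = torch.zeros(1, dtype=torch.long)  # index 0 is the positive
    return F.cross_entropy(logits.view(1, -1), target)

loss = info_nce(torch.randn(128), torch.randn(128), torch.randn(32, 128))
```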
    Gaussian Process Boosting. (arXiv:2004.02653v4 [cs.LG] UPDATED)
    We introduce a novel way to combine boosting with Gaussian process and mixed effects models. This allows for relaxing, first, the zero or linearity assumption for the prior mean function in Gaussian process and grouped random effects models in a flexible non-parametric way and, second, the independence assumption made in most boosting algorithms. The former is advantageous for prediction accuracy and for avoiding model misspecifications. The latter is important for efficient learning of the fixed effects predictor function and for obtaining probabilistic predictions. Our proposed algorithm is also a novel solution for handling high-cardinality categorical variables in tree-boosting. In addition, we present an extension that scales to large data using a Vecchia approximation for the Gaussian process model relying on novel results for covariance parameter inference. We obtain increased prediction accuracy compared to existing approaches on several simulated and real-world data sets.
    Lottery Pools: Winning More by Interpolating Tickets without Increasing Training or Inference Cost. (arXiv:2208.10842v1 [cs.LG])
    Lottery tickets (LTs) are accurate and sparse subnetworks that can be trained in isolation to match the performance of dense networks. Ensembling, in parallel, is one of the oldest time-proven tricks in machine learning to improve performance by combining the output of multiple independent models. However, the benefits of ensembling in the context of LTs will be diluted, since ensembling does not directly lead to stronger sparse subnetworks but merely leverages their predictions for a better decision. In this work, we first observe that directly averaging the weights of adjacent learned subnetworks significantly boosts the performance of LTs. Encouraged by this observation, we further propose an alternative way to perform an 'ensemble' over the subnetworks identified by iterative magnitude pruning via a simple interpolating strategy. We call our method Lottery Pools. In contrast to the naive ensemble, which brings no performance gains to each single subnetwork, Lottery Pools yields much stronger sparse subnetworks than the original LTs without requiring any extra training or inference cost. Across various modern architectures on CIFAR-10/100 and ImageNet, we show that our method achieves significant performance gains in both in-distribution and out-of-distribution scenarios. Impressively, evaluated with VGG-16 and ResNet-18, the produced sparse subnetworks outperform the original LTs by up to 1.88% on CIFAR-100 and 2.36% on CIFAR-100-C; the resulting dense network surpasses the pre-trained dense model by up to 2.22% on CIFAR-100 and 2.38% on CIFAR-100-C.
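    The core interpolation step reduces to averaging state dicts in weight space rather than averaging predictions. Below is a minimal PyTorch sketch of that step under the assumption that all subnetworks share one architecture and floating-point parameters; the paper's exact interpolation coefficients and mask handling are not reproduced.

```python
import copy
import torch

@torch.no_grad()
def interpolate_state_dicts(models, weights=None):
    """Average (or convexly interpolate) the parameters of several
    trained subnetworks that share one architecture. Assumes
    floating-point parameters and buffers throughout."""
    if weights is None:
        weights = [1.0 / len(models)] * len(models)  # plain averaging
    pooled = copy.deepcopy(models[0].state_dict())
    for key in pooled:
        pooled[key] = sum(w * m.state_dict()[key]
                          for w, m in zip(weights, models))
    return pooled

# Usage sketch: pool the tickets found by iterative magnitude pruning.
# pooled_net.load_state_dict(interpolate_state_dicts(ticket_nets))
```

    Note the contrast with a prediction ensemble: the pooled object here is a single network, so inference cost stays that of one model.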
    The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types. (arXiv:2208.10687v1 [cs.LG])
    When inferring reward functions from human behavior (be it demonstrations, comparisons, physical corrections, or e-stops), it has proven useful to model the human as making noisy-rational choices, with a "rationality coefficient" capturing how much noise or entropy we expect to see in the human behavior. Many existing works have opted to fix this coefficient regardless of the type, or quality, of human feedback. However, in some settings, giving a demonstration may be much more difficult than answering a comparison query. In this case, we should expect to see more noise or suboptimality in demonstrations than in comparisons, and should interpret the feedback accordingly. In this work, we advocate that grounding the rationality coefficient in real data for each feedback type, rather than assuming a default value, has a significant positive effect on reward learning. We test this in experiments with both simulated feedback and a user study. We find that when learning from a single feedback type, overestimating human rationality can have dire effects on reward accuracy and regret. Further, we find that the rationality level affects the informativeness of each feedback type: surprisingly, demonstrations are not always the most informative -- when the human acts very suboptimally, comparisons actually become more informative, even when the rationality level is the same for both. Moreover, when the robot gets to decide which feedback type to ask for, it gets a large advantage from accurately modeling the rationality level of each type. Ultimately, our results emphasize the importance of paying attention to the assumed rationality level, not only when learning from a single feedback type, but especially when agents actively learn from multiple feedback types.
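    The noisy-rational model referenced here is typically the Boltzmann choice model, where the rationality coefficient scales the reward inside a softmax. A minimal sketch (the coefficient values are illustrative):

```python
import numpy as np

def choice_probabilities(rewards, beta):
    """Boltzmann-rational choice model: P(choice) proportional to
    exp(beta * reward). beta -> 0 gives uniformly random behavior;
    beta -> infinity gives perfectly optimal behavior."""
    logits = beta * np.asarray(rewards, dtype=float)
    logits -= logits.max()                 # numerical stability
    p = np.exp(logits)
    return p / p.sum()

rewards = [1.0, 0.8, 0.1]
print(choice_probabilities(rewards, beta=0.5))   # noisy demonstrator
print(choice_probabilities(rewards, beta=20.0))  # near-optimal comparison
```

    Grounding beta per feedback type, as the abstract advocates, amounts to fitting a separate coefficient to observed demonstrations versus observed comparisons.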
    Adversarial Vulnerability of Temporal Feature Networks for Object Detection. (arXiv:2208.10773v1 [cs.CV])
    Taking into account information across the temporal domain helps to improve environment perception in autonomous driving. However, it has not been studied so far whether temporally fused neural networks are vulnerable to deliberately generated perturbations, i.e. adversarial attacks, or whether temporal history is an inherent defense against them. In this work, we study whether temporal feature networks for object detection are vulnerable to universal adversarial attacks. We evaluate attacks of two types: imperceptible noise for the whole image and a locally-bound adversarial patch. In both cases, perturbations are generated in a white-box manner using PGD. Our experiments confirm that attacking even a portion of a temporal input suffices to fool the network. We visually assess generated perturbations to gain insights into the functioning of attacks. To enhance the robustness, we apply adversarial training using 5-PGD. Our experiments on the KITTI and nuScenes datasets demonstrate that a model robustified via K-PGD is able to withstand the studied attacks while keeping the mAP-based performance comparable to that of an unattacked model.
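    For reference, here is a standard untargeted L-infinity PGD loop in PyTorch, the building block behind both attack types above; the universal, temporally fused setting in the paper additionally aggregates over frames and training samples, which this sketch does not do, and the budget values are illustrative.

```python
import torch

def pgd_attack(model, x, y, loss_fn, eps=8 / 255, alpha=2 / 255, steps=5):
    """Standard L-infinity PGD: repeatedly step along the sign of the
    input gradient and project back into the eps-ball around x."""
    x_adv = x.clone().detach() + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()     # ascent step
        x_adv = x.detach() + (x_adv - x).clamp(-eps, eps)  # project
        x_adv = x_adv.clamp(0, 1)                        # valid pixel range
    return x_adv.detach()

# K-PGD adversarial training then simply trains on pgd_attack(...) outputs.
```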
    An intelligent algorithmic trading based on a risk-return reinforcement learning algorithm. (arXiv:2208.10707v1 [cs.LG])
    This paper proposes a novel portfolio optimization model using an improved deep reinforcement learning algorithm. The objective function of the optimization model is the weighted sum of the expectation and the value at risk (VaR) of the portfolio's cumulative return. The proposed algorithm is based on an actor-critic architecture, in which the main task of the critic network is to learn the distribution of the portfolio's cumulative return using quantile regression, while the actor network outputs the optimal portfolio weights by maximizing the objective function mentioned above. Meanwhile, we exploit a linear transformation function to realize asset short selling. Finally, a multi-process method called Ape-X is used to accelerate deep reinforcement learning training. To validate our proposed approach, we conduct backtesting for two representative portfolios and observe that the proposed model is superior to the benchmark strategies.
    Cardinality-Regularized Hawkes-Granger Model. (arXiv:2208.10671v1 [cs.LG])
    We propose a new sparse Granger-causal learning framework for temporal event data. We focus on a specific class of point processes called the Hawkes process. We begin by pointing out that most of the existing sparse causal learning algorithms for the Hawkes process suffer from a singularity in maximum likelihood estimation. As a result, their sparse solutions can appear only as numerical artifacts. In this paper, we propose a mathematically well-defined sparse causal learning framework based on a cardinality-regularized Hawkes process, which remedies the pathological issues of existing approaches. We leverage the proposed algorithm for the task of instance-wise causal event analysis, where sparsity plays a critical role. We validate the proposed framework with two real use-cases, one from the power grid and the other from the cloud data center management domain.
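    For reference, a standard multivariate Hawkes intensity with exponential kernels (notation assumed here, not taken from the paper) is

$$\lambda_i(t) = \mu_i + \sum_{j=1}^{D} \alpha_{ij} \sum_{t^j_k < t} \beta\, e^{-\beta (t - t^j_k)},$$

    where $\mu_i$ is the background rate of event type $i$ and the excitation matrix $(\alpha_{ij})$ encodes the Granger-causal structure. Cardinality regularization, as used above, penalizes $\|\alpha\|_0 = |\{(i,j) : \alpha_{ij} \neq 0\}|$ directly rather than a convex surrogate such as the $\ell_1$ norm, which is how it avoids the singularity-driven, artifactual sparsity of maximum likelihood with $\ell_1$ penalties.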
    FedMCSA: Personalized Federated Learning via Model Components Self-Attention. (arXiv:2208.10731v1 [cs.LG])
    Federated learning (FL) facilitates multiple clients to jointly train a machine learning model without sharing their private data. However, Non-IID data of clients presents a tough challenge for FL. Existing personalized FL approaches rely heavily on the default treatment of one complete model as a basic unit and ignore the significance of different layers on Non-IID data of clients. In this work, we propose a new framework, federated model components self-attention (FedMCSA), to handle Non-IID data in FL, which employs model components self-attention mechanism to granularly promote cooperation between different clients. This mechanism facilitates collaboration between similar model components while reducing interference between model components with large differences. We conduct extensive experiments to demonstrate that FedMCSA outperforms the previous methods on four benchmark datasets. Furthermore, we empirically show the effectiveness of the model components self-attention mechanism, which is complementary to existing personalized FL and can significantly improve the performance of FL.
    StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation. (arXiv:2208.10922v1 [cs.CV])
    We propose StyleTalker, a novel audio-driven talking head generation model that can synthesize a video of a talking person from a single reference image with accurately audio-synced lip shapes, realistic head poses, and eye blinks. Specifically, by leveraging a pretrained image generator and an image encoder, we estimate the latent codes of the talking head video that faithfully reflects the given audio. This is made possible with several newly devised components: 1) A contrastive lip-sync discriminator for accurate lip synchronization, 2) A conditional sequential variational autoencoder that learns the latent motion space disentangled from the lip movements, such that we can independently manipulate the motions and lip movements while preserving the identity. 3) An auto-regressive prior augmented with normalizing flow to learn a complex audio-to-motion multi-modal latent space. Equipped with these components, StyleTalker can generate talking head videos not only in a motion-controllable way when another motion source video is given but also in a completely audio-driven manner by inferring realistic motions from the input audio. Through extensive experiments and user studies, we show that our model is able to synthesize talking head videos with impressive perceptual quality which are accurately lip-synced with the input audios, largely outperforming state-of-the-art baselines.
    Regularized impurity reduction: Accurate decision trees with complexity guarantees. (arXiv:2208.10949v1 [cs.LG])
    Decision trees are popular classification models, providing high accuracy and intuitive explanations. However, as the tree size grows the model interpretability deteriorates. Traditional tree-induction algorithms, such as C4.5 and CART, rely on impurity-reduction functions that promote the discriminative power of each split. Thus, although these traditional methods are accurate in practice, there has been no theoretical guarantee that they will produce small trees. In this paper, we justify the use of a general family of impurity functions, including the popular functions of entropy and Gini-index, in scenarios where small trees are desirable, by showing that a simple enhancement can equip them with complexity guarantees. We consider a general setting, where objects to be classified are drawn from an arbitrary probability distribution, classification can be binary or multi-class, and splitting tests are associated with non-uniform costs. As a measure of tree complexity, we adopt the expected cost to classify an object drawn from the input distribution, which, in the uniform-cost case, is the expected number of tests. We propose a tree-induction algorithm that gives a logarithmic approximation guarantee on the tree complexity. This approximation factor is tight up to a constant factor under mild assumptions. The algorithm recursively selects a test that maximizes a greedy criterion defined as a weighted sum of three components. The first two components encourage the selection of tests that improve the balance and the cost-efficiency of the tree, respectively, while the third impurity-reduction component encourages the selection of more discriminative tests. As shown in our empirical evaluation, compared to the original heuristics, the enhanced algorithms strike an excellent balance between predictive accuracy and tree complexity.
    Application of federated learning techniques for arrhythmia classification using 12-lead ECG signals. (arXiv:2208.10993v1 [cs.LG])
    Background: AI-based analysis of sufficiently large, curated medical datasets has been shown to be promising for providing early detection, faster diagnosis, better decision-making, and more effective treatment. However, access to such highly confidential and very sensitive medical data, obtained from a variety of sources, is usually highly restricted, since improper use, unsafe storage, data leakage or abuse could violate a person's privacy. In this work we apply a federated learning paradigm over heterogeneous, siloed sets of high-definition electrocardiogram recordings arriving from 12-lead ECG sensor arrays to train AI models. We evaluated the capacity of the resulting models to achieve performance equivalent to state-of-the-art models trained when the same data is collected in a central place. Methods: We propose a privacy-preserving methodology for training AI models based on the federated learning paradigm over a heterogeneous, distributed dataset. The methodology is applied to a broad range of machine learning techniques based on gradient boosting, convolutional neural networks, and recurrent neural networks with long short-term memory. The models were trained over an ECG dataset containing 12-lead recordings collected from 43,059 patients from six geographically separate and heterogeneous sources. Findings: The resulting set of AI models for detecting cardiovascular abnormalities achieved predictive performance comparable to models trained using a centralised learning approach. Interpretation: Computing the parameters that contribute to the global model locally, and exchanging only those parameters instead of the whole sensitive data as in classical ML, helps preserve medical data privacy.
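    The parameter-exchange step described in the Interpretation paragraph can be sketched as FedAvg-style weighted averaging; the silo setup below is illustrative and the paper's exact aggregation rule may differ.

```python
import numpy as np

def federated_average(client_params, client_sizes):
    """Aggregate locally computed parameters: each silo (e.g., a
    hospital) sends model parameters, never raw ECG records; the
    server returns the size-weighted average as the new global model."""
    total = sum(client_sizes)
    weights = [s / total for s in client_sizes]
    return [
        sum(w * p[layer] for w, p in zip(weights, client_params))
        for layer in range(len(client_params[0]))
    ]

# Three silos, each holding a list of per-layer weight arrays.
silo_params = [[np.random.randn(4, 4), np.random.randn(4)] for _ in range(3)]
global_model = federated_average(silo_params, client_sizes=[1200, 800, 2000])
```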
    Generating people flow from architecture of real unseen environments. (arXiv:2208.10851v1 [cs.RO])
    Mapping people dynamics is a crucial skill, because it enables robots to coexist in human-inhabited environments. However, learning a model of people dynamics is a time-consuming process which requires observation of a large amount of people moving in an environment. Moreover, approaches for mapping dynamics are unable to transfer the learned models across environments: each model is able to describe only the dynamics of the environment it was built in. However, the effect of architectural geometry on people's movement can be used to estimate their dynamics, and recent work has looked into learning maps of dynamics from geometry. So far, however, these methods have evaluated their performance only on small-size synthetic data, leaving the actual ability of these approaches to generalize to real conditions unexplored. In this work we propose a novel approach to learn people dynamics from geometry, where a model is trained and evaluated on real human trajectories in large-scale environments. We then show the ability of our method to generalize to unseen environments, which is unprecedented for maps of dynamics.
    String-based Molecule Generation via Multi-decoder VAE. (arXiv:2208.10718v1 [cs.LG])
    In this paper, we investigate the problem of string-based molecular generation via variational autoencoders (VAEs), which have served as a popular generative approach for various tasks in artificial intelligence. We propose a simple, yet effective idea to improve the performance of VAEs for this task. Our main idea is to maintain multiple decoders while sharing a single encoder, i.e., a type of ensemble technique. Here, we first found that training each decoder independently may not be effective, as the bias of the ensemble decoder increases severely under auto-regressive inference. To maintain both small bias and small variance of the ensemble model, our proposed technique is two-fold: (a) a different latent variable is sampled for each decoder (from the estimated mean and variance offered by the shared encoder) to encourage diverse characteristics of the decoders, and (b) a collaborative loss is used during training to control the aggregated quality of the decoders using different latent variables. In our experiments, the proposed VAE model particularly performs well at generating samples from an out-of-domain distribution.
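    The two-fold technique, a fresh latent sample per decoder plus a collaborative loss over all decoders, can be sketched as follows; the architecture and dimensions are hypothetical placeholders, not the authors' implementation.

        import torch
        import torch.nn as nn

        class MultiDecoderVAE(nn.Module):
            """Shared encoder, K decoders; each decoder receives its own latent
            sample drawn from the same posterior (illustrative sketch)."""
            def __init__(self, vocab, embed=64, latent=32, K=3):
                super().__init__()
                self.emb = nn.Embedding(vocab, embed)
                self.enc = nn.GRU(embed, latent, batch_first=True)
                self.mu = nn.Linear(latent, latent)
                self.logvar = nn.Linear(latent, latent)
                self.decoders = nn.ModuleList(
                    nn.GRU(latent, latent, batch_first=True) for _ in range(K))
                self.out = nn.Linear(latent, vocab)

            def forward(self, tokens):
                h, _ = self.enc(self.emb(tokens))
                mu, logvar = self.mu(h[:, -1]), self.logvar(h[:, -1])
                logits = []
                for dec in self.decoders:        # (a) fresh latent sample per decoder
                    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
                    z_seq = z.unsqueeze(1).expand(-1, tokens.size(1), -1).contiguous()
                    d, _ = dec(z_seq)
                    logits.append(self.out(d))
                return logits, mu, logvar        # (b) aggregate in a collaborative loss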
    Evaluating Machine Unlearning via Epistemic Uncertainty. (arXiv:2208.10836v1 [cs.LG])
    There has been a growing interest in Machine Unlearning recently, primarily due to legal requirements such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act. Thus, multiple approaches have been presented to remove the influence of specific target data points from a trained model. However, when evaluating the success of unlearning, current approaches either use adversarial attacks or compare their results to the optimal solution, which usually incorporates retraining from scratch. We argue that both ways are insufficient in practice. In this work, we present an evaluation metric for Machine Unlearning algorithms based on epistemic uncertainty. To the best of our knowledge, this is the first definition of a general evaluation metric for Machine Unlearning.
    A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning. (arXiv:2208.10904v1 [cs.LG])
    Thompson Sampling is one of the most effective methods for contextual bandits and has been generalized to posterior sampling for certain MDP settings. However, existing posterior sampling methods for reinforcement learning are limited by being model-based or by lacking worst-case theoretical guarantees beyond linear MDPs. This paper proposes a new model-free formulation of posterior sampling that applies to more general episodic reinforcement learning problems with theoretical guarantees. We introduce novel proof techniques to show that under suitable conditions, the worst-case regret of our posterior sampling method matches the best known results of optimization-based methods. In the linear MDP setting, the regret of our algorithm scales linearly with the dimension, compared to the quadratic dependence of existing posterior sampling-based exploration algorithms.
    Naive Penalized Spline Estimators of Derivatives Achieve Optimal Rates of Convergence. (arXiv:2208.10664v1 [math.ST])
    This paper studies the asymptotic behavior of penalized spline estimates of derivatives. In particular, we show that simply differentiating the penalized spline estimator of the mean regression function itself to estimate the corresponding derivative achieves the optimal L2 rate of convergence.
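    In practice the estimator is as simple as the statement suggests: fit a spline to the data and differentiate the fitted function. A short illustration using scipy's smoothing spline as a stand-in for the penalized spline estimator:

        import numpy as np
        from scipy.interpolate import UnivariateSpline

        rng = np.random.default_rng(0)
        x = np.sort(rng.uniform(0, 2 * np.pi, 500))
        y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

        fit = UnivariateSpline(x, y, k=4, s=x.size * 0.01)  # spline fit of the mean
        d1 = fit.derivative(1)                              # differentiate the fit itself

        grid = np.linspace(0.1, 2 * np.pi - 0.1, 200)
        mse = np.mean((d1(grid) - np.cos(grid)) ** 2)       # compare to true derivative
        print(f"MSE of the derivative estimate: {mse:.5f}")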
    Deepfake: Definitions, Performance Metrics and Standards, Datasets and Benchmarks, and a Meta-Review. (arXiv:2208.10913v1 [cs.CV])
    Recent advancements in AI, especially deep learning, have contributed to a significant increase in the creation of new realistic-looking synthetic media (video, image, and audio) and manipulation of existing media, which has led to the creation of the new term ``deepfake''. Based on both the research literature and resources in English and in Chinese, this paper gives a comprehensive overview of deepfake, covering multiple important aspects of this emerging concept, including 1) different definitions, 2) commonly used performance metrics and standards, and 3) deepfake-related datasets, challenges, competitions and benchmarks. In addition, the paper also reports a meta-review of 12 selected deepfake-related survey papers published in 2020 and 2021, focusing not only on the mentioned aspects, but also on the analysis of key challenges and recommendations. We believe that this paper is the most comprehensive review of deepfake in terms of aspects covered, and the first one covering both the English and Chinese literature and sources.
    A Review of Federated Learning in Energy Systems. (arXiv:2208.10941v1 [cs.CR])
    With increasing concerns for data privacy and ownership, recent years have witnessed a paradigm shift in machine learning (ML). An emerging paradigm, federated learning (FL), has gained great attention and has become a novel design for machine learning implementations. FL enables ML model training at data silos under the coordination of a central server, without sharing raw data and with reduced communication overhead. In this paper, we conduct a review of the FL paradigm and, in particular, compare the types, the network structures, and the global model aggregation methods. We then conduct a comprehensive review of FL applications in the energy domain (referring to the smart grid in this paper). We provide a thematic classification of FL to address a variety of energy-related problems, including demand response, identification, prediction, and federated optimization. We describe the taxonomy in detail and conclude with a discussion of various aspects, including challenges, opportunities, and limitations in its energy informatics applications, such as energy system modeling and design, privacy, and evolution.
    Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis. (arXiv:2208.10970v1 [cs.CV])
    Recognizing the layout of unstructured digital documents is crucial when parsing the documents into a structured, machine-readable format for downstream applications. Recent studies in Document Layout Analysis usually rely on computer vision models to understand documents while ignoring other information, such as contextual information and the relations among document components, which are vital to capture. Our Doc-GCN presents an effective way to harmonize and integrate heterogeneous aspects for Document Layout Analysis. We first construct graphs to explicitly describe four main aspects, including syntactic, semantic, density, and appearance/visual information. Then, we apply graph convolutional networks to represent each aspect of information and use pooling to integrate them. Finally, we aggregate each aspect and feed them into 2-layer MLPs for document layout component classification. Our Doc-GCN achieves new state-of-the-art results on three widely used DLA datasets.
    Decentralized Collaborative Learning with Probabilistic Data Protection. (arXiv:2208.10674v1 [cs.LG])
    We discuss future directions of Blockchain as a collaborative value co-creation platform, in which network participants can gain extra insights that cannot be accessed when disconnected from the others. As such, we propose a decentralized machine learning framework that is carefully designed to respect the values of democracy, diversity, and privacy. Specifically, we propose a federated multi-task learning framework that integrates a privacy-preserving dynamic consensus algorithm. We show that a specific network topology called the expander graph dramatically improves the scalability of global consensus building. We conclude the paper by making some remarks on open problems.
    Probabilistic Safe Online Learning with Control Barrier Functions. (arXiv:2208.10733v1 [eess.SY])
    Learning-based control schemes have recently shown great efficacy performing complex tasks. However, in order to deploy them in real systems, it is of vital importance to guarantee that the system will remain safe during online training and execution. We therefore need safe online learning frameworks able to autonomously reason about whether the current information at their disposal is enough to ensure safety or, in contrast, new measurements are required. In this paper, we present a framework consisting of two parts: first, an out-of-distribution detection mechanism actively collecting measurements when needed to guarantee that at least one safety backup direction is always available for use; and second, a Gaussian Process-based probabilistic safety-critical controller that ensures the system stays safe at all times with high probability. Our method exploits model knowledge through the use of Control Barrier Functions, and collects measurements from the stream of online data in an event-triggered fashion to guarantee recursive feasibility of the learned safety-critical controller. This, in turn, allows us to provide formal results of forward invariance of a safe set with high probability, even in a priori unexplored regions. Finally, we validate the proposed framework in numerical simulations of an adaptive cruise control system.
    Scalable Hybrid Classification-Regression Solution for High-Frequency Nonintrusive Load Monitoring. (arXiv:2208.10638v1 [cs.LG])
    Residential buildings with the ability to monitor and control their net load (sum of load and generation) can provide valuable flexibility to power grid operators. We present a novel multiclass nonintrusive load monitoring (NILM) approach that enables effective net-load monitoring capabilities at high frequency with minimal additional equipment and cost. The proposed machine-learning-based solution provides accurate multiclass state predictions while operating at a faster timescale (able to provide a prediction for each 60-Hz ac cycle used in the US power grid) without relying on event-detection techniques. We also introduce an innovative hybrid classification-regression method that allows for the prediction of not only load on/off states via classification but also individual load operating power levels via regression. A test bed with eight residential appliances is used for validating the NILM approach. Results show that the overall method has high accuracy and good scaling and generalization properties. Furthermore, the method is shown to have sufficient response time (within 160 ms, corresponding to 10 ac cycles) to support building grid-interactive control at fast timescales relevant to the provision of grid frequency support services.
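    A hybrid classification-regression head of the kind described can be sketched as a shared trunk with two outputs per appliance, one for the on/off state and one for the operating power; the architecture below is an illustrative assumption, not the article's network.

        import torch
        import torch.nn as nn

        class HybridNILM(nn.Module):
            """Shared trunk with per-appliance on/off classification and
            power-level regression heads (illustrative sketch only)."""
            def __init__(self, n_features, n_appliances, hidden=128):
                super().__init__()
                self.trunk = nn.Sequential(
                    nn.Linear(n_features, hidden), nn.ReLU(),
                    nn.Linear(hidden, hidden), nn.ReLU())
                self.state_head = nn.Linear(hidden, n_appliances)  # on/off logits
                self.power_head = nn.Linear(hidden, n_appliances)  # power levels

            def forward(self, x):
                h = self.trunk(x)
                return self.state_head(h), self.power_head(h)

        def hybrid_loss(state_logits, power_pred, state_true, power_true, alpha=1.0):
            """state_true is a float 0/1 tensor; power is regressed only where
            the appliance is actually on."""
            bce = nn.functional.binary_cross_entropy_with_logits(state_logits, state_true)
            on = state_true
            mse = ((power_pred - power_true) ** 2 * on).sum() / on.sum().clamp(min=1.0)
            return bce + alpha * mse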
    CAPER: Coarsen, Align, Project, Refine - A General Multilevel Framework for Network Alignment. (arXiv:2208.10682v1 [cs.SI])
    Network alignment, or the task of finding corresponding nodes in different networks, is an important problem formulation in many application domains. We propose CAPER, a multilevel alignment framework that Coarsens the input graphs, Aligns the coarsened graphs, Projects the alignment solution to finer levels and Refines the alignment solution. We show that CAPER can improve upon many different existing network alignment algorithms by enforcing alignment consistency across multiple graph resolutions: nodes matched at finer levels should also be matched at coarser levels. CAPER also accelerates the use of slower network alignment methods, at the modest cost of linear-time coarsening and refinement steps, by allowing them to be run on smaller coarsened versions of the input graphs. Experiments show that CAPER can improve upon diverse network alignment methods by an average of 33% in accuracy and/or run an order of magnitude faster.
    Bag of Tricks for Out-of-Distribution Generalization. (arXiv:2208.10722v1 [cs.CV])
    Recently, out-of-distribution (OOD) generalization has drawn attention to the robustness and generalization ability of deep-learning-based models, and accordingly, many strategies have been proposed to address different aspects of this issue. However, most existing algorithms for OOD generalization are complicated and specifically designed for certain datasets. To alleviate this problem, nicochallenge-2022 provides NICO++, a large-scale dataset with diverse context information. In this paper, based on a systematic analysis of different schemes on the NICO++ dataset, we propose a simple but effective learning framework that couples a bag of tricks, including multi-objective framework design, data augmentations, and training and inference strategies. Our algorithm is memory-efficient and easy to deploy, with no complicated modules, and does not require large pre-trained models. It achieves excellent performance with a Top-1 accuracy of 88.16% on the public test set and 75.65% on the private test set, and ranks 1st in the domain generalization task of nicochallenge-2022.
    What deep reinforcement learning tells us about human motor learning and vice-versa. (arXiv:2208.10892v1 [q-bio.NC])
    Machine learning, and specifically reinforcement learning (RL), has been extremely successful in helping us understand neural decision-making processes. However, RL's role in understanding other neural processes, especially motor learning, is much less well explored. To explore this connection, we investigated how recent deep RL methods correspond to the dominant motor learning framework in neuroscience, error-based learning. Error-based learning can be probed using a mirror reversal adaptation paradigm, where it produces distinctive qualitative predictions that are observed in humans. We therefore tested three major families of modern deep RL algorithms on a mirror reversal perturbation. Surprisingly, all of the algorithms failed to mimic human behaviour and indeed displayed qualitatively different behaviour from that predicted by error-based learning. To fill this gap, we introduce a novel deep RL algorithm: model-based deterministic policy gradients (MB-DPG). MB-DPG draws inspiration from error-based learning by explicitly relying on the observed outcome of actions. We show MB-DPG captures (human) error-based learning under mirror-reversal and rotational perturbations. Next, we demonstrate that error-based learning in the form of MB-DPG learns faster than canonical model-free algorithms on complex arm-based reaching tasks, while being more robust to (forward) model misspecification than model-based RL. These findings highlight the gap between current deep RL methods and human motor adaptation and offer a route to closing this gap, facilitating future beneficial interaction between the two fields.
    DualVoice: Speech Interaction that Discriminates between Normal and Whispered Voice Input. (arXiv:2208.10499v1 [cs.HC])
    Interactions based on automatic speech recognition (ASR) have become widely used, with speech input being increasingly utilized to create documents. However, as there is no easy way to distinguish between commands being issued and text to be input in speech, misrecognitions are difficult to identify and correct, meaning that documents need to be manually edited and corrected. The input of symbols and commands is also challenging because these may be misrecognized as text letters. To address these problems, this study proposes a speech interaction method called DualVoice, by which commands can be input in a whispered voice and letters in a normal voice. The proposed method does not require any specialized hardware other than a regular microphone, enabling completely hands-free interaction. The method can be used in a wide range of situations where speech recognition is already available, ranging from text input to mobile/wearable computing. Two neural networks were designed in this study, one for discriminating normal speech from whispered speech, and the second for recognizing whispered speech. A prototype of a text input system was then developed to show how normal and whispered voice can be used in speech text input. Other potential applications using DualVoice are also discussed.
    Error Correction in ASR using Sequence-to-Sequence Models. (arXiv:2202.01157v2 [cs.CL] UPDATED)
    Post-editing in Automatic Speech Recognition (ASR) entails automatically correcting common and systematic errors produced by the ASR system. The outputs of an ASR system are largely prone to phonetic and spelling errors. In this paper, we propose to use a powerful pre-trained sequence-to-sequence model, BART, further adaptively trained to serve as a denoising model, to correct errors of such types. The adaptive training is performed on an augmented dataset obtained by synthetically inducing errors as well as by incorporating actual errors from an existing ASR system. We also propose a simple approach to rescore the outputs using word-level alignments. Experimental results on accented speech data demonstrate that our strategy effectively rectifies a significant number of ASR errors and produces improved WER results when compared against a competitive baseline. We also highlight a negative result obtained on the related grammatical error correction task in the Hindi language, showing our proposed model's limitation in capturing wider context.
    Fall Detection from Audios with Audio Transformers. (arXiv:2208.10659v1 [cs.SD])
    Fall detection for the elderly is a well-researched problem with several proposed solutions, including wearable and non-wearable techniques. While the existing techniques have excellent detection rates, their adoption by the target population is lacking due to the need for wearing devices and user privacy concerns. Our paper provides a novel, non-wearable, non-intrusive, and scalable solution for fall detection, deployed on an autonomous mobile robot equipped with a microphone. The proposed method uses ambient sound input recorded in people's homes. We specifically target the bathroom environment, as it is highly prone to falls and existing techniques cannot be deployed there without jeopardizing user privacy. The present work develops a solution based on a Transformer architecture that takes noisy sound input from bathrooms and classifies it into fall/no-fall classes with an accuracy of 0.8673. Further, the proposed approach is extendable to other indoor environments besides bathrooms, and is suitable for deployment in elderly homes, hospitals, and rehabilitation facilities without requiring the user to wear any device or be constantly "watched" by the sensors.
    GANs and Closures: Micro-Macro Consistency in Multiscale Modeling. (arXiv:2208.10715v1 [cs.LG])
    Sampling the phase space of molecular systems -- and, more generally, of complex systems effectively modeled by stochastic differential equations -- is a crucial modeling step in many fields, from protein folding to materials discovery. These problems are often multiscale in nature: they can be described in terms of low-dimensional effective free energy surfaces parametrized by a small number of "slow" reaction coordinates; the remaining "fast" degrees of freedom populate an equilibrium measure on the reaction coordinate values. Sampling procedures for such problems are used to estimate effective free energy differences as well as ensemble averages with respect to the conditional equilibrium distributions; these latter averages lead to closures for effective reduced dynamic models. Over the years, enhanced sampling techniques coupled with molecular simulation have been developed. An intriguing analogy arises with the field of Machine Learning (ML), where Generative Adversarial Networks can produce high dimensional samples from low dimensional probability distributions. This sample generation returns plausible high dimensional space realizations of a model state, from information about its low-dimensional representation. In this work, we present an approach that couples physics-based simulations and biasing methods for sampling conditional distributions with ML-based conditional generative adversarial networks for the same task. The "coarse descriptors" on which we condition the fine scale realizations can either be known a priori, or learned through nonlinear dimensionality reduction. We suggest that this may bring out the best features of both approaches: we demonstrate that a framework that couples cGANs with physics-based enhanced sampling techniques can improve multiscale SDE dynamical systems sampling, and even shows promise for systems of increasing complexity.
    Exponential concentration and untrainability in quantum kernel methods. (arXiv:2208.11060v1 [quant-ph])
    Kernel methods in Quantum Machine Learning (QML) have recently gained significant attention as a potential candidate for achieving a quantum advantage in data analysis. Among other attractive properties, when training a kernel-based model one is guaranteed to find the optimal model parameters due to the convexity of the training landscape. However, this is based on the assumption that the quantum kernel can be efficiently obtained from quantum hardware. In this work we study the trainability of quantum kernels from the perspective of the resources needed to accurately estimate kernel values. We show that, under certain conditions, values of quantum kernels over different input data can be exponentially concentrated (in the number of qubits) towards some fixed value, leading to an exponential scaling of the number of measurements required for successful training. We identify four sources that can lead to concentration: the expressibility of the data embedding, global measurements, entanglement, and noise. For each source, an associated concentration bound of quantum kernels is analytically derived. Lastly, we show that when dealing with classical data, training a parametrized data embedding with a kernel alignment method is also susceptible to exponential concentration. Our results are verified through numerical simulations for several QML tasks. Altogether, we provide guidelines indicating that certain features should be avoided to ensure the efficient evaluation and the trainability of quantum kernel methods.
    Asynchronous Execution of Heterogeneous Tasks in AI-coupled HPC Workflows. (arXiv:2208.11069v1 [cs.DC])
    Heterogeneous scientific workflows consist of numerous types of tasks and dependencies between them. Middleware capable of scheduling and submitting different task types across heterogeneous platforms must permit asynchronous execution of tasks for improved resource utilization, task throughput, and reduced makespan. In this paper we present an analysis of an important class of heterogeneous workflows, viz., AI-driven HPC workflows, to investigate asynchronous task execution requirements and properties. We model the degree of asynchronicity permitted for arbitrary workflows, and propose key metrics that can be used to determine qualitative benefits when employing asynchronous execution. Our experiments represent important scientific drivers, are performed at scale on Summit, and performance enhancements due to asynchronous execution are consistent with our model.
    Predicting microsatellite instability and key biomarkers in colorectal cancer from H&E-stained images: Achieving SOTA with Less Data using Swin Transformer. (arXiv:2208.10495v1 [q-bio.QM])
    Artificial intelligence (AI) models have been developed for predicting clinically relevant biomarkers, including microsatellite instability (MSI), for colorectal cancers (CRC). However, current deep-learning networks are data-hungry and require large training datasets, which are often lacking in the medical domain. In this study, based on the latest Hierarchical Vision Transformer using Shifted Windows (Swin-T), we developed an efficient workflow for biomarkers in CRC (MSI, hypermutation, chromosomal instability, CpG island methylator phenotype, BRAF, and TP53 mutation) that requires only relatively small datasets yet achieves state-of-the-art (SOTA) predictive performance. Our Swin-T workflow not only substantially outperformed published models in an intra-study cross-validation experiment using the TCGA-CRC-DX dataset (N = 462), but also showed excellent generalizability in cross-study external validation, delivering a SOTA AUROC of 0.90 for MSI using the MCO dataset for training (N = 1065) and the same TCGA-CRC-DX for testing. Similar performance (AUROC = 0.91) was achieved by Echle and colleagues using 8000 training samples (ResNet18) on the same testing dataset. Swin-T is extremely efficient with small training datasets, exhibiting robust predictive performance with only 200-500 training samples. These data indicate that Swin-T may be 5-10 times more efficient than the current state-of-the-art algorithms for MSI based on ResNet18 and ShuffleNet. Furthermore, the Swin-T models show promise as pre-screening tests for MSI status and BRAF mutation status, which could exclude and reduce samples before subsequent standard testing in a cascading diagnostic workflow, allowing turnaround time reduction and cost savings.
    Improving Sample Efficiency in Evolutionary RL Using Off-Policy Ranking. (arXiv:2208.10583v1 [cs.LG])
    Evolution Strategy (ES) is a powerful black-box optimization technique based on the idea of natural evolution. In each of its iterations, a key step entails ranking candidate solutions based on some fitness score. For an ES method in Reinforcement Learning (RL), this ranking step requires evaluating multiple policies. This is presently done via on-policy approaches: each policy's score is estimated by interacting several times with the environment using that policy. This leads to a lot of wasteful interactions since, once the ranking is done, only the data associated with the top-ranked policies is used for subsequent learning. To improve sample efficiency, we propose a novel off-policy alternative for ranking, based on a local approximation for the fitness function. We demonstrate our idea in the context of a state-of-the-art ES method called the Augmented Random Search (ARS). Simulations in MuJoCo tasks show that, compared to the original ARS, our off-policy variant has similar running times for reaching reward thresholds but needs only around 70% as much data. It also outperforms the recent Trust Region ES. We believe our ideas should be extendable to other ES methods as well.
    META-CODE: Community Detection via Exploratory Learning in Topologically Unknown Networks. (arXiv:2208.11015v1 [cs.SI])
    The discovery of community structures in social networks has gained considerable attention as a fundamental problem for various network analysis tasks. However, due to privacy concerns or access restrictions, the network structure is often unknown, thereby rendering established community detection approaches ineffective without costly data acquisition. To tackle this challenge, we present META-CODE, a novel end-to-end solution for detecting overlapping communities in networks with unknown topology via exploratory learning aided by easy-to-collect node metadata. Specifically, META-CODE consists of three steps: 1) initial network inference, 2) node-level community-affiliation embedding based on graph neural networks (GNNs) trained by our new reconstruction loss, and 3) network exploration via community-affiliation-based node queries, where Steps 2 and 3 are performed iteratively. Experimental results demonstrate that META-CODE exhibits (a) superiority over benchmark methods for overlapping community detection, (b) the effectiveness of our training model, and (c) fast network exploration.
    Global Concept-Based Interpretability for Graph Neural Networks via Neuron Analysis. (arXiv:2208.10609v1 [cs.LG])
    Graph neural networks (GNNs) are highly effective on a variety of graph-related tasks; however, they lack interpretability and transparency. Current explainability approaches are typically local and treat GNNs as black-boxes. They do not look inside the model, inhibiting human trust in the model and explanations. Motivated by the ability of neurons to detect high-level semantic concepts in vision models, we perform a novel analysis on the behaviour of individual GNN neurons to answer questions about GNN interpretability, and propose new metrics for evaluating the interpretability of GNN neurons. We propose a novel approach for producing global explanations for GNNs using neuron-level concepts to enable practitioners to have a high-level view of the model. Specifically, (i) to the best of our knowledge, this is the first work which shows that GNN neurons act as concept detectors and have strong alignment with concepts formulated as logical compositions of node degree and neighbourhood properties; (ii) we quantitatively assess the importance of detected concepts, and identify a trade-off between training duration and neuron-level interpretability; (iii) we demonstrate that our global explainability approach has advantages over the current state-of-the-art -- we can disentangle the explanation into individual interpretable concepts backed by logical descriptions, which reduces potential for bias and improves user-friendliness.
    Laplacian Autoencoders for Learning Stochastic Representations. (arXiv:2206.15078v3 [cs.LG] UPDATED)
    Established methods for unsupervised representation learning such as variational autoencoders produce no, or poorly calibrated, uncertainty estimates, making it difficult to evaluate whether learned representations are stable and reliable. In this work, we present a Bayesian autoencoder for unsupervised representation learning, which is trained using a novel variational lower bound of the autoencoder evidence. This is maximized using Monte Carlo EM with a variational distribution that takes the shape of a Laplace approximation. We develop a new Hessian approximation that scales linearly with data size, allowing us to model high-dimensional data. Empirically, we show that our Laplacian autoencoder estimates well-calibrated uncertainties in both latent and output space. We demonstrate that this results in improved performance across a multitude of downstream tasks.
    Building Robust Machine Learning Models for Small Chemical Science Data: The Case of Shear Viscosity. (arXiv:2208.10784v1 [physics.chem-ph])
    Shear viscosity, though being a fundamental property of all liquids, is computationally expensive to estimate from equilibrium molecular dynamics simulations. Recently, Machine Learning (ML) methods have been used to augment molecular simulations in many contexts, thus showing promise for estimating viscosity too in a relatively inexpensive manner. However, ML methods face significant challenges, like overfitting, when the size of the data set is small, as is the case with viscosity. In this work, we train several ML models to predict the shear viscosity of a Lennard-Jones (LJ) fluid, with particular emphasis on addressing issues arising from a small data set. Specifically, the issues related to model selection, performance estimation and uncertainty quantification were investigated. First, we show that the widely used performance estimation procedure of using a single unseen data set shows wide variability on small data sets. In this context, the common practice of using cross validation (CV) to select the hyperparameters (model selection) can be adapted to estimate the generalization error (performance estimation) as well. We compare two simple CV procedures for their ability to do both model selection and performance estimation, and find that the k-fold CV-based procedure shows a lower variance of error estimates. We discuss the role of performance metrics in training and evaluation. Finally, Gaussian Process Regression (GPR) and ensemble methods were used to estimate the uncertainty on individual predictions. The uncertainty estimates from GPR were also used to construct an applicability domain within which the ML models provided more reliable predictions on another small data set generated in this work. Overall, the procedures prescribed in this work, together, lead to robust ML models for small data sets.
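    The adaptation of CV to do double duty, hyperparameter selection inside and generalization-error estimation outside, is what is usually called nested CV. A minimal scikit-learn sketch with a synthetic stand-in for the small viscosity data set:

        import numpy as np
        from sklearn.datasets import make_regression
        from sklearn.kernel_ridge import KernelRidge
        from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

        # synthetic placeholder for a small data set
        X, y = make_regression(n_samples=80, n_features=6, noise=5.0, random_state=0)

        inner = KFold(n_splits=5, shuffle=True, random_state=1)   # model selection
        outer = KFold(n_splits=5, shuffle=True, random_state=2)   # performance estimation

        search = GridSearchCV(
            KernelRidge(kernel="rbf"),
            {"alpha": [1e-3, 1e-2, 1e-1, 1.0], "gamma": [1e-3, 1e-2, 1e-1]},
            cv=inner, scoring="neg_mean_squared_error")

        # the outer folds give a lower-variance estimate of the generalization error
        scores = cross_val_score(search, X, y, cv=outer,
                                 scoring="neg_mean_squared_error")
        print(f"estimated MSE: {-scores.mean():.2f} +/- {scores.std():.2f}")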
    Multi-Modal Representation Learning with SAT for Commodity Verification. (arXiv:2208.11064v1 [cs.LG])
    In this paper, we propose a method to identify identical commodities. In e-commerce scenarios, commodities are usually described by both images and text. By definition, identical commodities are those that have identical key attributes and are cognitively identical to consumers. There are two main challenges: 1) The extraction and fusion of multi-modal representation. 2) The ability to verify whether two commodities are identical by comparing the distance between representations with a threshold. To address the above problems, we propose an end-to-end identical commodity verification method based on self-adaptive thresholds. We use a dual-stream network to extract commodity embeddings and threshold embeddings separately and then concatenate them to obtain commodity representation. Our method is able to obtain different thresholds according to different commodities while maintaining the indexability of the entire commodity representation. We experimentally validate the effectiveness of our multimodal feature fusion and the advantages of self-adaptive thresholds. Besides, our method achieves an F1 score of 0.8936 and takes the 3rd place on the leaderboard for the second task of the CCKS-2022 Knowledge Graph Evaluation for Digital Commerce Competition. Code and pretrained models are available at https://github.com/hanchenchen/CCKS2022-track2-solution.
    Gated recurrent units and temporal convolutional network for multilabel classification. (arXiv:2110.04414v3 [cs.LG] UPDATED)
    Multilabel learning tackles the problem of associating a sample with multiple class labels. This work proposes a new ensemble method for managing multilabel classification: the core of the proposed approach combines a set of gated recurrent units and temporal convolutional neural networks trained with variants of the Adam optimization approach. Multiple Adam variants, including a novel one proposed here, are compared and tested; these variants are based on the difference between present and past gradients, with the step size adjusted for each parameter. The proposed neural network approach is also combined with Incorporating Multiple Clustering Centers (IMCC), which further boosts classification performance. Multiple experiments on nine data sets representing a wide variety of multilabel tasks demonstrate the robustness of our best ensemble, which is shown to outperform the state-of-the-art. The MATLAB code for generating the best ensembles in the experimental section will be available at https://github.com/LorisNanni.
    LEAPER: Modeling Cloud FPGA-based Systems via Transfer Learning. (arXiv:2208.10606v1 [cs.AR])
    Machine-learning-based models have recently gained traction as a way to overcome the slow downstream implementation process of FPGAs by building models that provide fast and accurate performance predictions. However, these models suffer from two main limitations: (1) training requires large amounts of data (features extracted from FPGA synthesis and implementation reports), which is cost-inefficient because of the time-consuming FPGA design cycle; (2) a model trained for a specific environment cannot predict for a new, unknown environment. In a cloud system, where getting access to platforms is typically costly, data collection for ML models can significantly increase the total cost of ownership (TCO) of a system. To overcome these limitations, we propose LEAPER, a transfer-learning-based approach for FPGA-based systems that adapts an existing ML-based model to a new, unknown environment to provide fast and accurate performance and resource utilization predictions. Experimental results show that our approach delivers, on average, 85% accuracy when we use our transferred model for prediction in a cloud environment with 5-shot learning, and reduces design-space exploration time by 10x, from days to only a few hours.
    Interaction Modeling with Multiplex Attention. (arXiv:2208.10660v1 [cs.LG])
    Modeling multi-agent systems requires understanding how agents interact. Such systems are often difficult to model because they can involve a variety of types of interactions that layer together to drive rich social behavioral dynamics. Here we introduce a method for accurately modeling multi-agent systems. We present Interaction Modeling with Multiplex Attention (IMMA), a forward prediction model that uses a multiplex latent graph to represent multiple independent types of interactions and attention to account for relations of different strengths. We also introduce Progressive Layer Training, a training strategy for this architecture. We show that our approach outperforms state-of-the-art models in trajectory forecasting and relation inference, spanning three multi-agent scenarios: social navigation, cooperative task achievement, and team sports. We further demonstrate that our approach can improve zero-shot generalization and allows us to probe how different interactions impact agent behavior.
    Survival Mixture Density Networks. (arXiv:2208.10759v1 [cs.LG])
    Survival analysis, the art of time-to-event modeling, plays an important role in clinical treatment decisions. Recently, continuous-time models built from neural ODEs have been proposed for survival analysis. However, the training of neural ODEs is slow due to the high computational complexity of neural ODE solvers. Here, we propose an efficient alternative for flexible continuous-time models, called Survival Mixture Density Networks (Survival MDNs). Survival MDN applies an invertible positive function to the output of Mixture Density Networks (MDNs). While MDNs produce flexible real-valued distributions, the invertible positive function maps the model into the time domain while preserving a tractable density. Using four datasets, we show that Survival MDN performs better than, or similarly to, continuous- and discrete-time baselines on concordance, integrated Brier score and integrated binomial log-likelihood. Meanwhile, Survival MDNs are also faster than ODE-based models and circumvent binning issues in discrete models.
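    The tractable density follows from the change-of-variables formula: if the MDN defines a density $f_Z$ on the real line and $T = g(Z)$ for an invertible positive $g$, then $f_T(t) = f_Z(g^{-1}(t)) \, |dg^{-1}/dt|$. A numpy sketch assuming a softplus transform (the specific choice of $g$ here is an assumption for illustration):

        import numpy as np
        from scipy.stats import norm

        def mixture_pdf(z, weights, means, stds):
            """Density of the MDN's Gaussian mixture on the real line."""
            return sum(w * norm.pdf(z, m, s) for w, m, s in zip(weights, means, stds))

        def survival_mdn_pdf(t, weights, means, stds):
            """Push the mixture through softplus and apply change of variables."""
            z = np.log(np.expm1(t))            # inverse of softplus: log(e^t - 1)
            dz_dt = np.exp(t) / np.expm1(t)    # Jacobian of the inverse map
            return mixture_pdf(z, weights, means, stds) * dz_dt

        t = np.linspace(0.1, 10.0, 5)
        print(survival_mdn_pdf(t, [0.6, 0.4], [0.0, 2.0], [1.0, 0.5]))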
    Improving Computed Tomography (CT) Reconstruction via 3D Shape Induction. (arXiv:2208.10937v1 [eess.IV])
    Chest computed tomography (CT) imaging adds valuable insight in the diagnosis and management of pulmonary infectious diseases, like tuberculosis (TB). However, due to cost and resource limitations, only X-ray images may be available for initial diagnosis or follow-up comparison imaging during treatment. Due to their projective nature, X-ray images may be more difficult to interpret by clinicians. The lack of publicly available paired X-ray and CT image datasets makes it challenging to train a 3D reconstruction model. In addition, chest X-ray radiology may rely on different device modalities with varying image quality, and there may be variation in the underlying population disease spectrum that creates diversity in inputs. We propose shape induction, that is, learning the shape of 3D CT from X-ray without CT supervision, as a novel technique to incorporate realistic X-ray distributions during training of a reconstruction model. Our experiments demonstrate that this process improves both the perceptual quality of generated CT and the accuracy of downstream classification of pulmonary infectious diseases.
    Estimation Contracts for Outlier-Robust Geometric Perception. (arXiv:2208.10521v1 [stat.ML])
    Outlier-robust estimation is a fundamental problem and has been extensively investigated by statisticians and practitioners. The last few years have seen a convergence across research fields towards "algorithmic robust statistics", which focuses on developing tractable outlier-robust techniques for high-dimensional estimation problems. Despite this convergence, research efforts across fields have been mostly disconnected from one another. This paper bridges recent work on certifiable outlier-robust estimation for geometric perception in robotics and computer vision with parallel work in robust statistics. In particular, we adapt and extend recent results on robust linear regression (applicable to the low-outlier case with < 50% outliers) to the setup commonly found in robotics and vision, where (i) variables (e.g., rotations, poses) belong to a non-convex domain, (ii) measurements are vector-valued, and (iii) the number of outliers is not known a priori. The emphasis here is on performance guarantees: rather than proposing new algorithms, we provide conditions on the input measurements under which modern estimation algorithms are guaranteed to recover an estimate close to the ground truth in the presence of outliers. These conditions are what we call an "estimation contract". Besides the proposed extensions of existing results, we believe the main contributions of this paper are (i) to unify parallel research lines by pointing out commonalities and differences, (ii) to introduce advanced material (e.g., sum-of-squares proofs) in an accessible and self-contained presentation for the practitioner, and (iii) to point out a few immediate opportunities and open questions in outlier-robust geometric perception.
    A Meta-Analysis of Solar Forecasting Based on Skill Score. (arXiv:2208.10536v1 [stat.AP])
    We conduct the first comprehensive meta-analysis of deterministic solar forecasting based on skill score, screening 1,447 papers from Google Scholar and reviewing the full texts of 320 papers for data extraction. A database of 4,758 points was built and analyzed with multivariate adaptive regression spline modelling, partial dependence plots, and linear regression. Notably, the analysis accounts for the most important non-linear relationships and interaction terms in the data. We quantify the impacts on forecast accuracy of important variables such as forecast horizon, resolution, climate conditions, regions' annual solar irradiance level, power system size and capacity, forecast models, train and test sets, and the use of different techniques and inputs. By controlling for the key differences between forecasts, including location variables, the findings from the analysis can be applied globally. An overview of scientific progress in the field is also provided.
    Semi-Supervised Manifold Learning with Complexity Decoupled Chart Autoencoders. (arXiv:2208.10570v1 [cs.LG])
    Autoencoding is a popular method in representation learning. Conventional autoencoders employ symmetric encoding-decoding procedures and a simple Euclidean latent space to detect hidden low-dimensional structures in an unsupervised way. This work introduces a chart autoencoder with an asymmetric encoding-decoding process that can incorporate additional semi-supervised information such as class labels. Besides enhancing the capability for handling data with complicated topological and geometric structures, these models can successfully differentiate nearby but disjoint manifolds and intersecting manifolds with only a small amount of supervision. Moreover, this model only requires a low complexity encoder, such as local linear projection. We discuss the theoretical approximation power of such networks that essentially depends on the intrinsic dimension of the data manifold and not the dimension of the observations. Our numerical experiments on synthetic and real-world data verify that the proposed model can effectively manage data with multi-class nearby but disjoint manifolds of different classes, overlapping manifolds, and manifolds with non-trivial topology.
    Targeted Advertising on Social Networks Using Online Variational Tensor Regression. (arXiv:2208.10627v1 [cs.SI])
    This paper is concerned with online targeted advertising on social networks. The main technical task we address is to estimate the activation probability for user pairs, which quantifies the influence one user may have on another towards purchasing decisions. This is a challenging task because one marketing episode typically involves a multitude of marketing campaigns/strategies of different products for highly diverse customers. In this paper, we propose what we believe is the first tensor-based contextual bandit framework for online targeted advertising. The proposed framework is designed to accommodate any number of feature vectors in the form of multi-mode tensor, thereby enabling to capture the heterogeneity that may exist over user preferences, products, and campaign strategies in a unified manner. To handle inter-dependency of tensor modes, we introduce an online variational algorithm with a mean-field approximation. We empirically confirm that the proposed TensorUCB algorithm achieves a significant improvement in influence maximization tasks over the benchmarks, which is attributable to its capability of capturing the user-product heterogeneity.
    Transferability Ranking of Adversarial Examples. (arXiv:2208.10878v1 [cs.LG])
    Adversarial examples can be used to maliciously and covertly change a model's prediction. It is known that an adversarial example designed for one model can transfer to other models as well. This poses a major threat because it means that attackers can target systems in a black-box manner. In the domain of transferability, researchers have proposed ways to make attacks more transferable and to make models more robust to transferred examples. However, to the best of our knowledge, there are no works which propose a means for ranking the transferability of an adversarial example from the perspective of a black-box attacker. This is an important task because an attacker is likely to use only a select set of examples, and therefore will want to select the samples which are most likely to transfer. In this paper we suggest a method for ranking the transferability of adversarial examples without access to the victim's model. To accomplish this, we define and estimate the expected transferability of a sample given limited information about the victim. We also explore practical scenarios: where the adversary can select the best sample to attack and where the adversary must use a specific sample but can choose different perturbations. Through our experiments, we found that our ranking method can increase an attacker's success rate by up to 80% compared to the baseline (random selection without ranking).
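    One simple way to realize such a ranking, offered here only as an illustrative proxy and not the paper's estimator, is to score each adversarial example by the fraction of surrogate models it fools:

        import numpy as np

        def transferability_scores(adv_examples, true_labels, surrogates):
            """Rank adversarial examples by how many surrogate models they fool;
            each surrogate is any object with a .predict(X) method."""
            scores = np.zeros(len(adv_examples))
            for model in surrogates:
                preds = model.predict(adv_examples)
                scores += (preds != true_labels)     # fooled -> evidence of transfer
            return scores / len(surrogates)

        # the attacker keeps the samples most likely to transfer to the unseen victim:
        # ranking = np.argsort(-transferability_scores(X_adv, y_true, surrogate_pool))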
    Convolutional Neural Networks with A Topographic Representation Module for EEG-Based Brain-Computer Interfaces. (arXiv:2208.10708v1 [eess.SP])
    Objective: Convolutional Neural Networks (CNNs) have shown great potential in the field of Brain-Computer Interfaces (BCI) due to their ability to directly process the raw Electroencephalogram (EEG) without artificial feature extraction. The raw EEG signal is usually represented as a 2-Dimensional (2-D) matrix composed of channels and time points, which ignores the spatial topological information of the EEG. Our goal is to enable a CNN with the raw EEG signal as input to learn EEG spatial topological features and improve its classification performance, while essentially maintaining its original structure. Methods: We propose an EEG Topographic Representation Module (TRM). This module consists of (1) a mapping block from the raw EEG signal to a 3-D topographic map and (2) a convolution block from the topographic map to an output of the same size as the input. We embed the TRM into 3 widely used CNNs and test them on 2 different types of publicly available datasets. Results: The results show that the classification accuracies of the 3 CNNs are improved on both datasets after using the TRM. The average classification accuracies of DeepConvNet, EEGNet and ShallowConvNet with TRM are improved by 4.70\%, 1.29\% and 0.91\% on the Emergency Braking During Simulated Driving Dataset (EBDSDD), and by 2.83\%, 2.17\% and 2.00\% on the High Gamma Dataset (HGD), respectively. Significance: By using the TRM to mine the spatial topological features of EEG, we improve the classification performance of 3 CNNs on 2 datasets. In addition, since the output of the TRM has the same size as the input, any CNN with the raw EEG signal as input can use this module without changing its original structure.
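    A TRM of this shape, scatter the channels onto a 2-D scalp grid per time point, convolve over the map, and read the channel positions back out, can be sketched as below; the grid size and channel coordinates are hypothetical (real layouts would follow the 10-20 electrode system).

        import torch
        import torch.nn as nn

        class TRM(nn.Module):
            """Sketch of a Topographic Representation Module: input and output
            are both (batch, channels, time), as the abstract requires."""
            def __init__(self, positions, grid=(9, 9)):
                super().__init__()
                self.positions = positions           # (row, col) per channel
                self.grid = grid
                self.conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)

            def forward(self, x):                    # x: (B, C, T)
                B, C, T = x.shape
                topo = x.new_zeros(B, T, *self.grid)
                for ch, (r, c) in enumerate(self.positions):
                    topo[:, :, r, c] = x[:, ch, :]   # place channel on the scalp map
                topo = self.conv(topo.reshape(B * T, 1, *self.grid))
                topo = topo.reshape(B, T, *self.grid)
                out = torch.stack([topo[:, :, r, c] for r, c in self.positions], dim=1)
                return out                           # (B, C, T), same size as input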
    Event-Triggered Time-Varying Bayesian Optimization. (arXiv:2208.10790v1 [cs.LG])
    We consider the problem of sequentially optimizing a time-varying objective function using time-varying Bayesian optimization (TVBO). Here, the key challenge is to cope with old data. Current approaches to TVBO require prior knowledge of a constant rate of change. However, the rate of change is usually neither known nor constant. We propose an event-triggered algorithm, ET-GP-UCB, that detects changes in the objective function online. The event trigger is based on probabilistic uniform error bounds used in Gaussian process regression. The trigger automatically detects when a significant change in the objective function occurs. The algorithm then adapts to the temporal change by resetting the accumulated dataset. We provide regret bounds for ET-GP-UCB and show in numerical experiments that it is competitive with state-of-the-art algorithms even though it requires no knowledge about the temporal changes. Further, ET-GP-UCB outperforms these competitive baselines if the rate of change is misspecified, and we demonstrate that it is readily applicable to various settings without tuning hyperparameters.
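    A stripped-down version of the event-triggered loop, using a naive confidence-band check in place of the paper's probabilistic uniform error bounds, might look like this (scikit-learn GP, all constants illustrative):

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF

        def et_gp_ucb(f, candidates, steps=50, beta=2.0, trigger=3.0):
            """GP-UCB with an event-triggered dataset reset when an observation
            leaves the GP confidence band (simplified stand-in for the paper's
            error bounds). f(x, t) is the time-varying objective; candidates is
            an (n, d) array of query points."""
            X, y = [], []
            gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-2)
            for t in range(steps):
                if X:
                    gp.fit(np.array(X), np.array(y))
                    mu, sd = gp.predict(candidates, return_std=True)
                else:
                    mu, sd = np.zeros(len(candidates)), np.ones(len(candidates))
                i = int(np.argmax(mu + beta * sd))        # UCB acquisition
                y_new = f(candidates[i], t)
                if X and abs(y_new - mu[i]) > trigger * sd[i]:
                    X, y = [], []                         # event: change detected, reset
                X.append(candidates[i]); y.append(y_new)
            return gp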
    Neural PCA for Flow-Based Representation Learning. (arXiv:2208.10753v1 [cs.CV])
    Of particular interest is to discover useful representations solely from observations in an unsupervised generative manner. However, the question of whether existing normalizing flows provide effective representations for downstream tasks remains mostly unanswered despite their strong ability for sample generation and density estimation. This paper investigates this problem for such a family of generative models that admits exact invertibility. We propose Neural Principal Component Analysis (Neural-PCA) that operates in full dimensionality while capturing principal components in \emph{descending} order. Without exploiting any label information, the principal components recovered store the most informative elements in their \emph{leading} dimensions and leave the negligible in the \emph{trailing} ones, allowing for clear performance improvements of $5\%$-$10\%$ in downstream tasks. Such improvements are empirically found consistent irrespective of the number of latent trailing dimensions dropped. Our work suggests that necessary inductive bias be introduced into generative modelling when representation quality is of interest.
    A differentiable short-time Fourier transform with respect to the window length. (arXiv:2208.10886v1 [cs.LG])
    In this paper, we revisit the use of spectrograms in neural networks by making the window length a continuous parameter optimizable by gradient descent, instead of an empirically tuned integer-valued hyperparameter. The contribution is mostly theoretical at this point, but plugging the modified STFT into any existing neural network is straightforward. We first define a differentiable version of the STFT in the case where local bin centers are fixed and independent of the window length parameter. We then discuss the more difficult case where the window length affects the position and number of bins. We illustrate the benefits of this new tool on an estimation problem and a classification problem, showing that it can be of interest not only to neural networks but to any STFT-based signal processing algorithm.
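    For the simpler case, fixed bin centers independent of the window length, the idea can be sketched with a Gaussian window whose effective length is a continuous tensor parameter; gradients then flow to the window length through the window values (illustrative PyTorch sketch, not the authors' code):

        import torch

        def gaussian_window_stft(x, sigma, support=256, hop=64):
            """STFT whose Gaussian window width `sigma` is continuous and
            differentiable; bin centers stay fixed on a length-`support` grid."""
            n = torch.arange(support, dtype=x.dtype) - support / 2
            window = torch.exp(-0.5 * (n / sigma) ** 2)   # differentiable in sigma
            frames = x.unfold(0, support, hop)            # (num_frames, support)
            return torch.fft.rfft(frames * window, dim=-1)

        x = torch.randn(4096)
        sigma = torch.tensor(32.0, requires_grad=True)
        spec = gaussian_window_stft(x, sigma)
        spec.abs().sum().backward()                       # gradient w.r.t. sigma exists
        print(sigma.grad)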
    Learning linear modules in a dynamic network with missing node observations. (arXiv:2208.10995v1 [eess.SY])
    In order to identify a system (module) embedded in a dynamic network, one has to formulate a multiple-input estimation problem that necessitates certain nodes to be measured and included as predictor inputs. However, some of these nodes may not be measurable in many practical cases due to sensor selection and placement issues. This may result in biased estimates of the target module. Furthermore, the identification problem associated with the multiple-input structure may require determining a large number of parameters that are not of particular interest to the experimenter, with increased computational complexity in large-sized networks. In this paper, we tackle these problems by using a data augmentation strategy that allows us to reconstruct the missing node measurements and increase the accuracy of the estimated target module. To this end, we develop a system identification method using regularized kernel-based methods coupled with approximate inference methods. Keeping a parametric model for the module of interest, we model the other modules as Gaussian Processes (GP) with a kernel given by the so-called stable spline kernel. An Empirical Bayes (EB) approach is used to estimate the parameters of the target module. The related optimization problem is solved using an Expectation-Maximization (EM) method, where we employ a Markov-chain Monte Carlo (MCMC) technique to reconstruct the unknown missing node information and the network dynamics. Numerical simulations on dynamic network examples illustrate the potentials of the developed method.
    Enhancement Encoding: A New Imbalanced Classification Approach via Encoding the Labels. (arXiv:2208.11056v1 [cs.LG])
    Class imbalance, also called long-tailed distribution, is a common problem in classification tasks based on machine learning. When it occurs, the minority data are overwhelmed by the majority, which presents quite a challenge for data science. To address the class imbalance problem, researchers have proposed many methods: some balance the data set (SMOTE), others refine the loss function (Focal Loss), and it has even been noticed that the value of labels influences class-imbalanced learning (Yang and Xu. Rethinking the value of labels for improving class-imbalanced learning. In NeurIPS 2020), but no one has yet changed the way the labels of the data are encoded. Nowadays, the most prevalent technique for encoding labels is one-hot encoding, due to its good performance in general settings. However, it is not a good choice for imbalanced data, because the classifier will treat majority and minority samples equally. In this paper, we propose the enhancement encoding technique, which is specially designed for imbalanced classification. Enhancement encoding combines re-weighting and cost-sensitiveness, and can reflect the difference between hard and easy (or minority and majority) classes. In order to reduce the number of validation samples and the computation cost, we also replace the confusion matrix with a novel soft-confusion matrix, which works better with a small validation set. In our experiments, we evaluate enhancement encoding with three different types of loss, and the results show that it is very effective at improving the performance of networks trained with imbalanced data. In particular, the performance on minority classes is much better.
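    The paper's exact encoding is not reproduced in the abstract; the sketch below only illustrates the general idea of replacing uniform one-hot targets with imbalance-aware ones, scaling each class's target magnitude by its rarity (all specifics here are assumptions).

        import numpy as np

        def enhancement_targets(y, n_classes, gamma=0.5):
            """Illustrative stand-in for enhancement encoding: rarer classes get
            larger target magnitudes, so errors on minority samples cost more."""
            counts = np.bincount(y, minlength=n_classes).astype(float)
            weights = (counts.max() / counts) ** gamma    # rarity-based weight
            targets = np.zeros((len(y), n_classes))
            targets[np.arange(len(y)), y] = weights[y]
            return targets

        y = np.array([0, 0, 0, 0, 1, 1, 2])               # imbalanced labels
        print(enhancement_targets(y, 3))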
    Solving Royal Game of Ur Using Reinforcement Learning. (arXiv:2208.10669v1 [cs.LG])
    Reinforcement Learning has recently surfaced as a very powerful tool to solve complex problems in the domain of board games, wherein an agent is generally required to learn complex strategies and moves based on its own experiences and the rewards received. While RL has outperformed existing state-of-the-art methods at playing simple video games and popular board games, it has yet to demonstrate its capability on ancient games. Here, we solve one such problem: we train our agents using different methods, namely Monte Carlo, Q-learning and Expected Sarsa, to learn an optimal policy for the strategic Royal Game of Ur. The state space for our game is complex and large, but our agents show promising results at playing the game and learning important strategic moves. Although it is hard to conclude which algorithm performs better overall when trained with limited resources, Expected Sarsa shows promising results in terms of learning speed.
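    Of the three methods, Expected Sarsa is the least standard; its tabular update bootstraps on the policy's expected action value at the next state rather than on a sampled next action. A minimal sketch under an epsilon-greedy policy:

        import numpy as np

        def expected_sarsa_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, eps=0.1):
            """Tabular Expected Sarsa update for an epsilon-greedy policy;
            Q is a (n_states, n_actions) array."""
            n_actions = Q.shape[1]
            probs = np.full(n_actions, eps / n_actions)
            probs[np.argmax(Q[s_next])] += 1.0 - eps      # greedy action gets the rest
            expected_value = probs @ Q[s_next]
            Q[s, a] += alpha * (r + gamma * expected_value - Q[s, a])
            return Q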
    ECU Identification using Neural Network Classification and Hyperparameter Tuning. (arXiv:2208.10651v1 [cs.CR])
    Intrusion detection for the Controller Area Network (CAN) protocol requires modern methods in order to compete with other electrical architectures. Fingerprint Intrusion Detection Systems (IDS) provide a promising new approach to solve this problem. By characterizing network traffic from known ECUs, hazardous messages can be discriminated. In this article, a modified version of Fingerprint IDS is employed, utilizing both step-response and spectral characterization of network traffic via neural network training. With the addition of feature-set reduction and hyperparameter tuning, this method accomplishes a 99.4% detection rate of trusted ECU traffic.
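    A compact way to reproduce the training-plus-tuning pipeline, with placeholder feature names since the article's exact feature set is not given here, is a grid search over an MLP:

        from sklearn.model_selection import GridSearchCV
        from sklearn.neural_network import MLPClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        def tune_fingerprint_classifier(X, y):
            """X: step-response and spectral features per CAN frame (placeholder
            names); y: source ECU identity. Grid-searched MLP with 5-fold CV."""
            pipe = make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000))
            grid = {
                "mlpclassifier__hidden_layer_sizes": [(32,), (64,), (64, 32)],
                "mlpclassifier__alpha": [1e-4, 1e-3, 1e-2],
                "mlpclassifier__learning_rate_init": [1e-3, 1e-2],
            }
            search = GridSearchCV(pipe, grid, cv=5, n_jobs=-1)
            search.fit(X, y)
            return search.best_estimator_, search.best_params_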
    Smoothness Analysis for Probabilistic Programs with Application to Optimised Variational Inference. (arXiv:2208.10530v1 [cs.PL])
    We present a static analysis for discovering differentiable or more generally smooth parts of a given probabilistic program, and show how the analysis can be used to improve the pathwise gradient estimator, one of the most popular methods for posterior inference and model learning. Our improvement increases the scope of the estimator from differentiable models to non-differentiable ones without requiring manual intervention of the user; the improved estimator automatically identifies differentiable parts of a given probabilistic program using our static analysis, and applies the pathwise gradient estimator to the identified parts while using a more general but less efficient estimator, called score estimator, for the rest of the program. Our analysis has a surprisingly subtle soundness argument, partly due to the misbehaviours of some target smoothness properties when viewed from the perspective of program analysis designers. For instance, some smoothness properties are not preserved by function composition, and this makes it difficult to analyse sequential composition soundly without heavily sacrificing precision. We formulate five assumptions on a target smoothness property, prove the soundness of our analysis under those assumptions, and show that our leading examples satisfy these assumptions. We also show that by using information from our analysis, our improved gradient estimator satisfies an important differentiability requirement and thus, under a mild regularity condition, computes the correct estimate on average, i.e., it returns an unbiased estimate. Our experiments with representative probabilistic programs in the Pyro language show that our static analysis is capable of identifying smooth parts of those programs accurately, and making our improved pathwise gradient estimator exploit all the opportunities for high performance in those programs.
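    The two estimators the analysis chooses between can be contrasted on the toy objective E_{z~N(mu,1)}[f(z)]. A hedged numpy sketch of the generic estimators (not the paper's Pyro implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda z: z ** 2        # smooth integrand, so the pathwise estimator applies
mu, n = 1.5, 10_000

# Pathwise (reparameterization): z = mu + eps, so d/dmu f(z) = f'(z) = 2z.
eps = rng.standard_normal(n)
pathwise = np.mean(2.0 * (mu + eps))

# Score function: d/dmu log N(z; mu, 1) = (z - mu); works for non-smooth f too.
z = mu + rng.standard_normal(n)
score = np.mean(f(z) * (z - mu))

print(pathwise, score)      # both estimate d/dmu E[f(z)] = 2*mu = 3
```

    The pathwise estimator typically has much lower variance, which is why the static analysis tries to certify smoothness on as much of the program as possible and falls back to the score estimator only for the rest.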
    Low Complexity Classification Approach for Faster-than-Nyquist (FTN) Signalling Detection. (arXiv:2208.10637v1 [cs.IT])
    Faster-than-Nyquist (FTN) signaling can improve the spectral efficiency (SE), albeit at the expense of high computational complexity to remove the introduced intersymbol interference (ISI). Motivated by the recent success of machine learning (ML) in physical layer (PHY) problems, in this paper we investigate the use of ML to reduce the detection complexity of FTN signaling. In particular, we view the FTN signaling detection problem as a classification task, where the received signal is considered an unlabeled sample belonging to a set of all possible class samples. If we use an off-the-shelf classifier, the set of all possible class samples lies in an $N$-dimensional space, where $N$ is the transmission block length, which entails huge computational complexity. We propose a low-complexity classifier (LCC) that exploits the ISI structure of FTN signaling to perform the classification task in an $N_p \ll N$-dimensional space. The proposed LCC consists of two stages: 1) offline pre-classification, which constructs the labeled class samples in the $N_p$-dimensional space, and 2) online classification, where the detection of the received samples occurs. The proposed LCC is extended to produce soft outputs as well. Simulation results show the effectiveness of the proposed LCC in balancing performance and complexity.
    Atrial Fibrillation Recurrence Risk Prediction from 12-lead ECG Recorded Pre- and Post-Ablation Procedure. (arXiv:2208.10550v1 [cs.LG])
    Introduction: A 12-lead electrocardiogram (ECG) is recorded during the atrial fibrillation (AF) catheter ablation procedure (CAP). It is not easy to determine whether a CAP was successful without a long follow-up assessing AF recurrence (AFR). Therefore, an AFR risk prediction algorithm could enable better management of CAP patients. In this research, we extracted features from 12-lead ECG recorded before and after CAP and trained an AFR risk prediction machine learning model. Methods: Pre- and post-CAP segments were extracted from 112 patients. The analysis included a signal quality criterion, heart rate variability, and morphological biomarkers engineered from the 12-lead ECG (804 features overall). For 43 of the 112 patients, the AFR clinical endpoint was available; these were used to assess the feasibility of AFR risk prediction, using either pre- or post-CAP features. A random forest classifier was trained within a nested cross-validation framework. Results: 36 features were found statistically significant for distinguishing between the pre- and post-surgery states (n=112). For the classification, the area under the receiver operating characteristic curve (AUROC) was reported, with AUROC_pre=0.64 and AUROC_post=0.74 (n=43). Discussion and conclusions: This preliminary analysis showed the feasibility of AFR risk prediction. Such a model could be used to improve CAP management.
    Towards an AI-based Early Warning System for Bridge Scour. (arXiv:2208.10500v1 [cs.LG])
    Scour is the number one cause of bridge failure in many parts of the world. Given the lack of reliability in existing empirical equations for scour depth estimation and the complexity and uncertainty of scour as a physical phenomenon, it is essential to develop more reliable solutions for scour risk assessment. This study introduces a novel AI approach for early forecasting of scour based on real-time monitoring data obtained from sonar and stage sensors installed at bridge piers. Long Short-Term Memory networks (LSTMs), a prominent deep learning algorithm successfully used for time-series forecasting in other fields, were developed and trained using more than 11 years of river stage and bed elevation readings from the Alaska scour monitoring program. The capability of the AI models in scour prediction is shown for three case-study bridges. Results show that LSTMs can capture the temporal and seasonal patterns of both flow and river bed variations around bridge piers through cycles of scour and filling, and can provide reasonable predictions of upcoming scour depth as early as seven days in advance. The proposed solution could be implemented by transportation authorities in the development of AI-based early warning systems, enabling superior bridge scour management.  ( 2 min )
    Are disentangled representations all you need to build speaker anonymization systems?. (arXiv:2208.10497v1 [cs.SD])
    Speech signals contain a lot of sensitive information, such as the speaker's identity, which raises privacy concerns when speech data are collected. Speaker anonymization aims to transform a speech signal to remove the source speaker's identity while leaving the spoken content unchanged. Current methods perform the transformation by relying on content/speaker disentanglement and voice conversion. Usually, an acoustic model from an automatic speech recognition system extracts the content representation while an x-vector system extracts the speaker representation. Prior work has shown that the extracted features are not perfectly disentangled. This paper tackles how to improve feature disentanglement, and thus the quality of the converted anonymized speech. We propose enhancing the disentanglement by removing speaker information from the acoustic model using vector quantization. Evaluation using the VoicePrivacy 2022 toolkit showed that vector quantization helps conceal the original speaker identity while maintaining utility for speech recognition.  ( 2 min )
    Some Supervision Required: Incorporating Oracle Policies in Reinforcement Learning via Epistemic Uncertainty Metrics. (arXiv:2208.10533v1 [cs.LG])
    An inherent problem in reinforcement learning is coping with policies that are uncertain about what action to take (or about the value of a state). Model uncertainty, more formally known as epistemic uncertainty, refers to the expected prediction error of a model beyond the sampling noise. In this paper, we propose a metric for epistemic uncertainty estimation in Q-value functions, which we term pathwise epistemic uncertainty. We further develop a method to compute its approximate upper bound, which we call the F-value. We experimentally apply the latter to Deep Q-Networks (DQN) and show that uncertainty estimation in reinforcement learning serves as a useful indication of learning progress. We then propose a new approach to improving sample efficiency in actor-critic algorithms by learning from an existing (previously learned or hard-coded) oracle policy while uncertainty is high, aiming to avoid unproductive random actions during training. We term this approach Critic Confidence Guided Exploration (CCGE). We implement CCGE on Soft Actor-Critic (SAC) using our F-value metric, apply it to a handful of popular Gym environments, and show that it achieves better sample efficiency and total episodic reward than vanilla SAC in limited contexts.  ( 2 min )
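    The gating idea behind CCGE fits in a few lines. A schematic sketch in which the uncertainty function and the threshold are placeholders, not the paper's F-value computation:

```python
def select_action(state, learner_policy, oracle_policy, uncertainty, threshold=0.5):
    """Defer to the oracle while the critic's epistemic uncertainty is high."""
    if uncertainty(state) > threshold:
        return oracle_policy(state)   # supervision while the critic is unsure
    return learner_policy(state)      # act autonomously once confident
```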
    Design Automation for Fast, Lightweight, and Effective Deep Learning Models: A Survey. (arXiv:2208.10498v1 [cs.LG])
    Deep learning technologies have demonstrated remarkable effectiveness in a wide range of tasks, and deep learning holds the potential to advance a multitude of applications, including edge computing, where deep models are deployed on edge devices to enable instant data processing and response. A key challenge is that while the application of deep models often incurs substantial memory and computational costs, edge devices typically offer only very limited storage and computational capabilities that may vary substantially across devices. These characteristics make it difficult to build deep learning solutions that unleash the potential of edge devices while complying with their constraints. A promising approach to addressing this challenge is to automate the design of effective deep learning models that are lightweight, require little storage, and incur low computational overhead. This survey offers comprehensive coverage of design automation techniques for deep learning models targeting edge computing. It provides an overview and comparison of the key metrics commonly used to quantify models in terms of effectiveness, lightness, and computational cost. The survey then covers three categories of state-of-the-art design automation techniques for deep models: automated neural architecture search, automated model compression, and joint automated design and compression. Finally, it covers open issues and directions for future research.  ( 3 min )
    Toward Better Target Representation for Source-Free and Black-Box Domain Adaptation. (arXiv:2208.10531v1 [cs.CV])
    Domain adaptation aims at aligning the labeled source domain and the unlabeled target domain, and most existing approaches assume the source data is accessible. Unfortunately, this paradigm raises concerns about data privacy and security. Recent studies try to dispel these concerns with the Source-Free setting, which adapts the source-trained model towards the target domain without exposing the source data. However, the Source-Free paradigm is still at risk of data leakage due to adversarial attacks on the source model. Hence, the Black-Box setting is proposed, in which only the outputs of the source model can be utilized. In this paper, we address both Source-Free adaptation and Black-Box adaptation, proposing a novel method for better target representation named Frequency MixUp and Mutual Learning (FMML). Specifically, we introduce a new data augmentation technique called Frequency MixUp, which highlights task-relevant objects in the interpolations, thus enhancing class-consistency and linear behavior for target models. Moreover, we introduce a network regularization method called Mutual Learning to the domain adaptation problem. It transfers knowledge inside the target model via self-knowledge distillation and thus alleviates overfitting on the source domain by learning multi-scale target representations. Extensive experiments show that our method achieves state-of-the-art performance on several benchmark datasets under both settings.  ( 3 min )
    Different Spectral Representations in Optimized Artificial Neural Networks and Brains. (arXiv:2208.10576v1 [cs.LG])
    Recent studies suggest that artificial neural networks (ANNs) that match the spectral properties of the mammalian visual cortex -- namely, the $\sim 1/n$ eigenspectrum of the covariance matrix of neural activities -- achieve higher object recognition performance and robustness to adversarial attacks than those that do not. To our knowledge, however, no previous work systematically explored how modifying the ANN's spectral properties affects performance. To fill this gap, we performed a systematic search over spectral regularizers, forcing the ANN's eigenspectrum to follow $1/n^\alpha$ power laws with different exponents $\alpha$. We found that larger powers (around 2--3) lead to better validation accuracy and more robustness to adversarial attacks on dense networks. This surprising finding applied to both shallow and deep networks and it overturns the notion that the brain-like spectrum (corresponding to $\alpha \sim 1$) always optimizes ANN performance and/or robustness. For convolutional networks, the best $\alpha$ values depend on the task complexity and evaluation metric: lower $\alpha$ values optimized validation accuracy and robustness to adversarial attack for networks performing a simple object recognition task (categorizing MNIST images of handwritten digits); for a more complex task (categorizing CIFAR-10 natural images), we found that lower $\alpha$ values optimized validation accuracy whereas higher $\alpha$ values optimized adversarial robustness. These results have two main implications. First, they cast doubt on the notion that brain-like spectral properties ($\alpha \sim 1$) \emph{always} optimize ANN performance. Second, they demonstrate the potential for fine-tuned spectral regularizers to optimize a chosen design metric, i.e., accuracy and/or robustness.  ( 3 min )
    Friendliness Of Stack Overflow Towards Newbies. (arXiv:2208.10488v1 [cs.HC])
    In today's digital world, we have a number of online question-and-answer platforms, such as Stack Exchange, Quora, and GFG, that serve as a medium for people to communicate and help each other. In this paper, we analyzed the effectiveness of Stack Overflow in helping newcomers to programming. Every user on this platform goes through a journey: for the first 12 months, we consider them a newbie; after 12 months, they fall into one of the following categories: Experienced, Lurkers, or Inquisitive. Each question asked has tags assigned to it, and we observe that questions with certain tags have a faster response time than others, indicating an active community in those fields. The platform grew steadily up to 2013, after which activity started declining, but recently, during the 2020 pandemic, the platform has seen rejuvenated activity.
    Improving Speech Emotion Recognition Through Focus and Calibration Attention Mechanisms. (arXiv:2208.10491v1 [cs.SD])
    Attention has become one of the most commonly used mechanisms in deep learning approaches. The attention mechanism can help the system focus more on the feature space's critical regions. For example, high amplitude regions can play an important role for Speech Emotion Recognition (SER). In this paper, we identify misalignments between the attention and the signal amplitude in the existing multi-head self-attention. To improve the attention area, we propose to use a Focus-Attention (FA) mechanism and a novel Calibration-Attention (CA) mechanism in combination with the multi-head self-attention. Through the FA mechanism, the network can detect the largest amplitude part in the segment. By employing the CA mechanism, the network can modulate the information flow by assigning different weights to each attention head and improve the utilization of surrounding contexts. To evaluate the proposed method, experiments are performed with the IEMOCAP and RAVDESS datasets. Experimental results show that the proposed framework significantly outperforms the state-of-the-art approaches on both datasets.  ( 2 min )
    Representation Learning of Knowledge Graph for Wireless Communication Networks. (arXiv:2208.10496v1 [cs.LG])
    With the application of fifth-generation wireless communication technologies, more smart terminals are being used, generating huge amounts of data, which has prompted extensive research on how to handle and utilize these wireless data. Current research focuses either on upper-layer application data or on intelligent transmission methods for specific problems, based on large amounts of data generated by Monte Carlo simulations. This article aims to understand the endogenous relationships within wireless data by constructing a knowledge graph according to wireless communication protocols and domain expert knowledge, and to further investigate wireless endogenous intelligence. We first construct a knowledge graph of the endogenous factors of wireless core network data collected via a 5G/B5G testing network. Then, a novel model based on graph convolutional neural networks is designed to learn a representation of the graph, which is used to classify graph nodes and to perform relation prediction. The proposed model realizes automatic node classification and network-anomaly cause tracing. It is also applied to public datasets in an unsupervised manner. Finally, the results show that the classification accuracy of the proposed model is better than that of existing unsupervised graph neural network models, such as VGAE and ARVGE.  ( 3 min )
    Relational Self-Supervised Learning on Graphs. (arXiv:2208.10493v1 [cs.LG])
    Over the past few years, graph representation learning (GRL) has been a powerful strategy for analyzing graph-structured data. Recently, GRL methods have shown promising results by adopting self-supervised learning methods developed for learning representations of images. Despite their success, existing GRL methods tend to overlook an inherent distinction between images and graphs, i.e., images are assumed to be independently and identically distributed, whereas graphs exhibit relational information among data instances, i.e., nodes. To fully benefit from the relational information inherent in the graph-structured data, we propose a novel GRL method, called RGRL, that learns from the relational information generated from the graph itself. RGRL learns node representations such that the relationship among nodes is invariant to augmentations, i.e., augmentation-invariant relationship, which allows the node representations to vary as long as the relationship among the nodes is preserved. By considering the relationship among nodes in both global and local perspectives, RGRL overcomes limitations of previous contrastive and non-contrastive methods, and achieves the best of both worlds. Extensive experiments on fourteen benchmark datasets over various downstream tasks demonstrate the superiority of RGRL over state-of-the-art baselines. The source code for RGRL is available at https://github.com/Namkyeong/RGRL.  ( 2 min )
    Dataset Condensation with Latent Space Knowledge Factorization and Sharing. (arXiv:2208.10494v1 [cs.LG])
    In this paper, we introduce a novel approach for systematically and efficiently solving the dataset condensation problem by exploiting the regularity in a given dataset. Instead of condensing the dataset directly in the original input space, we assume a generative process for the dataset with a set of learnable codes defined in a compact latent space, followed by a set of tiny decoders that map them differently to the original input space. By combining different codes and decoders interchangeably, we can dramatically increase the number of synthetic examples with essentially the same parameter count, because the latent space is much lower-dimensional and we can use as many decoders as necessary to capture the different styles represented in the dataset at negligible cost. Such knowledge factorization allows efficient sharing of information between synthetic examples in a systematic way, providing a far better trade-off between compression ratio and the quality of the generated examples. We experimentally show that our method achieves new state-of-the-art records by significant margins on various benchmark datasets such as SVHN, CIFAR10, CIFAR100, and TinyImageNet.  ( 2 min )
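    The parameter-count arithmetic behind the factorization is worth making explicit: examples grow multiplicatively with the number of code-decoder pairings while parameters grow only additively. Sizes below are illustrative, not from the paper.

```python
# Back-of-the-envelope count for the codes/decoders factorization.
C, D = 100, 10                  # latent codes and tiny decoders (illustrative)
code_dim, decoder_params = 64, 5_000

synthetic_examples = C * D                            # 1,000 combinable examples
parameter_count = C * code_dim + D * decoder_params   # 56,400 parameters
print(synthetic_examples, parameter_count)
```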
    DIDER: Discovering Interpretable Dynamically Evolving Relations. (arXiv:2208.10592v1 [cs.RO])
    Effective understanding of dynamically evolving multiagent interactions is crucial to capturing the underlying behavior of agents in social systems. It is usually challenging to observe these interactions directly, and therefore modeling the latent interactions is essential for realizing the complex behaviors. Recent work on Dynamic Neural Relational Inference (DNRI) captures explicit inter-agent interactions at every step. However, prediction at every step results in noisy interactions and lacks intrinsic interpretability without post-hoc inspection. Moreover, it requires access to ground truth annotations to analyze the predicted interactions, which are hard to obtain. This paper introduces DIDER, Discovering Interpretable Dynamically Evolving Relations, a generic end-to-end interaction modeling framework with intrinsic interpretability. DIDER discovers an interpretable sequence of inter-agent interactions by disentangling the task of latent interaction prediction into sub-interaction prediction and duration estimation. By imposing the consistency of a sub-interaction type over an extended time duration, the proposed framework achieves intrinsic interpretability without requiring any post-hoc inspection. We evaluate DIDER on both synthetic and real-world datasets. The experimental results demonstrate that modeling disentangled and interpretable dynamic relations improves performance on trajectory forecasting tasks.  ( 2 min )
    ZerO Initialization: Initializing Residual Networks with only Zeros and Ones. (arXiv:2110.12661v2 [cs.LG] UPDATED)
    Deep neural networks are usually initialized with random weights, with adequately selected initial variance to ensure stable signal propagation during training. However, selecting the appropriate variance becomes challenging especially as the number of layers grows. In this work, we replace random weight initialization with a fully deterministic initialization scheme, viz., ZerO, which initializes the weights of networks with only zeros and ones (up to a normalization factor), based on identity and Hadamard transforms. Through both theoretical and empirical studies, we demonstrate that ZerO is able to train networks without damaging their expressivity. Applying ZerO on ResNet achieves state-of-the-art performance on various datasets, including ImageNet, which suggests random weights may be unnecessary for network initialization. In addition, ZerO has many benefits, such as training ultra deep networks (without batch-normalization), exhibiting low-rank learning trajectories that result in low-rank and sparse solutions, and improving training reproducibility.  ( 2 min )
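    A minimal sketch of the zeros-and-ones idea for a single linear layer; the full scheme also uses Hadamard transforms to handle dimension changes, which this omits.

```python
import numpy as np

def zero_style_init(fan_in, fan_out):
    # Deterministic initialization using only zeros and ones: a (partial)
    # identity, so the layer initially passes its input through unchanged.
    W = np.zeros((fan_out, fan_in))
    np.fill_diagonal(W, 1.0)     # partial identity when fan_in != fan_out
    return W

print(zero_style_init(4, 4))     # exact identity for a square layer
```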
    Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves. (arXiv:2010.04855v6 [econ.EM] UPDATED)
    We propose estimators based on kernel ridge regression for nonparametric causal functions such as dose, heterogeneous, and incremental response curves. Treatment and covariates may be discrete or continuous in general spaces. Due to a decomposition property specific to the RKHS, our estimators have simple closed form solutions. We prove uniform consistency with improved finite sample rates, via original analysis of generalized kernel ridge regression. We extend our main results to counterfactual distributions and to causal functions identified by front and back door criteria. In nonlinear simulations with many covariates, we achieve state-of-the-art performance.  ( 2 min )
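    The closed form alluded to here is the standard kernel ridge regression solution $\alpha = (K + n\lambda I)^{-1} y$. A generic numpy sketch with an RBF kernel (plain KRR, not the paper's specific causal-function estimators):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit_predict(X, y, X_test, lam=1e-2):
    n = len(X)
    K = rbf_kernel(X, X)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)  # closed form
    return rbf_kernel(X_test, X) @ alpha

X = np.linspace(0, 1, 50)[:, None]
y = np.sin(4 * X[:, 0]) + 0.1 * np.random.default_rng(0).standard_normal(50)
print(krr_fit_predict(X, y, X[:5]))
```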
  • Open

    Synthetic learner: model-free inference on treatments over time. (arXiv:1904.01490v2 [stat.ME] UPDATED)
    Understanding the effect of a particular treatment or policy pertains to many areas of interest, ranging from political economics and marketing to healthcare. In this paper, we develop a non-parametric algorithm for detecting the effects of treatment over time in the context of Synthetic Controls. The method builds on counterfactual predictions from many algorithms without necessarily assuming that the algorithms correctly capture the model. We introduce an inferential procedure for detecting treatment effects and show that the testing procedure is asymptotically valid for stationary, beta-mixing processes without imposing any restriction on the set of base algorithms under consideration. We discuss consistency guarantees for average treatment effect estimates and derive regret bounds for the proposed methodology. The class of algorithms may include Random Forest, Lasso, or any other machine-learning estimator. Numerical studies and an application illustrate the advantages of the method.
    Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments. (arXiv:2208.11040v1 [stat.ML])
    We study offline reinforcement learning under a novel model called strategic MDP, which characterizes the strategic interactions between a principal and a sequence of myopic agents with private types. Due to the bilevel structure and private types, strategic MDP involves information asymmetry between the principal and the agents. We focus on the offline RL problem, where the goal is to learn the optimal policy of the principal concerning a target population of agents based on a pre-collected dataset that consists of historical interactions. The unobserved private types confound such a dataset as they affect both the rewards and observations received by the principal. We propose a novel algorithm, Pessimistic policy Learning with Algorithmic iNstruments (PLAN), which leverages the ideas of instrumental variable regression and the pessimism principle to learn a near-optimal principal's policy in the context of general function approximation. Our algorithm is based on the critical observation that the principal's actions serve as valid instrumental variables. In particular, under a partial coverage assumption on the offline dataset, we prove that PLAN outputs a $1 / \sqrt{K}$-optimal policy with $K$ being the number of collected trajectories. We further apply our framework to some special cases of strategic MDP, including strategic regression, strategic bandit, and noncompliance in recommendation systems.
    RAB: Provable Robustness Against Backdoor Attacks. (arXiv:2003.08904v7 [cs.LG] UPDATED)
    Recent studies have shown that deep neural networks (DNNs) are vulnerable to adversarial attacks, including evasion and backdoor (poisoning) attacks. On the defense side, there have been intensive efforts on improving both empirical and provable robustness against evasion attacks; however, the provable robustness against backdoor attacks still remains largely unexplored. In this paper, we focus on certifying the machine learning model robustness against general threat models, especially backdoor attacks. We first provide a unified framework via randomized smoothing techniques and show how it can be instantiated to certify the robustness against both evasion and backdoor attacks. We then propose the first robust training process, RAB, to smooth the trained model and certify its robustness against backdoor attacks. We prove the robustness bound for machine learning models trained with RAB and prove that our robustness bound is tight. In addition, we theoretically show that it is possible to train the robust smoothed models efficiently for simple models such as K-nearest neighbor classifiers, and we propose an exact smooth-training algorithm that eliminates the need to sample from a noise distribution for such models. Empirically, we conduct comprehensive experiments for different machine learning (ML) models such as DNNs, support vector machines, and K-NN models on MNIST, CIFAR-10, and ImageNette datasets and provide the first benchmark for certified robustness against backdoor attacks. In addition, we evaluate K-NN models on a spambase tabular dataset to demonstrate the advantages of the proposed exact algorithm. Both the theoretic analysis and the comprehensive evaluation on diverse ML models and datasets shed light on further robust learning strategies against general training time attacks.
    Kernel Methods for Multistage Causal Inference: Mediation Analysis and Dynamic Treatment Effects. (arXiv:2111.03950v2 [stat.ME] UPDATED)
    We propose simple estimators for mediation analysis and dynamic treatment effects over short horizons, which preserve the nonlinearity, dependence, and effect modification of identification theory. We allow treatments, mediators, and covariates to be discrete or continuous in general spaces. Across this broad variety of data settings, the estimators have closed form solutions in terms of kernel matrix operations due to our algorithmic innovation: sequential mean embedding of the mediator and covariate conditional distributions given a hypothetical treatment sequence. The simple estimators have strong guarantees. For the continuous treatment case, we prove uniform consistency with finite sample rates that match the minimax optimal rate for standard kernel ridge regression. For the discrete treatment case, we prove $n^{-1/2}$ consistency, finite sample Gaussian approximation, and semiparametric efficiency. We extend the analysis to incremental effects and counterfactual distributions, identifying and estimating new causal estimands. In nonlinear simulations with many covariates, we demonstrate state-of-the-art performance. We estimate mediated and dynamic treatment effects of the US Job Corps program for disadvantaged youth, and share a cleaned data set that may serve as a benchmark in future work.
    A Stochastic Variance Reduced Gradient using Barzilai-Borwein Techniques as Second Order Information. (arXiv:2208.11075v1 [math.OC])
    In this paper, we consider improving the stochastic variance reduced gradient (SVRG) method by incorporating curvature information of the objective function. We propose reducing the variance of stochastic gradients using the computationally efficient Barzilai-Borwein (BB) method by incorporating it into SVRG. We also incorporate a BB step size as a variant. We prove a linear convergence theorem that applies not only to the proposed method but also to other existing variants of SVRG with second-order information. We conduct numerical experiments on benchmark datasets and show that the proposed method with a constant step size performs better than existing variance-reduced methods on some test problems.
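    A sketch of how a BB step size can slot into the SVRG loop: the snapshot gradients SVRG already computes supply the curvature pair for the BB formula. The least-squares objective and all constants below are illustrative, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.standard_normal((200, 10)), rng.standard_normal(200)
grad_i = lambda w, i: A[i] * (A[i] @ w - b[i])       # per-sample gradient
full_grad = lambda w: A.T @ (A @ w - b) / len(b)

w, m, eta = np.zeros(10), 400, 0.01
w_prev = g_prev = None
for s in range(20):
    g = full_grad(w)                                  # SVRG snapshot gradient
    if w_prev is not None:                            # BB step size from the
        dw, dg = w - w_prev, g - g_prev               # last two snapshots
        eta = (dw @ dw) / (m * abs(dw @ dg) + 1e-12)
    w_prev, g_prev = w.copy(), g.copy()
    x = w.copy()
    for _ in range(m):
        i = rng.integers(len(b))
        x -= eta * (grad_i(x, i) - grad_i(w, i) + g)  # variance-reduced step
    w = x
print(np.linalg.norm(full_grad(w)))                   # gradient norm shrinks
```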
    Gaussian Process Boosting. (arXiv:2004.02653v4 [cs.LG] UPDATED)
    We introduce a novel way to combine boosting with Gaussian process and mixed effects models. This allows for relaxing, first, the zero or linearity assumption for the prior mean function in Gaussian process and grouped random effects models in a flexible non-parametric way and, second, the independence assumption made in most boosting algorithms. The former is advantageous for prediction accuracy and for avoiding model misspecifications. The latter is important for efficient learning of the fixed effects predictor function and for obtaining probabilistic predictions. Our proposed algorithm is also a novel solution for handling high-cardinality categorical variables in tree-boosting. In addition, we present an extension that scales to large data using a Vecchia approximation for the Gaussian process model relying on novel results for covariance parameter inference. We obtain increased prediction accuracy compared to existing approaches on several simulated and real-world data sets.
    Distribution-free Prediction Sets Adaptive to Unknown Covariate Shift. (arXiv:2203.06126v4 [stat.ME] UPDATED)
    Predicting sets of outcomes -- instead of unique outcomes -- is a promising solution to uncertainty quantification in statistical learning. Despite a rich literature on constructing prediction sets with statistical guarantees, adapting to unknown covariate shift -- a prevalent issue in practice -- poses a serious unsolved challenge. In this paper, we show that prediction sets with finite-sample coverage guarantee are uninformative and propose a novel flexible distribution-free method, PredSet-1Step, to efficiently construct prediction sets with an asymptotic coverage guarantee under unknown covariate shift. We formally show that our method is \textit{asymptotically probably approximately correct}, having well-calibrated coverage error with high confidence for large samples. We illustrate that it achieves nominal coverage in a number of experiments and a data set concerning HIV risk prediction in a South African cohort study. Our theory hinges on a new bound for the convergence rate of the coverage of Wald confidence intervals based on general asymptotically linear estimators.
    Multi-Model Federated Learning with Provable Guarantees. (arXiv:2207.04330v5 [cs.LG] UPDATED)
    Federated Learning (FL) is a variant of distributed learning where edge devices collaborate to learn a model without sharing their data with the central server or each other. We refer to the process of training multiple independent models simultaneously in a federated setting using a common pool of clients as multi-model FL. In this work, we propose two variants of the popular FedAvg algorithm for multi-model FL, with provable convergence guarantees. We further show that for the same amount of computation, multi-model FL can have better performance than training each model separately. We supplement our theoretical results with experiments in strongly convex, convex, and non-convex settings.
    The Lasso with general Gaussian designs with applications to hypothesis testing. (arXiv:2007.13716v2 [math.ST] UPDATED)
    The Lasso is a method for high-dimensional regression, which is now commonly used when the number of covariates $p$ is of the same order or larger than the number of observations $n$. Classical asymptotic normality theory does not apply to this model due to two fundamental reasons: $(1)$ The regularized risk is non-smooth; $(2)$ The distance between the estimator $\widehat{\boldsymbol{\theta}}$ and the true parameters vector $\boldsymbol{\theta}^*$ cannot be neglected. As a consequence, standard perturbative arguments that are the traditional basis for asymptotic normality fail. On the other hand, the Lasso estimator can be precisely characterized in the regime in which both $n$ and $p$ are large and $n/p$ is of order one. This characterization was first obtained in the case of Gaussian designs with i.i.d. covariates: here we generalize it to Gaussian correlated designs with non-singular covariance structure. This is expressed in terms of a simpler ``fixed-design'' model. We establish non-asymptotic bounds on the distance between the distribution of various quantities in the two models, which hold uniformly over signals $\boldsymbol{\theta}^*$ in a suitable sparsity class and over values of the regularization parameter. As an application, we study the distribution of the debiased Lasso and show that a degrees-of-freedom correction is necessary for computing valid confidence intervals.
    The Computational Complexity of ReLU Network Training Parameterized by Data Dimensionality. (arXiv:2105.08675v3 [cs.LG] UPDATED)
    Understanding the computational complexity of training simple neural networks with rectified linear units (ReLUs) has recently been a subject of intensive research. Closing gaps and complementing results from the literature, we present several results on the parameterized complexity of training two-layer ReLU networks with respect to various loss functions. After a brief discussion of other parameters, we focus on analyzing the influence of the dimension $d$ of the training data on the computational complexity. We provide running time lower bounds in terms of W[1]-hardness for parameter $d$ and prove that known brute-force strategies are essentially optimal (assuming the Exponential Time Hypothesis). In comparison with previous work, our results hold for a broad(er) range of loss functions, including $\ell^p$-loss for all $p\in[0,\infty]$. In particular, we extend a known polynomial-time algorithm for constant $d$ and convex loss functions to a more general class of loss functions, matching our running time lower bounds also in these cases.
    Integrative conformal p-values for powerful out-of-distribution testing with labeled outliers. (arXiv:2208.11111v1 [stat.ME])
    This paper develops novel conformal methods to test whether a new observation was sampled from the same distribution as a reference set. Blending inductive and transductive conformal inference in an innovative way, the described methods can re-weight standard conformal p-values based on dependent side information from known out-of-distribution data in a principled way, and can automatically take advantage of the most powerful model from any collection of one-class and binary classifiers. The solution can be implemented either through sample splitting or via a novel transductive cross-validation+ scheme which may also be useful in other applications of conformal inference, due to tighter guarantees compared to existing cross-validation approaches. After studying false discovery rate control and power within a multiple testing framework with several possible outliers, the proposed solution is shown to outperform standard conformal p-values through simulations as well as applications to image recognition and tabular data.
    A differentiable short-time Fourier transform with respect to the window length. (arXiv:2208.10886v1 [cs.LG])
    In this paper, we revisit the use of spectrograms in neural networks by making the window length a continuous parameter optimizable by gradient descent, instead of an empirically tuned integer-valued hyperparameter. The contribution is mostly theoretical at this point, but plugging the modified STFT into any existing neural network is straightforward. We first define a differentiable version of the STFT in the case where local bin centers are fixed and independent of the window length parameter. We then discuss the more difficult case where the window length affects the position and number of bins. We illustrate the benefits of this new tool on an estimation problem and a classification problem, showing it can be of interest not only for neural networks but for any STFT-based signal processing algorithm.
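    The fixed-bins case can be illustrated with a Gaussian window whose width is a continuous parameter, making the spectrogram smooth in that parameter. A numpy sketch of a generic differentiable-window spectrogram (not the paper's exact construction), with a finite difference standing in for autodiff:

```python
import numpy as np

def gaussian_spectrogram(x, sigma, n_fft=128, hop=64):
    t = np.arange(n_fft) - n_fft / 2
    win = np.exp(-0.5 * (t / sigma) ** 2)       # window is smooth in sigma
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

x = np.sin(2 * np.pi * 0.05 * np.arange(1024))
loss = lambda s: gaussian_spectrogram(x, s).mean()
dloss_dsigma = (loss(20.0 + 1e-3) - loss(20.0 - 1e-3)) / 2e-3  # well-defined
print(dloss_dsigma)
```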
    Gradient-Variation Bound for Online Convex Optimization with Constraints. (arXiv:2006.12455v2 [math.OC] UPDATED)
    We study online convex optimization with constraints consisting of multiple functional constraints and a relatively simple constraint set, such as a Euclidean ball. As enforcing the constraints at each time step through projections is computationally challenging in general, we allow decisions to violate the functional constraints but aim to achieve a low regret and cumulative violation of the constraints over a horizon of $T$ time steps. First-order methods achieve an $\mathcal{O}(\sqrt{T})$ regret and an $\mathcal{O}(1)$ constraint violation, which is the best-known bound, but do not take into account the structural information of the problem. Furthermore, the existing algorithms and analysis are limited to Euclidean space. In this paper, we provide an \emph{instance-dependent} bound for online convex optimization with complex constraints obtained by a novel online primal-dual mirror-prox algorithm. Our instance-dependent regret is quantified by the total gradient variation $V_*(T)$ in the sequence of loss functions. The proposed algorithm works in \emph{general} non-Euclidean spaces and simultaneously achieves an $\mathcal{O}(\sqrt{V_*(T)})$ regret and an $\mathcal{O}(1)$ constraint violation, which is never worse than the best-known $( \mathcal{O}(\sqrt{T}), \mathcal{O}(1) )$ result and improves over previous works that applied mirror-prox-type algorithms for this problem achieving $\mathcal{O}(T^{2/3})$ regret and constraint violation. Finally, our algorithm is computationally efficient, as it only performs mirror descent steps in each iteration instead of solving a general Lagrangian minimization problem.
    Limits of Entrainment of Circadian Neuronal Networks. (arXiv:2208.11119v1 [q-bio.NC])
    Circadian rhythmicity lies at the center of various important physiological and behavioral processes in mammals, such as sleep, metabolism, homeostasis, mood changes and more. It has been shown that this rhythm arises from self-sustained biomolecular oscillations of a neuronal network located in the Suprachiasmatic Nucleus (SCN). Under normal circumstances, this network remains synchronized to the day-night cycle due to signaling from the retina. Misalignment of these neuronal oscillations with the external light signal can disrupt numerous physiological functions and take a long-lasting toll on health and well-being. In this work, we study a modern computational neuroscience model to determine the limits of circadian synchronization to external light signals of different frequency and duty cycle. We employ a matrix-free approach to locate periodic steady states of the high-dimensional model for various driving conditions. Our algorithmic pipeline enables numerical continuation and construction of bifurcation diagrams w.r.t. forcing parameters. We computationally explore the effect of heterogeneity in the circadian neuronal network, as well as the effect of corrective therapeutic interventions, such as that of the drug molecule Longdaysin. Lastly, we employ unsupervised learning to construct a data-driven embedding space for representing neuronal heterogeneity.
    On the Decision Boundaries of Neural Networks: A Tropical Geometry Perspective. (arXiv:2002.08838v3 [cs.LG] UPDATED)
    This work tackles the problem of characterizing and understanding the decision boundaries of neural networks with piecewise linear activations. We use tropical geometry, a new development in the area of algebraic geometry, to characterize the decision boundaries of a simple network of the form (Affine, ReLU, Affine). Our main finding is that the decision boundaries are a subset of a tropical hypersurface, which is intimately related to a polytope formed by the convex hull of two zonotopes. The generators of these zonotopes are functions of the network parameters. This geometric characterization provides new perspectives on three tasks. (i) We propose a new tropical perspective on the lottery ticket hypothesis, where we view the effect of different initializations on the tropical geometric representation of a network's decision boundaries. (ii) Moreover, we propose new tropical optimization reformulations that directly influence the decision boundaries of the network for the task of network pruning. (iii) Finally, we discuss the reformulation of adversarial attack generation in a tropical sense, and demonstrate that one can construct adversaries in a new tropical setting by perturbing a specific set of decision boundaries through perturbations of a set of parameters in the network.
    Communication-Constrained Distributed Quantile Regression with Optimal Statistical Guarantees. (arXiv:2110.13113v2 [stat.ME] UPDATED)
    We address the problem of how to achieve optimal inference in distributed quantile regression without stringent scaling conditions. This is challenging due to the non-smooth nature of the quantile regression (QR) loss function, which invalidates the use of existing methodology. The difficulties are resolved through a double-smoothing approach that is applied to the local (at each data source) and global objective functions. Despite the reliance on a delicate combination of local and global smoothing parameters, the quantile regression model is fully parametric, thereby facilitating interpretation. In the low-dimensional regime, we establish a finite-sample theoretical framework for the sequentially defined distributed QR estimators. This reveals a trade-off between the communication cost and statistical error. We further discuss and compare several alternative confidence set constructions, based on inversion of Wald and score-type tests and resampling techniques, detailing an improvement that is effective for more extreme quantile coefficients. In high dimensions, a sparse framework is adopted, where the proposed doubly-smoothed objective function is complemented with an $\ell_1$-penalty. We show that the corresponding distributed penalized QR estimator achieves the global convergence rate after a near-constant number of communication rounds. A thorough simulation study further elucidates our findings.
    SurvSHAP(t): Time-dependent explanations of machine learning survival models. (arXiv:2208.11080v1 [cs.LG])
    Machine and deep learning survival models demonstrate similar or even improved time-to-event prediction capabilities compared to classical statistical learning methods yet are too complex to be interpreted by humans. Several model-agnostic explanations are available to overcome this issue; however, none directly explain the survival function prediction. In this paper, we introduce SurvSHAP(t), the first time-dependent explanation that allows for interpreting survival black-box models. It is based on SHapley Additive exPlanations with solid theoretical foundations and a broad adoption among machine learning practitioners. The proposed methods aim to enhance precision diagnostics and support domain experts in making decisions. Experiments on synthetic and medical data confirm that SurvSHAP(t) can detect variables with a time-dependent effect, and its aggregation is a better determinant of the importance of variables for a prediction than SurvLIME. SurvSHAP(t) is model-agnostic and can be applied to all models with functional output. We provide an accessible implementation of time-dependent explanations in Python at this http URL.
    pystacked: Stacking generalization and machine learning in Stata. (arXiv:2208.10896v1 [econ.EM])
    pystacked implements stacked generalization (Wolpert, 1992) for regression and binary classification via Python's scikit-learn. Stacking combines multiple supervised machine learners -- the "base" or "level-0" learners -- into a single learner. The currently supported base learners include regularized regression, random forest, gradient boosted trees, support vector machines, and feed-forward neural nets (multi-layer perceptrons). pystacked can also be used as a `regular' machine learning program to fit a single base learner and thus provides an easy-to-use API for scikit-learn's machine learning algorithms.
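    Since pystacked delegates to scikit-learn, the same construction can be written directly in Python. A short sketch with illustrative learner choices:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import LassoCV, RidgeCV

X, y = make_regression(n_samples=200, n_features=10, random_state=0)
stack = StackingRegressor(
    estimators=[("lasso", LassoCV()),               # the "level-0" learners
                ("rf", RandomForestRegressor(random_state=0)),
                ("gb", GradientBoostingRegressor(random_state=0))],
    final_estimator=RidgeCV(),                      # combines their predictions
)
stack.fit(X, y)
print(stack.predict(X[:3]))
```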
    Graph Embeddings via Tensor Products and Approximately Orthonormal Codes. (arXiv:2208.10917v1 [cs.SI])
    We introduce a method for embedding graphs as vectors in a structure-preserving manner. In this paper, we showcase its rich representational capacity and give some theoretical properties of our method. In particular, our procedure falls under the bind-and-sum approach, and we show that our binding operation -- the tensor product -- is the most general binding operation that respects the principle of superposition. Similarly, we show that the spherical code achieves optimal compression. We then establish some precise results characterizing the performance of our method, as well as some experimental results showcasing how it can accurately perform various graph operations even when the number of edges is quite large. Finally, we conclude by establishing a link to adjacency matrices, showing that our method is, in some sense, a generalization of adjacency matrices with applications to large sparse graphs.
    The Value of Out-of-Distribution Data. (arXiv:2208.10967v1 [cs.LG])
    More data helps us generalize to a task. But real datasets can contain out-of-distribution (OOD) data; this can come in the form of heterogeneity such as intra-class variability but also in the form of temporal shifts or concept drifts. We demonstrate a counter-intuitive phenomenon for such problems: generalization error of the task can be a non-monotonic function of the number of OOD samples; a small number of OOD samples can improve generalization but if the number of OOD samples is beyond a threshold, then the generalization error can deteriorate. We also show that if we know which samples are OOD, then using a weighted objective between the target and OOD samples ensures that the generalization error decreases monotonically. We demonstrate and analyze this issue using linear classifiers on synthetic datasets and medium-sized neural networks on CIFAR-10.
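    The weighted objective mentioned above is simple to state; the weight is a free parameter here, not a value from the paper.

```python
import numpy as np

def weighted_loss(loss_target, loss_ood, w=0.1):
    # Down-weight the OOD portion so it can help generalization without
    # ever dominating the target-distribution loss.
    return np.mean(loss_target) + w * np.mean(loss_ood)
```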
    Convergence bounds for nonlinear least squares for tensor recovery. (arXiv:2208.10954v1 [math.NA])
    We consider the problem of approximating a function in general nonlinear subsets of L2 when only a weighted Monte Carlo estimate of the L2-norm can be computed. Of particular interest in this setting is the concept of sample complexity, the number of sample points that are necessary to achieve a prescribed error with high probability. Reasonable worst-case bounds for this quantity exist only for particular subsets of L2, like linear spaces or sets of sparse vectors. For more general subsets, like tensor networks, the currently existing bounds are very pessimistic. By restricting the model class to a neighbourhood of the best approximation, we can derive improved worst-case bounds for the sample complexity. When the considered neighbourhood is a manifold with positive local reach, the sample complexity can be estimated by the sample complexity of the tangent space and the product of the sample complexity of the normal space and the manifold's curvature.
    Variable importance without impossible data. (arXiv:2205.15750v2 [cs.LG] UPDATED)
    The most popular methods for measuring importance of the variables in a black box prediction algorithm make use of synthetic inputs that combine predictor variables from multiple subjects. These inputs can be unlikely, physically impossible, or even logically impossible. As a result, the predictions for such cases can be based on data very unlike any the black box was trained on. We think that users cannot trust an explanation of the decision of a prediction algorithm when the explanation uses such values. Instead we advocate a method called Cohort Shapley that is grounded in economic game theory and unlike most other game theoretic methods, it uses only actually observed data to quantify variable importance. Cohort Shapley works by narrowing the cohort of subjects judged to be similar to a target subject on one or more features. A feature is important if using it to narrow the cohort makes a large difference to the cohort mean. We illustrate it on an algorithmic fairness problem where it is essential to attribute importance to protected variables that the model was not trained on. For every subject and every predictor variable, we can compute the importance of that predictor to the subject's predicted response or to their actual response. These values can be aggregated, for example over all Black subjects, and we propose a Bayesian bootstrap to quantify uncertainty in both individual and aggregate Shapley values.
    SoK: Certified Robustness for Deep Neural Networks. (arXiv:2009.04131v7 [cs.LG] UPDATED)
    Great advances in deep neural networks (DNNs) have led to state-of-the-art performance on a wide range of tasks. However, recent studies have shown that DNNs are vulnerable to adversarial attacks, which have brought great concerns when deploying these models to safety-critical applications such as autonomous driving. Different defense approaches have been proposed against adversarial attacks, including: a) empirical defenses, which can usually be adaptively attacked again without providing robustness certification; and b) certifiably robust approaches, which consist of robustness verification providing the lower bound of robust accuracy against any attacks under certain conditions and corresponding robust training approaches. In this paper, we systematize certifiably robust approaches and related practical and theoretical implications and findings. We also provide the first comprehensive benchmark on existing robustness verification and training approaches on different datasets. In particular, we 1) provide a taxonomy for the robustness verification and training approaches, as well as summarize the methodologies for representative algorithms, 2) reveal the characteristics, strengths, limitations, and fundamental connections among these approaches, 3) discuss current research progresses, theoretical barriers, main challenges, and future directions for certifiably robust approaches for DNNs, and 4) provide an open-sourced unified platform to evaluate 20+ representative certifiably robust approaches.
    Event-Triggered Time-Varying Bayesian Optimization. (arXiv:2208.10790v1 [cs.LG])
    We consider the problem of sequentially optimizing a time-varying objective function using time-varying Bayesian optimization (TVBO). Here, the key challenge is to cope with old data. Current approaches to TVBO require prior knowledge of a constant rate of change; however, the rate of change is usually neither known nor constant. We propose an event-triggered algorithm, ET-GP-UCB, that detects changes in the objective function online. The event trigger is based on probabilistic uniform error bounds used in Gaussian process regression and automatically detects when a significant change in the objective function occurs. The algorithm then adapts to the temporal change by resetting the accumulated dataset. We provide regret bounds for ET-GP-UCB and show in numerical experiments that it is competitive with state-of-the-art algorithms even though it requires no knowledge of the temporal changes. Further, ET-GP-UCB outperforms these competitive baselines if the rate of change is misspecified, and we demonstrate that it is readily applicable to various settings without hyperparameter tuning.
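    The event trigger can be stated as a single predicate: reset the dataset when an observation escapes the GP's error bound. Schematic only; the scale factor below stands in for the paper's probabilistic uniform bound.

```python
def change_detected(y_obs, mu_pred, sigma_pred, beta=3.0):
    # Trigger when the observation leaves the (beta-scaled) GP error bound.
    return abs(y_obs - mu_pred) > beta * sigma_pred
```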
    Deriving time-averaged active inference from control principles. (arXiv:2208.10601v1 [eess.SY])
    Active inference offers a principled account of behavior as minimizing average sensory surprise over time. Applications of active inference to control problems have heretofore tended to focus on finite-horizon or discounted-surprise problems, despite deriving from the infinite-horizon, average-surprise imperative of the free-energy principle. Here we derive an infinite-horizon, average-surprise formulation of active inference from optimal control principles. Our formulation returns to the roots of active inference in neuroanatomy and neurophysiology, formally reconnecting active inference to optimal feedback control. Our formulation provides a unified objective functional for sensorimotor control and allows for reference states to vary over time.
    Demand-Side Scheduling Based on Multi-Agent Deep Actor-Critic Learning for Smart Grids. (arXiv:2005.01979v2 [cs.LG] UPDATED)
    We consider the problem of demand-side energy management, where each household is equipped with a smart meter that is able to schedule home appliances online. The goal is to minimize the overall cost under a real-time pricing scheme. While previous works have introduced centralized approaches in which the scheduling algorithm has full observability, we propose the formulation of a smart grid environment as a Markov game. Each household is a decentralized agent with partial observability, which allows scalability and privacy-preservation in a realistic setting. The grid operator produces a price signal that varies with the energy demand. We propose an extension to a multi-agent, deep actor-critic algorithm to address partial observability and the perceived non-stationarity of the environment from the agent's viewpoint. This algorithm learns a centralized critic that coordinates training of decentralized agents. Our approach thus uses centralized learning but decentralized execution. Simulation results show that our online deep reinforcement learning method can reduce both the peak-to-average ratio of total energy consumed and the cost of electricity for all households based purely on instantaneous observations and a price signal.  ( 3 min )
    A flexible empirical Bayes approach to multiple linear regression and connections with penalized regression. (arXiv:2208.10910v1 [stat.ME])
    We introduce a new empirical Bayes approach for large-scale multiple linear regression. Our approach combines two key ideas: (i) the use of flexible "adaptive shrinkage" priors, which approximate the nonparametric family of scale mixture of normal distributions by a finite mixture of normal distributions; and (ii) the use of variational approximations to efficiently estimate prior hyperparameters and compute approximate posteriors. Combining these two ideas results in fast and flexible methods, with computational speed comparable to fast penalized regression methods such as the Lasso, and with superior prediction accuracy across a wide range of scenarios. Furthermore, we show that the posterior mean from our method can be interpreted as solving a penalized regression problem, with the precise form of the penalty function being learned from the data by directly solving an optimization problem (rather than being tuned by cross-validation). Our methods are implemented in an R package, mr.ash.alpha, available from https://github.com/stephenslab/mr.ash.alpha  ( 2 min )
    One-Hot Graph Encoder Embedding. (arXiv:2109.13098v2 [cs.LG] UPDATED)
In this paper we propose a lightning-fast graph embedding method called one-hot graph encoder embedding. It has linear computational complexity and the capacity to process billions of edges within minutes on a standard PC, making it an ideal candidate for huge graph processing. It is applicable to either the adjacency matrix or the graph Laplacian, and can be viewed as a transformation of the spectral embedding. Under random graph models, the graph encoder embedding is approximately normally distributed per vertex, and asymptotically converges to its mean. We showcase three applications: vertex classification, vertex clustering, and graph bootstrap. In every case, the graph encoder embedding exhibits unrivalled computational advantages.  ( 2 min )
    Feature Removal Is a Unifying Principle for Model Explanation Methods. (arXiv:2011.03623v2 [cs.LG] UPDATED)
    Researchers have proposed a wide variety of model explanation approaches, but it remains unclear how most methods are related or when one method is preferable to another. We examine the literature and find that many methods are based on a shared principle of explaining by removing - essentially, measuring the impact of removing sets of features from a model. These methods vary in several respects, so we develop a framework for removal-based explanations that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence. Our framework unifies 26 existing methods, including several of the most widely used approaches (SHAP, LIME, Meaningful Perturbations, permutation tests). Exposing the fundamental similarities between these methods empowers users to reason about which tools to use, and suggests promising directions for ongoing model explainability research.  ( 2 min )
    Survival Mixture Density Networks. (arXiv:2208.10759v1 [cs.LG])
    Survival analysis, the art of time-to-event modeling, plays an important role in clinical treatment decisions. Recently, continuous time models built from neural ODEs have been proposed for survival analysis. However, the training of neural ODEs is slow due to the high computational complexity of neural ODE solvers. Here, we propose an efficient alternative for flexible continuous time models, called Survival Mixture Density Networks (Survival MDNs). Survival MDN applies an invertible positive function to the output of Mixture Density Networks (MDNs). While MDNs produce flexible real-valued distributions, the invertible positive function maps the model into the time-domain while preserving a tractable density. Using four datasets, we show that Survival MDN performs better than, or similarly to continuous and discrete time baselines on concordance, integrated Brier score and integrated binomial log-likelihood. Meanwhile, Survival MDNs are also faster than ODE-based models and circumvent binning issues in discrete models.  ( 2 min )
    A Near-Optimal Algorithm for Debiasing Trained Machine Learning Models. (arXiv:2106.12887v3 [cs.LG] UPDATED)
    We present a scalable post-processing algorithm for debiasing trained models, including deep neural networks (DNNs), which we prove to be near-optimal by bounding its excess Bayes risk. We empirically validate its advantages on standard benchmark datasets across both classical algorithms as well as modern DNN architectures and demonstrate that it outperforms previous post-processing methods while performing on par with in-processing. In addition, we show that the proposed algorithm is particularly effective for models trained at scale where post-processing is a natural and practical choice.  ( 2 min )
    CitySim: A Drone-Based Vehicle Trajectory Dataset for Safety Oriented Research and Digital Twins. (arXiv:2208.11036v1 [cs.CV])
The development of safety-oriented research ideas and applications requires fine-grained vehicle trajectory data that not only has high accuracy but also captures a substantial number of critical safety events. This paper introduces the CitySim Dataset, which was devised with the core objective of facilitating safety-based research and applications. CitySim has vehicle trajectories extracted from 1,140 minutes of drone videos recorded at 12 different locations. It covers a variety of road geometries including freeway basic segments, weaving segments, expressway merge/diverge segments, signalized intersections, stop-controlled intersections, and intersections without sign/signal control. CitySim trajectories were generated through a five-step procedure which ensured trajectory accuracy. Furthermore, the dataset provides vehicle rotated bounding box information, which is demonstrated to improve safety evaluation. Compared to other video-based trajectory datasets, the CitySim Dataset has significantly more critical safety events with higher severity, including cut-in, merge, and diverge events. In addition, CitySim facilitates research towards digital twin applications by providing relevant assets like the recording locations' 3D base maps and signal timings. These features enable more comprehensive conditions for safety research and applications such as autonomous vehicle safety and location-based safety analysis. The dataset is available online at https://github.com/ozheng1993/UCF-SST-CitySim-Dataset.  ( 2 min )
    Prediction of good reaction coordinates and future evolution of MD trajectories using Regularized Sparse Autoencoders: A novel deep learning approach. (arXiv:2208.10962v1 [physics.chem-ph])
Identifying reaction coordinates (RCs) is an active area of research, given the crucial role RCs play in determining the progress of a chemical reaction. The choice of the reaction coordinate is often based on heuristic knowledge. However, an essential criterion for the choice is that the coordinate should capture both the reactant and product states unequivocally. Also, the coordinate should be the slowest one, so that all the other degrees of freedom can easily equilibrate along the reaction coordinate. We used a regularised sparse autoencoder, an energy-based model, to discover a crucial set of reaction coordinates. Along with discovering reaction coordinates, our model also predicts the evolution of a molecular dynamics (MD) trajectory. We showcased that including sparsity-enforcing regularisation helps in choosing a small but important set of reaction coordinates. We used two model systems to demonstrate our approach: the alanine dipeptide system, and a proflavine-DNA system that exhibited intercalation of proflavine into the DNA minor groove in an aqueous environment. We model the MD trajectory as a multivariate time series, and our latent variable model performs the task of multi-step time series prediction. This idea is inspired by the popular sparse coding approach - to represent each input sample as a linear combination of a few elements taken from a set of representative patterns.  ( 3 min )
    Exponential concentration and untrainability in quantum kernel methods. (arXiv:2208.11060v1 [quant-ph])
Kernel methods in Quantum Machine Learning (QML) have recently gained significant attention as a potential candidate for achieving a quantum advantage in data analysis. Among other attractive properties, when training a kernel-based model one is guaranteed to find the optimal model's parameters due to the convexity of the training landscape. However, this is based on the assumption that the quantum kernel can be efficiently obtained from quantum hardware. In this work we study the trainability of quantum kernels from the perspective of the resources needed to accurately estimate kernel values. We show that, under certain conditions, values of quantum kernels over different input data can be exponentially concentrated (in the number of qubits) towards some fixed value, leading to an exponential scaling of the number of measurements required for successful training. We identify four sources that can lead to concentration: the expressibility of the data embedding, global measurements, entanglement, and noise. For each source, an associated concentration bound of quantum kernels is analytically derived. Lastly, we show that when dealing with classical data, training a parametrized data embedding with a kernel alignment method is also susceptible to exponential concentration. Our results are verified through numerical simulations for several QML tasks. Altogether, we provide guidelines indicating that certain features should be avoided to ensure the efficient evaluation and the trainability of quantum kernel methods.  ( 3 min )
    Estimation Contracts for Outlier-Robust Geometric Perception. (arXiv:2208.10521v1 [stat.ML])
Outlier-robust estimation is a fundamental problem and has been extensively investigated by statisticians and practitioners. The last few years have seen a convergence across research fields towards "algorithmic robust statistics", which focuses on developing tractable outlier-robust techniques for high-dimensional estimation problems. Despite this convergence, research efforts across fields have been mostly disconnected from one another. This paper bridges recent work on certifiable outlier-robust estimation for geometric perception in robotics and computer vision with parallel work in robust statistics. In particular, we adapt and extend recent results on robust linear regression (applicable to the low-outlier case with < 50% outliers) to the setup commonly found in robotics and vision, where (i) variables (e.g., rotations, poses) belong to a non-convex domain, (ii) measurements are vector-valued, and (iii) the number of outliers is not known a priori. The emphasis here is on performance guarantees: rather than proposing new algorithms, we provide conditions on the input measurements under which modern estimation algorithms are guaranteed to recover an estimate close to the ground truth in the presence of outliers. These conditions are what we call an "estimation contract". Besides the proposed extensions of existing results, we believe the main contributions of this paper are (i) to unify parallel research lines by pointing out commonalities and differences, (ii) to introduce advanced material (e.g., sum-of-squares proofs) in an accessible and self-contained presentation for the practitioner, and (iii) to point out a few immediate opportunities and open questions in outlier-robust geometric perception.  ( 3 min )

  • Open

Looking to build a team for an AI market project [P]
I am looking to work with 5 or so people who would be interested in a collaborative project: an app for our mobile devices that provides a signal service based on an AI-driven analysis of key market indicators, plus a few other factors that I have picked up over the years and that, if implemented correctly, can ensure a very lucrative end product for us to use and share with family or friends. I have been working on picking up Python 3 as of late myself, but I still have plenty to learn, and I am hoping to bring people with some legitimate knowledge on board to roll out a working model within a reasonable time frame and to avoid the personal-bias pitfalls that might creep in if I built this alone. Participation in this project may also lead to a head position at the pre-seed startup company that I founded, which is focused on designing and producing advanced technologies of various natures. If that interests you, we can definitely talk. This is intended to be a side project, so it has no particular time-frame requirements; nor is it a paid position beyond the resulting program that we build and your subsequent freedom to use it as you please. submitted by /u/throwawayc117 [link] [comments]  ( 113 min )
    [D] ALiBi enables transformer LMs to extrapolate to longer inputs (Video Lecture)
    A year ago we presented a new position embedding method. It's now used in BigScience's BLOOM model and in a few other models. I just uploaded a video lecture where I explain this ALiBi method and also talk about lots of other topics related to training large language models https://www.youtube.com/watch?v=Pp61ShI9VGc I discuss why transformers overfit to the commonly used absolute (learned/sinusoidal) position embeddings and I discuss paths for future work in long-sequence modeling. Feel free to ask any questions here :) submitted by /u/ofirpress [link] [comments]  ( 107 min )
    Dataset required for model training [PROJECT], [P], #[PROJECT], #[P]
Ara ara, I'm working on an ML model for detecting and analysing morphological data of polymer blends via ATR-IR hyperspectral images. I need a dataset/database of these labeled hyperspectral images for training the ML model... submitted by /u/sauronium [link] [comments]  ( 111 min )
    [D] Benchmarks across different papers from almost the same authors
This post presents a specific example of two different papers from the same research group that show the same approach with different performance. My intent with this post is to find out whether I am missing something/don't understand the results they are presenting, or to validate that, for some unknown reason, the same method seems to have (significantly) different performance in different papers. The first paper in question is MAML, the original formulation by Chelsea Finn, Sergey Levine et al. Figure 5 shows a plot of their results. I will focus on the HalfCheetah Rand Vel task, whose results are illustrated on the left. It can be seen that with one gradient step the loss is about -80, with two about -70, and with three about -65. [Figure: MAML results; green is MAML] Now, the second paper is PEARL (Efficient Off-Policy Meta-RL via probabilistic context variables), with Kate Rakelly as first author but also Chelsea Finn and Sergey Levine. In figure 9 they presumably consider the same problem. [Figure: PEARL results; PEARL is blue, MAML is orange] It can be seen that MAML's performance in the PEARL paper is under -100, which disagrees with what is presented in the original paper. One plausible explanation I see is that Fig 9 of PEARL is with 0 gradient steps (pre-update), but in that case it would not be a fair comparison. Another explanation is that they define the task slightly differently, e.g., for the original MAML the target velocity lies between 0.0 and 2.0, but if you define it between 0.0 and 3.0, you will certainly get different performance; but again, why would they do that? I find it a little strange because it is partly the same research group/same authors. Does anyone know another explanation for why the MAML results would be such in the PEARL paper? submitted by /u/carlml [link] [comments]  ( 90 min )
    [P] CSP with Class scheduling program using Python
I want to create a Python application that generates possible university course schedules given the names of a few courses. The student would choose the courses they would like to take (e.g. CS50, LITU101, PHY102, etc.) and the program would look at all possible sections for each course and generate a table with all the wanted courses in it. The goal: • Find possible schedules for the given courses without time conflicts. • If multiple schedules are possible, determine the best one by minimizing breaks between courses (i.e. a compact schedule where classes run back to back) and by other metrics the user chooses, like preferring early-morning schedules over late-afternoon ones. I know this is a constraint satisfaction problem, but I don't want to brute-force it, and I also want to use the heuristic I mentioned above. Any help, pointers, or anything that can help me with this? Thanks! submitted by /u/NaifAlqahtani [link] [comments]  ( 109 min )
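A plain backtracking search with early pruning is often enough at this scale, with the compactness heuristic applied as a scoring pass over the surviving schedules. Below is a minimal Python sketch under assumed toy data: all course names, days, and hours are placeholders, and total_gap implements only the "least breaks" metric.

```python
# Minimal sketch: backtracking CSP search plus a compactness score.
# All section data below is made up for illustration.
sections = {
    "CS50":    [("Mon", 9, 10), ("Tue", 13, 14)],
    "LITU101": [("Mon", 10, 11), ("Wed", 9, 10)],
    "PHY102":  [("Mon", 9, 10), ("Mon", 11, 12)],
}

def conflicts(a, b):
    # Two sections clash if they share a day and their time ranges overlap.
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def total_gap(schedule):
    # Idle hours between consecutive classes per day: lower is more compact.
    gap = 0
    for day in {s[0] for s in schedule}:
        times = sorted((s[1], s[2]) for s in schedule if s[0] == day)
        gap += sum(nxt[0] - cur[1] for cur, nxt in zip(times, times[1:]))
    return gap

def search(courses, chosen=()):
    # Extend the partial schedule one course at a time, pruning any
    # section that conflicts with what has already been chosen.
    if not courses:
        yield chosen
        return
    head, *rest = courses
    for sec in sections[head]:
        if all(not conflicts(sec, c) for c in chosen):
            yield from search(rest, chosen + (sec,))

valid = list(search(list(sections)))
best = min(valid, key=total_gap)  # most compact conflict-free schedule
print(best, "gap hours:", total_gap(best))
```

Other preferences (early mornings, etc.) can be folded into the same scoring function as weighted terms; the early pruning is what keeps this from being a brute-force sweep of the full cross product.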
    [D] question about ensembles learning in multi class classification
Hi all! I'm kind of new to the field of AI and image processing. I have a question: what can be the best ensemble learning method for multiclass classification? I thought of majority voting, but it can be ineffective due to the number of classes at hand (if we want to create a model that classifies a set of images into 9 or 10 classes). submitted by /u/Fox_Rey [link] [comments]  ( 88 min )
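One commonly suggested alternative for many-class problems is soft voting: average the base classifiers' predicted probabilities instead of counting hard votes, so a 9- or 10-way split does not dilute the majority. A minimal scikit-learn sketch on synthetic data; all estimator choices here are illustrative, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 10-class problem standing in for the image features.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=15,
                           n_classes=10, random_state=0)

# voting="soft" averages predict_proba outputs across the base models.
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
                ("knn", KNeighborsClassifier())],
    voting="soft",
)
print(cross_val_score(ensemble, X, y, cv=3).mean())
```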
    [D] macbook vs nvidia laptop: details in text
Requirements: data science / bioinformatics / computational chemistry / machine and deep learning. Current use: datasets from clinical trials, gene expression datasets for 20,000 genes, structural biology (crystallography data/PDB/PubChem data/DrugBank data/chemoinformatics data via RDKit). Standard libraries used so far: numpy, sklearn, pandas, seaborn, matplotlib, statsmodels, scipy, tensorflow, and then some. Interested in: CUDA development, using Clara Discovery (a drug discovery platform by NVIDIA), 3D molecular modeling and simulations, and generative models (GANs) on 3D molecular descriptors and crystallographic data to simulate molecular configurations/poses/protein structure prediction. I see many people claiming that the new MacBook can do such things, but then I also hear people say that making Python work on a MacBook is cumbersome as is. At the same time, I am not interested in carrying a super heavy device that is going to cook at high heat any time I use it. Price being no issue, can you recommend a MacBook vs an NVIDIA laptop (or any laptop) that will do well loading massive datasets, execute code with low latency for the Python libraries I mentioned, perform machine learning well, handle docking/3D simulations, have decent battery life, not have a cheap fragile body, and won't get too hot or be too heavy to carry? Also, to leverage GPU computing, are there any eGPU setups you might recommend that I can plug and play with said laptop? Thank you submitted by /u/macORnvidia [link] [comments]  ( 92 min )
    Variational Autoencoder reconstruction loss hyper-parameter [D]
For a VAE, the total loss computation (usually) is: total_loss = (alpha * recon_loss) + (beta * kl_loss) Here alpha and beta are hyper-parameters for recon_loss (reconstruction loss) and kl_loss (KL-divergence loss). One common way to make the VAE's synthesis/generation better approximate/produce the original data is by increasing the alpha hyper-parameter. However, as you keep increasing alpha, you are deviating away from a multivariate, standard Gaussian distribution. After a VAE has been trained, you generate new samples by sampling z from a standard Gaussian distribution as: z = tf.random.normal( shape = (batch_size, latent_space_dim), mean = 0.0, stddev = 1.0, dtype = tf.float32 ) This z is then fed through the VAE's decoder to get the generated samples. But, since the encoded latent space distribution has diverged from a standard Gaussian distribution, sampling z as above would lead to worse synthesis. How can you avoid this? Or, put differently, what's the appropriate way to sample from your latent space distribution for a high value of alpha? submitted by /u/grid_world [link] [comments]  ( 89 min )
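One workaround sometimes used (an assumption about what fits here, not the only answer): sample from the aggregated posterior instead of N(0, I), i.e., fit a simple Gaussian to the encoder's posterior means over the training set and draw z from that, since that is the latent region the decoder actually saw during training. A numpy sketch, with the encoder means faked because no trained model is at hand:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 16

# Stand-in for the encoder's posterior means over the training set;
# in practice these would come from running the trained encoder on x_train.
mu = rng.normal(loc=0.7, scale=1.8, size=(10_000, latent_dim))

# Fit a diagonal Gaussian to the aggregated posterior and sample from it
# instead of the standard normal, tracking wherever a large alpha pushed
# the latent distribution.
agg_mean = mu.mean(axis=0)
agg_std = mu.std(axis=0)
z = agg_mean + agg_std * rng.standard_normal(size=(64, latent_dim))
# z would then be fed through the decoder as usual.
```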
    [R] Anomalies are serious business!
• Real-world datasets often contain extreme events and anomalies that tend to be rare and random • Why not learn and retain information from rare extreme events of the past? • Developing models for these critical moments of extreme events will remain a puzzle • Unless the long-term effects of anomalies are well captured and utilized Paper 📜: https://arxiv.org/abs/2208.09933 submitted by /u/afarhangi [link] [comments]  ( 89 min )
    [D] How to handle absurd batch sizes in SimCLR / OpenAI's CLIP?
    OpenAI's CLIP has a loss function that requires around 32K samples to calculate. Normally, we can deal with large loss functions by using micro-batches. However, we can't do that here because the loss function requires all samples, so micro-batching isn't really an option. Has anyone here implemented a CLIP / SimCLR sized model? submitted by /u/vanilla-acc [link] [comments]  ( 89 min )
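For context, a small sketch of the CLIP-style symmetric contrastive loss makes the obstacle concrete: the logits form a batch-by-batch matrix, so every term depends on every sample, and naive micro-batching breaks the gradient. Sizes and the 0.07 temperature below are illustrative. Workarounds people report include gathering embeddings across devices before forming the logits, and gradient caching (e.g., the GradCache approach), which keeps the full-batch loss exact while recomputing embeddings chunk by chunk.

```python
import torch
import torch.nn.functional as F

batch, dim = 8, 64
img = F.normalize(torch.randn(batch, dim, requires_grad=True), dim=-1)
txt = F.normalize(torch.randn(batch, dim, requires_grad=True), dim=-1)

# batch x batch similarity matrix: every image is scored against every
# text in the batch, which is why the whole batch must be visible at once.
logits = img @ txt.t() / 0.07
labels = torch.arange(batch)          # matching pairs lie on the diagonal

loss = (F.cross_entropy(logits, labels) +
        F.cross_entropy(logits.t(), labels)) / 2
loss.backward()
print(loss.item())
```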
    [D] How to Run Stable Diffusion (Locally and in Colab)
As most of you know, weights for Stable Diffusion were released yesterday. I've been spending the last day or so playing around with it and it's amazing - I put a few examples below! I also put together this guide on How to Run Stable Diffusion - it goes through setup both for local machines and Colab notebooks. Setup only takes a few minutes! [Example images generated from the prompts: "A Spartan warrior in the style of Salvador Dali", "a vaporwave image of a black hole", "a photorealistic image of Iron Man making breakfast"] submitted by /u/SleekEagle [link] [comments]  ( 110 min )
    [D] Is there a more general analytic way to determine why a neural network tweak hurts performance a few percent points?
I add a new module to my neural network and it reduces accuracy by about 2-3%. I could figure out why by retraining with/without certain modules, but that would take a long time. Is there a simpler method (gradient analysis somehow?) that would apply to a broad set of neural network engineering problems, or do I have to design a custom hypothesis-falsification framework for each net (e.g. understanding how specific modules interact with each other and checking for problem/network-specific markers)? submitted by /u/danscarafoni [link] [comments]  ( 89 min )
    Auto tagging speech [D]
Hi, I am trying to build a model that can tag speech based on certain intents/selections. For example, the sentence "Hi, I am throwing a party, can you suggest some songs" should return "party songs". Similarly, "Hi, can you get me a cup of coffee with anything but biscuits" should return "coffee, not biscuits". I was thinking of implementing NER, but I'm unsure whether that will help detect negative statements ("not something"). Any guidance will be helpful submitted by /u/1NobodyPeople [link] [comments]  ( 88 min )
    [D] Neurodivergent folks, how do you cope with job rejections and improve?
I have ADHD and an anxiety disorder. I take prescribed meds every day. I never mention my disorder in my applications or interviews because I feel I would be playing the 'pity card'. I am a fresh grad (Masters) and I have been constantly getting rejection emails, from FAANG to startups, for the past 3-4 months. I only got 3 interviews out of the countless applications I have submitted with cover letters and curated resumes/CVs. Of the three interviews, I got straight-up rejections after the first interview from the recruiter, where they told me I am not the kind of person they are looking for and that my interests do not align with the company. The third interview I got rejected from was at the second stage. From my analysis of that interview I did a 6/10. I was honest and told them …  ( 101 min )
    [D] Should I include Latitude and Longitude values in my housing regression model?
    Building a regression model to predict housing prices in my area submitted by /u/OkBuddyArian [link] [comments]  ( 91 min )
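The short answer people usually give: yes for tree-based models, which can split on raw coordinates directly, but for linear models it helps to derive features from them instead. A sketch of one common derived feature, great-circle distance to a point of interest; the city-center coordinates below are placeholders.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points, in kilometers.
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

listing_lat = np.array([40.71, 40.65])
listing_lon = np.array([-74.00, -73.95])
center_lat, center_lon = 40.7128, -74.0060   # placeholder city center

dist = haversine_km(listing_lat, listing_lon, center_lat, center_lon)
print(dist)  # use as a feature alongside (or instead of) raw lat/lon
```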
    [D] Loss function in Diffusion models
    In denoising diffusion models, we generally predict eps (noise) and use MSE as loss function. Instead of the eps we can also predict x_0. The same MSE will be applicable here. I am wondering if I can add an auxiliary loss along with the MSE loss. The auxiliary loss will enforce some additional supervision on x_0 (or x_t-1) in the framework. Would that be mathematically incorrect? submitted by /u/ankanbhunia [link] [comments]  ( 89 min )
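As far as I can tell this is not mathematically incorrect: eps-prediction and x_0-prediction are related by a known affine map at each timestep, so an extra term on the implied x_0 just reweights the objective (the standard "simple" loss is itself a reweighting). A sketch of one way to wire it up; aux_loss_fn is a placeholder for whatever extra supervision is intended, and eps_pred stands in for a network output.

```python
import torch

def diffusion_loss(x0, eps, eps_pred, alpha_bar_t, aux_loss_fn, lam=0.1):
    # Usual noise-prediction MSE.
    mse = torch.mean((eps_pred - eps) ** 2)
    # Invert the forward process x_t = sqrt(ab)*x0 + sqrt(1-ab)*eps to get
    # the x0 implied by the network's noise estimate.
    x_t = alpha_bar_t.sqrt() * x0 + (1 - alpha_bar_t).sqrt() * eps
    x0_pred = (x_t - (1 - alpha_bar_t).sqrt() * eps_pred) / alpha_bar_t.sqrt()
    return mse + lam * aux_loss_fn(x0_pred, x0)

# Toy usage with an L1 auxiliary term.
x0 = torch.randn(4, 3, 32, 32)
eps = torch.randn_like(x0)
eps_pred = eps + 0.1 * torch.randn_like(eps)
ab = torch.tensor(0.5)
print(diffusion_loss(x0, eps, eps_pred, ab,
                     lambda a, b: (a - b).abs().mean()).item())
```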
    [D] Why is tanh almost always used as the output layer of the generator network in GANs?
Silly question of the day: I've been working with GANs for the past 4-5 months, both for my bachelor thesis and my new job. From most of the papers I've seen, it looks like tanh at the output layer is almost always a given, but I still need to find an argument/paper explaining why that is a good choice. I would assume it's because its outputs average around 0, which should help the network converge faster and hence help with training stabilisation. However, if you have any more detailed insight it would be much appreciated. Thanks! submitted by /u/ats678 [link] [comments]  ( 97 min )
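The usual argument matches that intuition: training images are rescaled to [-1, 1], and a final tanh bounds the generator's outputs to exactly that range with zero-centered values; the DCGAN paper also reports that a bounded output activation helped the model learn to cover the color space of the training distribution faster. A minimal sketch of the convention, with a toy fully-connected generator:

```python
import torch
import torch.nn as nn

# Toy generator whose last layer is tanh, so outputs lie in (-1, 1).
G = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),
)

real = torch.rand(16, 784) * 2 - 1     # rescale [0, 1] pixels to [-1, 1]
fake = G(torch.randn(16, 100))
assert fake.min() > -1 and fake.max() < 1  # range matches the real data
```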
    [R] Azure Chatbot Services - All You Need to Know
    The following article zooms in on the main Azure chatbot services and how they interact together. https://exadel.com/news/a-comprehensive-overview-of-azure-chatbot-services submitted by /u/lklimusheuskaja [link] [comments]  ( 88 min )
    [D] The Illustrated Retrieval Transformer [Video]
    Hi folks, This video is a gentle intro to RETRO and retrieval-enhanced language models. It provides a high-level overview of DeepMind's RETRO paper from 2021. https://www.youtube.com/watch?v=sMPq4cVS4kg ​ More details are in the post: https://jalammar.github.io/illustrated-retrieval-transformer/ And the paper: https://www.deepmind.com/publications/improving-language-models-by-retrieving-from-trillions-of-tokens submitted by /u/jayalammar [link] [comments]  ( 107 min )
    [D] Getting super-level table extraction
Recently, I've been researching extracting tables from image documents. First I tried PDFs; however, data extraction libraries like Camelot are inconsistent. I found a deep learning model called CascadeTabNet. The detection results are okay, but cell recognition is poor. I also found Multi-Type-TD-TSR for table extraction. It uses image processing techniques to find the grids. It performs well on structured and bordered tables; however, it messes up if the cells are not properly aligned. Even when extraction is successful, aggregation of multi-line cells, i.e. post-processing, is not very obvious. However, I found this site, ExtractTable. The results are super accurate. The site says it uses AI, but I couldn't find out what kind of deep learning model it uses. submitted by /u/Melodic_Stomach_2704 [link] [comments]  ( 89 min )
    [D] ML in education, what technologies should I focus on?
I have very little understanding when it comes to ML, but I am willing to learn. I have had some practice building models with computer vision (OpenCV), but that's it. What I am asking for is some guidance on where to start with this, because I'm pretty sure CV is not the right tool. Note that I'm not asking for help on how to build a model; I would like to know where to start. I remember reading a paper about this exact topic a long time ago but I cannot find it now. What we are trying to do: we have a list of ~10k multiple-choice questions; students have full access to these questions, and on the final exam a group of questions will be selected and the user must get X right in order to pass. I want a model that is capable of learning from student answers by detecting in which categories the student has more difficulty, working to improve the student's knowledge. With this data, we hope to provide the student with insight into how well he/she is prepared to take the final exam. We save a very detailed report of every single exam simulation made by a unique student; each student has their own average data, and there is also a global table of all students' exam reports, with averages, containing non-identifiable data. It would be great if the model were capable of detecting whether the user answered incorrectly due to a trap or due to a lack of knowledge; for instance, if the student's answer is very similar to the right answer and the time taken to answer is below average, it may show a lack of attention rather than a lack of knowledge. I would truly appreciate any guidance! submitted by /u/4fucksakem8 [link] [comments]  ( 113 min )
    [D] machine learning vision
Good evening. I have a concern: I am working on an image classification model that focuses on detecting whether the handles of a plastic box are well placed or not. I have been making changes to the training network, adding images to the dataset, and trying different data augmentation configurations, and I have not been able to get it to work correctly. All the training images are sized 90 x 90 pixels; I do not know if the images are too small or what exactly my mistake is, so I would like to know if you could give me tips to train the model I'm building more accurately. Thank you very much and I will be attentive to any comment. submitted by /u/liebheartx [link] [comments]  ( 91 min )
  • Open

    This is what Inferkit generated
I first added "It was a stormy night and a woman" and then I just kept generating, and it turned into something about God. It was a stormy night and a woman was standing on the pier. She saw a boat approach and begin to list. The water began to rush in, and the boat began to sink. It sank quickly, and the waves and wind began to push and pull the woman, and pull her into the dark sea. She was scared and scared and scared. She began to call out to God. "Why?" she cried. "Why, why me?" "Save me," she cried again. There was silence. No answer. All around her there was darkness, and the rain was falling down so hard. And then, suddenly, a great splash. Suddenly a great big splash. She turned around and saw a bright light approaching. She reached her arms out to it, and it cam…  ( 89 min )
    AI Dream 49 - A real GEM: Early Space-Trip (RAW)
    submitted by /u/LordPewPew777 [link] [comments]  ( 87 min )
    DALL-E 2 open-source alternative Stable Diffusion is now available for download
    submitted by /u/much_successes [link] [comments]  ( 87 min )
    Healing Sounds Mandala: Cleans Aura and Space
    submitted by /u/LordPewPew777 [link] [comments]  ( 87 min )
    AI upscaled Wild West Nic Cage portrait
    Upscaled in Photoshop Beta submitted by /u/ZoNeS_v2 [link] [comments]  ( 87 min )
    AI/AGI in FinTech workshop video
    submitted by /u/akolonin [link] [comments]  ( 87 min )
    New AI Humanoid Robot Tech At 2022 World Robot Conference
    submitted by /u/belindahooper [link] [comments]  ( 92 min )
    OpenAI cuts prices for GPT-3 by two thirds
    submitted by /u/Zirius_Sadfaces [link] [comments]  ( 89 min )
    Art work has created by Dreamstudio AI
    submitted by /u/Viacheslav_Varenia [link] [comments]  ( 86 min )
    How to Run Stable Diffusion (Locally and in the Cloud)
Check out this article on How to Run Stable Diffusion to get started either on a local machine (if you have a GPU) or in Colab if you don't! It's super easy to follow and you can get started making images like the ones below in just a few minutes! [Example images generated from the prompts: "a vaporwave image of a black hole", "a photorealistic image of Iron Man making breakfast", "A solarpunk painting of an alien civilization"] submitted by /u/SleekEagle [link] [comments]  ( 87 min )
    An awesome video visualization about Jesus by AI
    submitted by /u/nalr00n [link] [comments]  ( 87 min )
    What Effects on Artistic Creativity will AI Generated Art Have?
After getting the chance to play with these things myself, I've found that if I'm using a good model that understands English very well, I can express what I imagine down to the smallest detail, and sometimes it will completely capture the image if I'm lucky. As these language models get better and the diffusion models get bigger or more accurate, I can see that it will take more literary skill to generate specific art, while completely taking away the skills necessary to "generate" art in a medium. I'm not saying it will take away creativity; I'm saying it will change the way we need to be creative. I've seen some people argue that this will "take our jobs" when it comes to art generation. I agree that on some level, yes, in effect, it will prune the artist industry quite a bit, but it can't be more of an industry disruptor than Adobe Photoshop was. If anything, like with the digital camera revolution, it will take away some specific jobs and create some very specific new jobs as well. In the long run, society will be left intact, but with some new ways to express itself. Some of these skill sets will go the way of the dinosaur and others will flourish. The real question this puts in front of people, I think, is: how do we redefine our identities now that things have changed? submitted by /u/enspiralart [link] [comments]  ( 89 min )
    DARPA's digital tutor: training people to expert level in 16 weeks
    submitted by /u/Ok-Craft-9908 [link] [comments]  ( 88 min )
    "Learn AI Together" Discord community is looking for professionals (junior,mid-level, senior..), grad students, TAs, and professors willing to help people learn AI by answering questions from time to time - also a great place to find interesting people!
    Hey everyone! The Learn AI Together community is getting bigger and bigger with more and more people learning AI, we would love to find more enthusiastic professionals (junior,mid-level, senior..), and grad students, TAs or professors willing to help and exchange with people learning AI by answering questions from time to time. We are an AI-enthusiasts community of over 28'000 people now where members can chat, ask questions, share resources and projects, find people to work with, find job offers, etc. We are now focusing on getting experts or advanced members to join us and help us help others. More info about the community and how to join us (free): Learn AI Together Excited to chat with you there! submitted by /u/OnlyProggingForFun [link] [comments]  ( 88 min )
Stable Diffusion Notebook Released: Use it for free with Huggingspace and Go...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
    A Comprehensive Overview of Azure Chatbot Services
    Azure provides a variety of powerful chatbot-related services: creating rule-based chatbots, understanding user intents, creating simple FAQ chatbots, and orchestrating all those services. It’s not easy to understand how to integrate different services, though. Complicating matters further are Microsoft’s new services, which don’t have all the functionality and integration capabilities of their old services. In this article, we will review all Azure chatbot services and figure out how they can work together. https://exadel.com/news/a-comprehensive-overview-of-azure-chatbot-services submitted by /u/lklimusheuskaja [link] [comments]  ( 107 min )
    For the First Time – A Robot Has Learned To Imagine Itself
    submitted by /u/Tao_Dragon [link] [comments]  ( 87 min )
    Microsoft's Artificial Intelligence for Beginners
    submitted by /u/pmz [link] [comments]  ( 90 min )
    Stability.Ai Stable Diffusion Public Release (links to UIs in comments)
    submitted by /u/professoreyl [link] [comments]  ( 87 min )
  • Open

    Off-Policy Policy Gradient: Reputable Researchers seem to disagree on the Correct Computation of Importance Sampling Weights
I've been working with off-policy REINFORCE recently, and the question came up of how to compute the importance sampling weights. The intuitive solution for me was this: for a return G_t collected under the behavior policy b, compute the importance sampling ratio using the learned policy \pi and the behavior policy b, and adjust the return in the same way as is done for value function approximation in chapter 5.5 of Sutton and Barto: http://incompleteideas.net/book/RLbook2020.pdf This view seems to be supported by a paper on which Sutton is an author, in section 2.3: https://arxiv.org/abs/1205.4839 Here, they use per-step importance sampling and replace Q_{pi}(s, a) with the importance-sampled return (collected using b). Importantly, they compute the importance weights over k = t...T-1. This is intuitive to me: the future return only depends on future states and actions. On the other hand, there is Sergey Levine's lecture at Berkeley which directly contradicts this: http://rail.eecs.berkeley.edu/deeprlcourse-fa17/f17docs/lecture_4_policy_gradient.pdf On slide 25, he derives an off-policy PG rule, but only computes the importance sampling ratio using past actions. Being a slideshow, the explanation is very hand-wavy: "what about causality?" - "future actions don't affect current weight." To me, this is not intuitive, because it seems that future actions matter a lot for determining future rewards. Either way, these very reputable researchers seem to be directly contradicting each other. Who is right? submitted by /u/green-top [link] [comments]  ( 92 min )
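My reading, offered tentatively: both can be defended. Weighting the return from t by the ratios of the future actions k = t...T-1 gives the unbiased estimator, which matches Sutton and Barto; the slide's version, which keeps only the ratios of past actions, is usually presented as a deliberate approximation that yields a policy-iteration-style update rather than the exact gradient. A tiny numpy sketch of the unbiased, future-ratio version, with made-up probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
T, gamma = 5, 0.99
rewards = rng.normal(size=T)
pi_probs = rng.uniform(0.2, 0.9, size=T)   # pi(a_k | s_k) for the taken actions
b_probs = rng.uniform(0.2, 0.9, size=T)    # b(a_k | s_k), the behavior policy

ratios = pi_probs / b_probs
for t in range(T):
    G_t = sum(gamma ** (k - t) * rewards[k] for k in range(t, T))
    rho = np.prod(ratios[t:])              # product over k = t ... T-1 only
    print(t, rho * G_t)                    # weighted return for the PG update
```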
    How do you handle neural network outputs in a game where multiple choices can be made and when there are illegal moves?
Illegal moves are, in other words, 'grayed out': against the logic/hard rules of the game. submitted by /u/Ninjaxas [link] [comments]  ( 104 min )
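The common recipe for illegal moves is logit masking: set the illegal actions' logits to -inf before the softmax, so they receive zero probability and zero gradient and can never be sampled. When several choices can be made at once, one frequent setup is independent sigmoid (Bernoulli) heads per option instead of a single softmax, with the same masking idea applied per head. A sketch of the masking part:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 6)                    # network output for 6 actions
legal = torch.tensor([[1, 0, 1, 1, 0, 1]], dtype=torch.bool)

# exp(-inf) = 0, so masked actions get exactly zero probability.
masked = logits.masked_fill(~legal, float("-inf"))
probs = F.softmax(masked, dim=-1)
action = torch.multinomial(probs, num_samples=1)
print(probs, action)
```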
    Does PPO n_steps need to be more than a typical episode length?
Let's say a game on average terminates in about 4000 steps, when the done condition is fulfilled. If I want to train PPO on that game, does n_steps need to be at least 4000 so that it learns when the game is completed successfully, or does it not matter? I was also wondering: once the max PPO steps is reached in the for loop, does the environment get reset, or does it continue from the previous max PPO steps? Thank you! submitted by /u/Playful_Shop_8165 [link] [comments]  ( 101 min )
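Assuming a Stable-Baselines3-style PPO: n_steps does not have to exceed the episode length. When the rollout buffer fills mid-episode, the trajectory is cut there, the truncated tail is bootstrapped with the value estimate, and the next rollout continues the same episode; the environment resets only when done is reached. A minimal sketch (API details may vary by version):

```python
import gym
from stable_baselines3 import PPO

# n_steps well below the typical episode length still trains fine,
# because truncated rollouts are bootstrapped rather than discarded.
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, n_steps=256)
model.learn(total_timesteps=10_000)
```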
    Best Books to Learn Reinforcement Learning in 2022
    submitted by /u/Lakshmireddys [link] [comments]  ( 87 min )
    PPO with e-greedy.
Greetings! Is there a way to enhance the PPO (Proximal Policy Optimization) algorithm with ε-greedy exploration? I would still keep the default Normal distribution with decaying std, yet with a decaying probability ε I would sample randomly (uniformly) from my action space, and with probability 1-ε sample according to the Normal distribution. Does it make any sense? How would I handle the logprobs; is it possible? What do you guys think? submitted by /u/White_Sirilo [link] [comments]  ( 88 min )
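One way to make the log-probs consistent (a sketch of an approach, not a guarantee it helps): treat the exploration as part of the policy by defining an explicit mixture density, log(ε·p_unif(a) + (1-ε)·p_norm(a)), so PPO's ratio stays well-defined. Sampling ε-greedily but scoring actions under the Normal alone would bias the ratio. Bounds and parameters below are illustrative:

```python
import torch
from torch.distributions import Normal, Uniform

eps = 0.1
normal = Normal(loc=torch.tensor(0.3), scale=torch.tensor(0.5))
uniform = Uniform(-2.0, 2.0)           # action-space bounds

def mixture_log_prob(a):
    # log( eps * p_unif(a) + (1 - eps) * p_norm(a) ), computed stably.
    log_u = uniform.log_prob(a) + torch.log(torch.tensor(eps))
    log_n = normal.log_prob(a) + torch.log(torch.tensor(1.0 - eps))
    return torch.logsumexp(torch.stack([log_u, log_n]), dim=0)

print(mixture_log_prob(torch.tensor(0.7)))
```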
    On-policy algorithms with external actions
Hello :D, In on-policy algorithms (such as PPO), the agent needs to act based on the latest version of the actor network. I tried to feed the agent external expert demonstrations that led to good rewards, but learning was very poor. Mathematically (in PPO, for example), why would this not work? Thank you :D submitted by /u/AhmedNizam_ [link] [comments]  ( 103 min )
    Have Shapley values been applied to help make DRL more interpretable? I’ve been looking to see if this has been done before but haven’t found anything yet
    submitted by /u/elonmusk12345_ [link] [comments]  ( 87 min )
  • Open

    DSC Weekly 23 August 2022: Five Billion Person Graph – Grand Achievement or Wakeup Call?
    A graph with five billion people (nearly everyone on the Internet) brings up both ethical questions and opportunities. The post DSC Weekly 23 August 2022: Five Billion Person Graph – Grand Achievement or Wakeup Call? appeared first on Data Science Central.  ( 21 min )
    The MPAI-AIF V2 Call for Technologies
    Moving Picture, Audio, and Data Coding by Artificial Intelligence (MPAI), an international unaffiliated not-for-profit organization, develops AI-based Data Coding standards with associated clear licensing frameworks. The post The MPAI-AIF V2 Call for Technologies appeared first on Data Science Central.  ( 19 min )
  • Open

    Conduct what-if analyses with Amazon Forecast, up to 80% faster than before
    Now with Amazon Forecast, you can seamlessly conduct what-if analyses up to 80% faster to analyze and quantify the potential impact of business levers on your demand forecasts. Forecast is a service that uses machine learning (ML) to generate accurate demand forecasts, without requiring any ML experience. Simulating scenarios through what-if analyses is a powerful […]  ( 9 min )
  • Open

    UVQ: Measuring YouTube's Perceptual Video Quality
    Posted by Yilin Wang, Staff Software Engineer, YouTube and Feng Yang, Senior Staff Software Engineer, Google Research Online video sharing platforms, like YouTube, need to understand perceptual video quality (i.e., a user's subjective perception of video quality) in order to better optimize and improve user experience. Video quality assessment (VQA) attempts to build a bridge between video signals and perceptual quality by using objective mathematical models to approximate the subjective opinions of users. Traditional video quality metrics, like peak signal-to-noise ratio (PSNR) and Video Multi-Method Assessment Fusion (VMAF), are reference-based and focus on the relative difference between the target and reference videos. Such metrics, which work best on professionally generated content …  ( 27 min )
  • Open

    Learn How Leading Companies Are Building AI Centers of Excellence, at NVIDIA GTC
    AI Centers of Excellence are organizational units dedicated to implementing a company-wide AI vision. They help identify business use cases, create an implementation roadmap, accelerate adoption, assess impact and more. NVIDIA GTC, a global conference on AI and the metaverse, brings together the world’s top business and technology leaders who’ve embraced artificial intelligence to transform Read article > The post Learn How Leading Companies Are Building AI Centers of Excellence, at NVIDIA GTC appeared first on NVIDIA Blog.  ( 5 min )
    Shelter From the Storm: AI Helps Gauge Catastrophe Risks
    Floods in Kentucky and wildfires in California are the kinds of disasters companies of all sorts are trying to address with AI. Tom Rikert, co-founder and CEO of San Francisco-based startup Masterful AI, is one of many experts helping them manage catastrophe risk. In the U.S. alone, the National Association of Insurance Commissioners estimates that Read article > The post Shelter From the Storm: AI Helps Gauge Catastrophe Risks appeared first on NVIDIA Blog.  ( 6 min )
    Predict, Detect, Mitigate: AI for Climate Science Takes the Stage at NVIDIA GTC
    Recent AI advances enable modeling of weather forecasting 4-5 magnitudes faster than traditional computing methods. The brightest leaders, researchers and developers in climate science, high performance computing and AI will discuss such technology breakthroughs — and how they can help foster a greener Earth — at NVIDIA GTC. The virtual conference, running Sept. 19-22, also Read article > The post Predict, Detect, Mitigate: AI for Climate Science Takes the Stage at NVIDIA GTC appeared first on NVIDIA Blog.  ( 5 min )
    3D Artists Reimagine, Remaster Iconic European Architecture This Week ‘In the NVIDIA Studio’
    A triple threat steps In the NVIDIA Studio this week: a tantalizing trio of talented 3D artists who each reimagined and remastered classic European buildings with individualistic flair. The post 3D Artists Reimagine, Remaster Iconic European Architecture This Week ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.  ( 9 min )
  • Open

    Pythagoras on a sphere
    Suppose you drive a distance a, turn right, then drive a distance b. How far are you from where you started? If a and b are relatively small, then the answer is given by the Pythagorean theorem. Small here means small relative to the radius of the Earth. Any distance you drive is small enough: […] Pythagoras on a sphere first appeared on John D. Cook.  ( 6 min )
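If memory serves, the formula the teaser builds to is the spherical Pythagorean theorem; a sketch of the statement and its planar limit:

```latex
% Right spherical triangle with legs a, b and hypotenuse c on a sphere
% of radius R:
\[
  \cos\frac{c}{R} \;=\; \cos\frac{a}{R}\,\cos\frac{b}{R}.
\]
% Expanding each cosine as 1 - x^2/2 + O(x^4) for a, b \ll R recovers
% the planar theorem, c^2 \approx a^2 + b^2.
```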
    Chimera and sine of 60°
    I was playing around with DALL-E last night. I pasted the definition of chimera into DALL-E and the results were bizarre. See this Twitter thread for images. I also played around with some more mnemonic images. Students often memorize the sines and cosines of 30°, 45°, and 60°. I thought about making a mnemonic image […] Chimera and sine of 60° first appeared on John D. Cook.  ( 5 min )
  • Open

    Machine Learning Algorithms — Top 5 Examples in Real Life
    Find impressive examples of ML that we use every day.  ( 21 min )
  • Open

    Regret Analysis of Certainty Equivalence Policies in Continuous-Time Linear-Quadratic Systems. (arXiv:2206.04434v2 [cs.LG] UPDATED)
This work theoretically studies a ubiquitous reinforcement learning policy for controlling the canonical model of continuous-time stochastic linear-quadratic systems. We show that the randomized certainty-equivalent policy addresses the exploration-exploitation dilemma in linear control systems that evolve according to unknown stochastic differential equations and whose operating cost is quadratic. More precisely, we establish square-root-of-time regret bounds, indicating that the randomized certainty-equivalent policy learns optimal control actions quickly from a single state trajectory. Further, linear scaling of the regret with the number of parameters is shown. The presented analysis introduces novel and useful technical approaches and sheds light on fundamental challenges of continuous-time reinforcement learning.
    Uncertainty-Aware Mixed-Variable Machine Learning for Materials Design. (arXiv:2207.04994v2 [stat.ML] UPDATED)
    Data-driven design shows the promise of accelerating materials discovery but is challenging due to the prohibitive cost of searching the vast design space of chemistry, structure, and synthesis methods. Bayesian Optimization (BO) employs uncertainty-aware machine learning models to select promising designs to evaluate, hence reducing the cost. However, BO with mixed numerical and categorical variables, which is of particular interest in materials design, has not been well studied. In this work, we survey frequentist and Bayesian approaches to uncertainty quantification of machine learning with mixed variables. We then conduct a systematic comparative study of their performances in BO using a popular representative model from each group, the random forest-based Lolo model (frequentist) and the latent variable Gaussian process model (Bayesian). We examine the efficacy of the two models in the optimization of mathematical functions, as well as properties of structural and functional materials, where we observe performance differences as related to problem dimensionality and complexity. By investigating the machine learning models' predictive and uncertainty estimation capabilities, we provide interpretations of the observed performance differences. Our results provide practical guidance on choosing between frequentist and Bayesian uncertainty-aware machine learning models for mixed-variable BO in materials design.
    A semi-supervised methodology for fishing activity detection using the geometry behind the trajectory of multiple vessels. (arXiv:2207.05514v2 [cs.LG] UPDATED)
Automatic Identification System (AIS) messages are useful for tracking vessel activity across oceans worldwide using radio links and satellite transceivers. Such data plays a significant role in tracking vessel activity and mapping mobility patterns such as those found in fishing. Accordingly, this paper proposes a geometric-driven semi-supervised approach for fishing activity detection from AIS data. Through the proposed methodology, we show how to explore the information included in the messages to extract features describing the geometry of the vessel route. To this end, we leverage the unsupervised nature of cluster analysis to label the trajectory geometry, highlighting the changes in the vessel's moving pattern that tend to indicate fishing activity. The labels obtained by the proposed unsupervised approach are used to detect fishing activities, which we approach as a time-series classification task. In this context, we propose a solution using recurrent neural networks on AIS data streams, achieving an overall $F$-score of roughly 87% on the whole trajectories of 50 different unseen fishing vessels. Such results are accompanied by a broad benchmark study assessing the performance of different Recurrent Neural Network (RNN) architectures. In conclusion, this work contributes by proposing a thorough process that includes data preparation, labeling, data modeling, and model validation. Therefore, we present a novel solution for mobility pattern detection that relies upon unfolding the trajectory in time and observing its inherent geometry.
    Enhancing Zero-Shot Many to Many Voice Conversion with Self-Attention VAE. (arXiv:2203.16037v2 [cs.SD] UPDATED)
    Variational auto-encoder (VAE) is an effective neural network architecture to disentangle a speech utterance into speaker identity and linguistic content latent embeddings, then generate an utterance for a target speaker from that of a source speaker. This is possible by concatenating the identity embedding of the target speaker and the content embedding of the source speaker uttering a desired sentence. In this work, we propose to improve VAE models with self-attention and structural regularization (RGSM). Specifically, we found a suitable location of VAE's decoder to add a self-attention layer for incorporating non-local information in generating a converted utterance and hiding the source speaker's identity. We applied relaxed group-wise splitting method (RGSM) to regularize network weights and remarkably enhance generalization performance. In experiments of zero-shot many-to-many voice conversion task on VCTK data set, with the self-attention layer and relaxed group-wise splitting method, our model achieves a gain of speaker classification accuracy on unseen speakers by 28.3\% while slightly improved conversion voice quality in terms of MOSNet scores. Our encouraging findings point to future research on integrating more variety of attention structures in VAE framework while controlling model size and overfitting for advancing zero-shot many-to-many voice conversions.  ( 3 min )
    Simultaneously Learning Stochastic and Adversarial Bandits with General Graph Feedback. (arXiv:2206.07908v2 [cs.LG] UPDATED)
    The problem of online learning with graph feedback has been extensively studied in the literature due to its generality and potential to model various learning tasks. Existing works mainly study the adversarial and stochastic feedback separately. If the prior knowledge of the feedback mechanism is unavailable or wrong, such specially designed algorithms could suffer great loss. To avoid this problem, \citet{erez2021towards} try to optimize for both environments. However, they assume the feedback graphs are undirected and each vertex has a self-loop, which compromises the generality of the framework and may not be satisfied in applications. With a general feedback graph, the observation of an arm may not be available when this arm is pulled, which makes the exploration more expensive and the algorithms more challenging to perform optimally in both environments. In this work, we overcome this difficulty by a new trade-off mechanism with a carefully-designed proportion for exploration and exploitation. We prove the proposed algorithm simultaneously achieves $\mathrm{poly} \log T$ regret in the stochastic setting and minimax-optimal regret of $\tilde{O}(T^{2/3})$ in the adversarial setting where $T$ is the horizon and $\tilde{O}$ hides parameters independent of $T$ as well as logarithmic terms. To our knowledge, this is the first best-of-both-worlds result for general feedback graphs.
    PatchNR: Learning from Small Data by Patch Normalizing Flow Regularization. (arXiv:2205.12021v2 [cs.LG] UPDATED)
    Learning neural networks using only a small amount of data is an important ongoing research topic with tremendous potential for applications. In this paper, we introduce a regularizer for the variational modeling of inverse problems in imaging based on normalizing flows. Our regularizer, called patchNR, involves a normalizing flow learned on patches of very few images. In particular, the training is independent from the considered inverse problem such that the same regularizer can be used for different forward operators acting on the same class of images. By investigating the distribution of patches versus those of the whole image class, we prove that our variational model is indeed a MAP approach. Our model can be generalized to conditional patchNRs, if additional supervised information is available. Numerical examples for superresolution of material images and low-dose or limited-angle computed tomography (CT) demonstrate that our method provides high quality results among methods with similar assumptions, but requires only few data.  ( 2 min )
    Exploring the Limits of Synthetic Creation of Solar EUV Images via Image-to-Image Translation. (arXiv:2208.09512v1 [astro-ph.SR])
    The Solar Dynamics Observatory (SDO), a NASA multi-spectral decade-long mission that has been daily producing terabytes of observational data from the Sun, has been recently used as a use-case to demonstrate the potential of machine learning methodologies and to pave the way for future deep-space mission planning. In particular, the idea of using image-to-image translation to virtually produce extreme ultra-violet channels has been proposed in several recent studies, as a way to both enhance missions with less available channels and to alleviate the challenges due to the low downlink rate in deep space. This paper investigates the potential and the limitations of such a deep learning approach by focusing on the permutation of four channels and an encoder--decoder based architecture, with particular attention to how morphological traits and brightness of the solar surface affect the neural network predictions. In this work we want to answer the question: can synthetic images of the solar corona produced via image-to-image translation be used for scientific studies of the Sun? The analysis highlights that the neural network produces high-quality images over three orders of magnitude in count rate (pixel intensity) and can generally reproduce the covariance across channels within a 1% error. However the model performance drastically diminishes in correspondence of extremely high energetic events like flares, and we argue that the reason is related to the rareness of such events posing a challenge to model training.  ( 3 min )
    A Generic Self-Supervised Framework of Learning Invariant Discriminative Features. (arXiv:2202.06914v2 [cs.LG] UPDATED)
    Self-supervised learning (SSL) has become a popular method for generating invariant representations without the need for human annotations. However, the desired invariant representation is achieved by applying prior online transformation functions to the input data. As a result, each SSL framework is customised for a particular data type, e.g., visual data, and further modifications are required if it is to be used for other dataset types. On the other hand, the autoencoder (AE), a generic and widely applicable framework, mainly focuses on dimension reduction and is not suited for learning invariant representations. This paper proposes a generic SSL framework based on a constrained self-labelling assignment process that prevents degenerate solutions. Specifically, the prior transformation functions are replaced with a self-transformation mechanism, derived through an unsupervised adversarial training process, for imposing invariant representations. Via the self-transformation mechanism, pairs of augmented instances can be generated from the same input data. Finally, a training objective based on contrastive learning is designed by leveraging both the self-labelling assignment and the self-transformation mechanism. Although the self-transformation process is very generic, the proposed training strategy outperforms a majority of state-of-the-art representation learning methods based on AE structures. To validate the performance of our method, we conduct experiments on four types of data, namely visual, audio, text, and mass spectrometry data, and compare the methods in terms of four quantitative metrics. Our comparison results indicate that the proposed method demonstrates robustness and successfully identifies patterns within the datasets.  ( 3 min )
    ASE: Anomaly Scoring Based Ensemble Learning for Imbalanced Datasets. (arXiv:2203.10769v3 [cs.LG] UPDATED)
    Nowadays, many classification algorithms are applied across industries to solve problems met in real-life scenarios. However, in many binary classification tasks, samples in the minority class make up only a small fraction of all instances, so the resulting datasets suffer from a high imbalance ratio. Existing models sometimes treat minority classes as noise or discard them as outliers when the data are skewed. To solve this problem, we propose a bagging ensemble learning framework, $ASE$ (Anomaly Scoring Based Ensemble Learning). The framework uses a scoring system based on anomaly detection algorithms to guide the resampling strategy by dividing the majority-class samples into subspaces. A specific number of instances is then under-sampled from each subspace and combined with the minority class to construct training subsets. We weight the base classifiers trained on these subsets according to the classification results of the anomaly detection model and the statistics of the subspaces. Experiments show that our ensemble learning model can dramatically improve the performance of base classifiers and is more efficient than other existing methods across a wide range of imbalance ratios, data scales, and data dimensions. $ASE$ can be combined with various classifiers, and every part of the framework has been shown to be reasonable and necessary.  ( 3 min )
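    A minimal sketch of the anomaly-scoring idea, under our own assumptions (the detector, base learner, equal voting, and quantile-based subspaces are placeholders; the paper additionally weights the base classifiers):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.tree import DecisionTreeClassifier

def ase_like_ensemble(X_maj, X_min, n_subspaces=5, seed=0):
    """Score majority samples with an anomaly detector, split them into
    score-based subspaces, under-sample each subspace to the minority size,
    and train one base classifier per balanced subset."""
    rng = np.random.default_rng(seed)
    scores = IsolationForest(random_state=seed).fit(X_maj).score_samples(X_maj)
    bins = np.quantile(scores, np.linspace(0, 1, n_subspaces + 1))
    models = []
    for k in range(n_subspaces):
        idx = np.where((scores >= bins[k]) & (scores <= bins[k + 1]))[0]
        take = rng.choice(idx, size=min(len(idx), len(X_min)), replace=False)
        X = np.vstack([X_maj[take], X_min])
        y = np.hstack([np.zeros(len(take)), np.ones(len(X_min))])
        models.append(DecisionTreeClassifier(random_state=seed).fit(X, y))
    return models

def predict(models, X):
    # simple majority vote; the paper uses anomaly-informed weights instead
    votes = np.mean([m.predict(X) for m in models], axis=0)
    return (votes >= 0.5).astype(int)
```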
    CandidateDrug4Cancer: An Open Molecular Graph Learning Benchmark on Drug Discovery for Cancer. (arXiv:2203.00836v2 [cs.LG] UPDATED)
    Anti-cancer drug discoveries have been serendipitous, we sought to present the Open Molecular Graph Learning Benchmark, named CandidateDrug4Cancer, a challenging and realistic benchmark dataset to facilitate scalable, robust, and reproducible graph machine learning research for anti-cancer drug discovery. CandidateDrug4Cancer dataset encompasses multiple most-mentioned 29 targets for cancer, covering 54869 cancer-related drug molecules which are ranged from pre-clinical, clinical and FDA-approved. Besides building the datasets, we also perform benchmark experiments with effective Drug Target Interaction (DTI) prediction baselines using descriptors and expressive graph neural networks. Experimental results suggest that CandidateDrug4Cancer presents significant challenges for learning molecular graphs and targets in practical application, indicating opportunities for future researches on developing candidate drugs for treating cancers.  ( 2 min )
    Learning entanglement breakdown as a phase transition by confusion. (arXiv:2202.00348v2 [quant-ph] UPDATED)
    Quantum technologies require methods for preparing and manipulating entangled multiparticle states. However, the problem of determining whether a given quantum state is entangled or separable is known to be an NP-hard problem in general, and even the task of detecting entanglement breakdown for a given class of quantum states is difficult. In this work, we develop an approach for revealing entanglement breakdown using a machine learning technique, which is known as 'learning by confusion'. We consider a family of quantum states, which is parameterized such that there is a single critical value dividing states within this family into separate and entangled. We demonstrate the 'learning by confusion' scheme allows us to determine the critical value. Specifically, we study the performance of the method for the two-qubit, two-qutrit, and two-ququart entangled state. In addition, we investigate the properties of the local depolarization and the generalized amplitude damping channel in the framework of the confusion scheme. Within our approach and setting the parameterization of special trajectories, we obtain an entanglement-breakdown 'phase diagram' of a quantum channel, which indicates regions of entangled (separable) states and the entanglement-breakdown region. Then we extend the way of using the 'learning by confusion' scheme for recognizing whether an arbitrary given state is entangled or separable. We show that the developed method provides correct answers for a variety of states, including entangled states with positive partial transpose. We also present a more practical version of the method, which is suitable for studying entanglement breakdown in noisy intermediate-scale quantum devices. We demonstrate its performance using an available cloud-based IBM quantum processor.  ( 3 min )
    Feature-level augmentation to improve robustness of deep neural networks to affine transformations. (arXiv:2202.05152v4 [cs.CV] UPDATED)
    Recent studies revealed that convolutional neural networks do not generalize well to small image transformations, e.g. rotations by a few degrees or translations of a few pixels. To improve the robustness to such transformations, we propose to introduce data augmentation at intermediate layers of the neural architecture, in addition to the common data augmentation applied on the input images. By introducing small perturbations to activation maps (features) at various levels, we develop the capacity of the neural network to cope with such transformations. We conduct experiments on three image classification benchmarks (Tiny ImageNet, Caltech-256 and Food-101), considering two different convolutional architectures (ResNet-18 and DenseNet-121). When compared with two state-of-the-art stabilization methods, the empirical results show that our approach consistently attains the best trade-off between accuracy and mean flip rate.  ( 2 min )
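    One simple instance of feature-level augmentation, assuming a PyTorch setup: perturb the activation maps of an intermediate layer during training. The additive Gaussian noise and its scale are our assumptions for illustration, not necessarily the paper's exact perturbation.

```python
import torch
import torch.nn as nn

class FeatureNoise(nn.Module):
    """Add a small random perturbation to intermediate activations
    (training mode only); acts as identity at evaluation time."""
    def __init__(self, std=0.05):
        super().__init__()
        self.std = std

    def forward(self, x):
        if self.training:
            x = x + self.std * torch.randn_like(x)
        return x

# Usage sketch: wrap an intermediate block of, e.g., a torchvision ResNet-18.
# backbone.layer2 = nn.Sequential(backbone.layer2, FeatureNoise(std=0.05))
```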
    Stock Performance Evaluation for Portfolio Design from Different Sectors of the Indian Stock Market. (arXiv:2208.07166v1 [q-fin.PM] CROSS LISTED)
    The stock market offers a platform where people buy and sell shares of publicly listed companies. Stock prices are generally quite volatile, so predicting them is a daunting task, and much research is still ongoing to improve the accuracy of stock price prediction. Portfolio construction refers to allocating stocks from different sectors optimally so as to achieve a maximum return while taking a minimum risk; a good portfolio can thus help investors earn maximum profit at minimum risk. Beginning with the Dow Jones Theory, much advancement has happened in the area of building efficient portfolios. In this project, we predict the future value of a few stocks from six important sectors of the Indian economy and also build a portfolio. As part of the project, our team studied the performance of various time series, machine learning, and deep learning models for stock price prediction on selected stocks from the six chosen sectors. To build an efficient portfolio, we studied multiple portfolio optimization theories, beginning with Modern Portfolio Theory. We built a minimum-variance portfolio and an optimal-risk portfolio for each of the six chosen sectors, using the daily stock prices over the past five years as training data, and also conducted backtesting to check the performance of the portfolios. We look forward to continuing our study in the area of stock price prediction and asset allocation and consider this project a first stepping stone.
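    For the minimum-variance portfolio the abstract mentions, the textbook closed form is $w = C^{-1}\mathbf{1} / (\mathbf{1}^\top C^{-1}\mathbf{1})$ with $C$ the covariance of returns. A minimal sketch (an illustration of the standard formula, not the project's exact pipeline):

```python
import numpy as np

def min_variance_weights(returns):
    """returns: (days, assets) array of daily returns.
    Closed-form global minimum-variance weights (shorting allowed)."""
    C = np.cov(returns, rowvar=False)
    inv = np.linalg.pinv(C)                  # pseudo-inverse for stability
    ones = np.ones(C.shape[0])
    return inv @ ones / (ones @ inv @ ones)

# Usage with a (days, assets) matrix of adjusted closing prices:
# rets = np.diff(np.log(prices), axis=0)
# w = min_variance_weights(rets)            # weights sum to 1
```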
    Learning Multiple Probabilistic Degradation Generators for Unsupervised Real World Image Super Resolution. (arXiv:2201.10747v2 [eess.IV] UPDATED)
    Unsupervised real world super resolution (USR) aims to restore high-resolution (HR) images given low-resolution (LR) inputs, and its difficulty stems from the absence of paired datasets. One of the most common approaches is to synthesize noisy LR images using GANs (i.e., degradation generators) and to utilize the synthetic dataset to train the model in a supervised manner. Although the goal of training the degradation generator is to approximate the distribution of LR images given an HR image, previous works have relied heavily on the unrealistic assumption that this conditional distribution is a delta function, learning a deterministic mapping from the HR image to an LR image. In this paper, we show that we can improve the performance of USR models by relaxing this assumption, and we propose to train a probabilistic degradation generator. Our probabilistic degradation generator can be viewed as a deep hierarchical latent variable model and is better suited for modeling the complex conditional distribution. We also reveal a notable connection with the noise injection of StyleGAN. Furthermore, we train multiple degradation generators to improve mode coverage and apply collaborative learning for ease of training. We outperform several baselines on benchmark datasets in terms of PSNR and SSIM and demonstrate the robustness of our method on unseen data distributions. Code is available at https://github.com/sangyun884/MSSR.  ( 3 min )
    Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power. (arXiv:2205.13863v2 [cs.LG] UPDATED)
    It is well-known that modern neural networks are vulnerable to adversarial examples. To mitigate this problem, a series of robust learning algorithms have been proposed. However, although the robust training error can be brought near zero via some methods, all existing algorithms lead to a high robust generalization error. In this paper, we provide a theoretical understanding of this puzzling phenomenon from the perspective of the expressive power of deep neural networks. Specifically, for binary classification problems with well-separated data, we show that, for ReLU networks, while mild over-parameterization is sufficient for high robust training accuracy, there exists a constant robust generalization gap unless the size of the neural network is exponential in the data dimension $d$. Even if the data is linearly separable, which means that achieving low clean generalization error is easy, we can still prove an $\exp({\Omega}(d))$ lower bound for robust generalization. In general, our exponential lower bounds hold for a variety of neural network families and other function classes as well, as long as their VC dimension is at most polynomial in the number of parameters. Moreover, we establish an improved upper bound of $\exp({\mathcal{O}}(k))$ on the network size needed to achieve low robust generalization error when the data lies on a manifold with intrinsic dimension $k$ ($k \ll d$). Nonetheless, we also have a lower bound that grows exponentially with respect to $k$ -- the curse of dimensionality is inevitable. By demonstrating an exponential separation between the network sizes needed for achieving low robust training error and low robust generalization error, our results reveal that the hardness of robust generalization may stem from the expressive power of practical models.
    A Twitter-Driven Deep Learning Mechanism for the Determination of Vehicle Hijacking Spots in Cities. (arXiv:2208.10280v1 [cs.CL])
    Vehicle hijacking is one of the leading crimes in many cities. For instance, in South Africa, drivers must constantly remain vigilant on the road to ensure that they do not become hijacking victims. This work aims to develop a map depicting hijacking spots in a city using Twitter data. Tweets that include the keyword "hijacking" are obtained for a designated city, Cape Town, in this work. To extract relevant tweets, they are analyzed using the following machine learning techniques: 1) a Multi-layer Feed-forward Neural Network (MLFNN); 2) a Convolutional Neural Network (CNN); and 3) Bidirectional Encoder Representations from Transformers (BERT). Through training and testing, CNN achieved an accuracy of 99.66%, while MLFNN and BERT achieved accuracies of 98.99% and 73.99%, respectively. In terms of recall, precision, and F1-score, CNN also achieved the best results. Therefore, CNN was used to identify relevant tweets, and the relevant reports it generates are visually presented on a points map of the City of Cape Town. This work used a small dataset of 426 tweets. In future, the use of evolutionary computation will be explored to optimize the deep learning models. A mobile application is under development to make this information usable by the general public.  ( 2 min )
    Deconstructed Generation-Based Zero-Shot Model. (arXiv:2204.11280v2 [cs.CV] UPDATED)
    Generation-based methods have captured most of the recent attention in Zero-Shot Learning (ZSL) research. In this paper, we attempt to deconstruct the generator-classifier framework to guide its improvement and extension. We begin by analyzing the instance-level distribution learned by the generator, replacing it with a Gaussian distribution. Then we reveal the roles that the class-level and instance-level distributions learned by the generator play in classifier training by decomposing the classifier gradients. We finally conclude with guidelines for improving the generator-classifier framework drawn from this deconstruction of the generator and the classifier: (i) the key for the ZSL generator is attribute generalization; and (ii) classifier learning should emphasize mitigating the impact of pseudo unseen samples on the decision boundaries between seen classes during training, and reducing the seen-unseen bias. We propose a simple method based on these guidelines. Without complex designs, the proposed method outperforms the state of the art on four public ZSL datasets, which demonstrates the validity of the proposed guidelines. The proposed method remains effective when the generative model is replaced with an attribute-to-visual-center single mapping model, demonstrating its strong transferability. Codes will be made public upon acceptance.  ( 2 min )
    Stochastic Weight Averaging Revisited. (arXiv:2201.00519v3 [cs.LG] UPDATED)
    Averaging neural network weights sampled by a backbone stochastic gradient descent (SGD) is a simple yet effective approach to assist the backbone SGD in finding better optima, in terms of generalization. From a statistical perspective, weight averaging (WA) contributes to variance reduction. Recently, the stochastic weight averaging (SWA) method was proposed, characterized by applying a cyclical or high constant (CHC) learning rate schedule (LRS) while generating weight samples for the WA operation. A new insight on WA then appeared, stating that WA helps to discover wider optima and thereby leads to better generalization. We conduct extensive experimental studies of SWA, involving a dozen modern DNN model structures and a dozen benchmark open-source image, graph, and text datasets. We disentangle the contributions of the WA operation and the CHC LRS to SWA, showing that the WA operation in SWA still contributes to variance reduction but does not always lead to wide optima. We show how the statistical and geometric views on SWA can be reconciled. Based on our experimental findings, we raise the hypothesis that there are global-scale geometric structures in the DNN loss landscape that can be discovered by an SGD agent at the early stage of its working period, and that such global geometric structures can be exploited by the WA operation. This hypothesis inspires an algorithm design termed periodic SWA (PSWA). We find that PSWA outperforms its backbone SGD remarkably during the early stage of the SGD sampling process, which demonstrates that our hypothesis holds. Codes for reproducing the experimental results can be found at https://github.com/ZJLAB-AMMI/PSWA.  ( 3 min )
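    The WA operation itself is just a parameter-wise mean over weight snapshots. A hedged PyTorch sketch (the period length and sampling cadence in the usage comment are placeholders, not the paper's settings):

```python
import copy
import torch

def average_state_dicts(state_dicts):
    """Parameter-wise average of a list of model state_dicts: the WA step."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        avg[key] = stacked.mean(dim=0).to(avg[key].dtype)
    return avg

# PSWA-style loop, schematically (hyper-parameters are placeholders):
# for period in range(num_periods):
#     samples = []
#     for step in range(steps_per_period):
#         train_one_step(model, optimizer)
#         if step % sample_every == 0:
#             samples.append(copy.deepcopy(model.state_dict()))
#     model.load_state_dict(average_state_dicts(samples))  # restart from WA
```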
    Visual Analysis of Neural Architecture Spaces for Summarizing Design Principles. (arXiv:2208.09665v1 [cs.HC])
    Recent advances in artificial intelligence largely benefit from better neural network architectures. These architectures are a product of a costly process of trial-and-error. To ease this process, we develop ArchExplorer, a visual analysis method for understanding a neural architecture space and summarizing design principles. The key idea behind our method is to make the architecture space explainable by exploiting structural distances between architectures. We formulate the pairwise distance calculation as solving an all-pairs shortest path problem. To improve efficiency, we decompose this problem into a set of single-source shortest path problems. The time complexity is reduced from O(kn^2N) to O(knN). Architectures are hierarchically clustered according to the distances between them. A circle-packing-based architecture visualization has been developed to convey both the global relationships between clusters and local neighborhoods of the architectures in each cluster. Two case studies and a post-analysis are presented to demonstrate the effectiveness of ArchExplorer in summarizing design principles and selecting better-performing architectures.  ( 2 min )
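    The efficiency gain comes from replacing one all-pairs computation with k independent single-source runs. A minimal sketch of that decomposition with Dijkstra's algorithm (the graph encoding is our assumption; ArchExplorer's actual distance is a structural edit distance between architectures):

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths; `graph[u]` is a list of (v, weight)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def all_pairs(graph, sources):
    """All pairwise distances as k independent single-source problems."""
    return {s: dijkstra(graph, s) for s in sources}
```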
    A Domain Generalization Approach for Out-Of-Distribution 12-lead ECG Classification with Convolutional Neural Networks. (arXiv:2208.09656v1 [cs.LG])
    Deep Learning systems have achieved great success in the past few years, even surpassing human intelligence in several cases. Of late, they have also established themselves in the biomedical and healthcare domains, where they have shown a lot of promise but have not yet achieved widespread adoption. This is in part because most methods fail to maintain their performance when making decisions on data that originate from a different distribution than the one they were trained on, namely Out-Of-Distribution (OOD) data. For example, in the case of biosignal classification, models often fail to generalize well on datasets from different hospitals, due to the distribution discrepancy amongst different sources of data. Our goal is to demonstrate the Domain Generalization problem present between distinct hospital databases and to propose a method that classifies abnormalities on 12-lead Electrocardiograms (ECGs) by leveraging information extracted across the architecture of a Deep Neural Network, capturing the underlying structure of the signal. To this end, we adopt a ResNet-18 as the backbone model and extract features from several intermediate convolutional layers of the network. To evaluate our method, we adopt publicly available ECG datasets from four sources and handle them as separate domains. To simulate the distributional shift present in real-world settings, we train our model on a subset of the domains and leave out the remaining ones. We then evaluate our model both on the data present at training time (intra-distribution) and on the held-out data (out-of-distribution), achieving promising results and surpassing the baseline of a vanilla Residual Network in most of the cases.  ( 3 min )
    Learning Low Bending and Low Distortion Manifold Embeddings: Theory and Applications. (arXiv:2208.10193v1 [math.NA])
    Autoencoders, which consist of an encoder and a decoder, are widely used in machine learning for dimension reduction of high-dimensional data. The encoder embeds the input data manifold into a lower-dimensional latent space, while the decoder represents the inverse map, providing a parametrization of the data manifold by the manifold in latent space. Good regularity and structure of the embedded manifold may substantially simplify further data processing tasks such as cluster analysis or data interpolation. We propose and analyze a novel regularization for learning the encoder component of an autoencoder: a loss functional that prefers isometric, extrinsically flat embeddings and allows the encoder to be trained on its own. To perform the training, it is assumed that the local Riemannian distance and the local Riemannian average can be evaluated for pairs of nearby points on the input manifold. The loss functional is computed via Monte Carlo integration with different sampling strategies for pairs of points on the input manifold. Our main theorem identifies a geometric loss functional of the embedding map as the $\Gamma$-limit of the sampling-dependent loss functionals. Numerical tests, using image data that encodes different explicitly given data manifolds, show that smooth manifold embeddings into latent space are obtained. Due to the promotion of extrinsic flatness, these embeddings are regular enough that interpolation between not-too-distant points on the manifold is well approximated by linear interpolation in latent space as one possible postprocessing step.  ( 3 min )
    Non-Determinism and the Lawlessness of Machine Learning Code. (arXiv:2206.11834v2 [cs.CY] UPDATED)
    Legal literature on machine learning (ML) tends to focus on harms, and thus tends to reason about individual model outcomes and summary error rates. This focus has masked important aspects of ML that are rooted in its reliance on randomness -- namely, stochasticity and non-determinism. While some recent work has begun to reason about the relationship between stochasticity and arbitrariness in legal contexts, the role of non-determinism more broadly remains unexamined. In this paper, we clarify the overlap and differences between these two concepts, and show that the effects of non-determinism, and consequently its implications for the law, become clearer from the perspective of reasoning about ML outputs as distributions over possible outcomes. This distributional viewpoint accounts for randomness by emphasizing the possible outcomes of ML. Importantly, this type of reasoning is not mutually exclusive with current legal reasoning; it complements (and in fact can strengthen) analyses concerning individual, concrete outcomes for specific automated decisions. By illuminating the important role of non-determinism, we demonstrate that ML code falls outside of the cyberlaw frame of treating "code as law," as this frame assumes that code is deterministic. We conclude with a brief discussion of what work ML can do to constrain the potentially harm-inducing effects of non-determinism, and we indicate where the law must do work to bridge the gap between its current individual-outcome focus and the distributional approach that we recommend.
    Improving Task-free Continual Learning by Distributionally Robust Memory Evolution. (arXiv:2207.07256v2 [cs.LG] UPDATED)
    Task-free continual learning (CL) aims to learn a non-stationary data stream without explicit task definitions and without forgetting previous knowledge. First, the widely adopted memory replay approach can gradually become less effective for long data streams, as the model may memorize the stored examples and overfit the memory buffer. Second, existing methods overlook the high uncertainty in the memory data distribution, since there is a big gap between the memory data distribution and the distribution of all previous data examples. To address these problems, we propose, for the first time, a principled memory evolution framework that dynamically evolves the memory data distribution by making the memory buffer gradually harder to memorize, using distributionally robust optimization (DRO). We then derive a family of methods to evolve the memory buffer data in the continuous probability measure space with Wasserstein gradient flow (WGF). The proposed DRO is performed w.r.t. the worst-case evolved memory data distribution, and it thus guarantees model performance and learns significantly more robust features than existing memory-replay-based methods. Extensive experiments on existing benchmarks demonstrate the effectiveness of the proposed methods for alleviating forgetting. As a by-product of the proposed framework, our method is more robust to adversarial examples than existing task-free CL methods. Code is available on GitHub: \url{https://github.com/joey-wang123/DRO-Task-free}
    Local and Global Context-Based Pairwise Models for Sentence Ordering. (arXiv:2110.04291v2 [cs.CL] UPDATED)
    Sentence Ordering refers to the task of rearranging a set of sentences into the appropriate coherent order. For this task, most previous approaches have explored global context-based end-to-end methods using sequence generation techniques. In this paper, we put forward a set of robust local and global context-based pairwise ordering strategies; leveraging these, our prediction strategies outperform all previous works in this domain. Our proposed encoding method utilizes the paragraph's rich global contextual information to predict the pairwise order using novel transformer architectures. Analysis of the two proposed decoding strategies helps better explain error propagation in pairwise models. This approach is the most accurate pure pairwise model, and our encoding strategy also significantly improves the performance of other recent approaches that use pairwise models, including the previous state-of-the-art, demonstrating the research novelty and generalizability of this work. Additionally, we show how the pre-training task for ALBERT helps it to significantly outperform BERT, despite having considerably fewer parameters. The extensive experimental results, architectural analysis and ablation studies demonstrate the effectiveness and superiority of the proposed models compared to the previous state-of-the-art, besides providing a much better understanding of the functioning of pairwise models.  ( 3 min )
    Universal Caching. (arXiv:2205.04860v2 [cs.IT] UPDATED)
    In learning theory, the performance of an online policy is commonly measured in terms of the static regret metric, which compares the cumulative loss of an online policy to that of an optimal benchmark in hindsight. In the definition of static regret, the action of the benchmark policy remains fixed throughout the time horizon. Naturally, the resulting regret bounds become loose in non-stationary settings where fixed actions often suffer from poor performance. In this paper, we investigate a stronger notion of regret minimization in the context of online caching. In particular, we allow the action of the benchmark at any round to be decided by a finite state machine containing any number of states. Popular caching policies, such as LRU and FIFO, belong to this class. Using ideas from the universal prediction literature in information theory, we propose an efficient online caching policy with a sub-linear regret bound. To the best of our knowledge, this is the first data-dependent regret bound known for the caching problem in the universal setting. We establish this result by combining a recently-proposed online caching policy with an incremental parsing algorithm, namely Lempel-Ziv '78. Our methods also yield a simpler learning-theoretic proof of the improved regret bound as opposed to the involved problem-specific combinatorial arguments used in the earlier works.
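    The Lempel-Ziv '78 parser the paper combines with its caching policy splits the request sequence into distinct phrases, each extending a previously seen phrase by one symbol. A self-contained sketch of just the parser (not the caching policy itself):

```python
def lz78_parse(sequence):
    """Incremental LZ '78 parsing: emit each phrase the first time it is
    seen; a phrase is a previously emitted phrase plus one new symbol."""
    dictionary = {(): 0}
    phrase, phrases = (), []
    for symbol in sequence:
        candidate = phrase + (symbol,)
        if candidate in dictionary:
            phrase = candidate            # keep extending the current match
        else:
            dictionary[candidate] = len(dictionary)
            phrases.append(candidate)
            phrase = ()                   # start a new phrase
    if phrase:
        phrases.append(phrase)            # flush the trailing partial phrase
    return phrases

# lz78_parse("aababbbababaa")[:3] -> [('a',), ('a', 'b'), ('a', 'b', 'b')]
```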
    Conditional Born machine for Monte Carlo event generation. (arXiv:2205.07674v2 [quant-ph] UPDATED)
    Generative modeling is a promising task for near-term quantum devices, which can use the stochastic nature of quantum measurements as a random source. So-called Born machines are purely quantum models that promise to generate probability distributions in a quantum way, inaccessible to classical computers. This paper presents an application of Born machines to Monte Carlo simulations and extends their reach to multivariate and conditional distributions. Models are run on (noisy) simulators and on IBM Quantum superconducting quantum hardware. More specifically, Born machines are used to generate muonic force carrier (MFC) events resulting from scattering processes between muons and the detector material in high-energy physics collider experiments. MFCs are bosons appearing in beyond-the-standard-model theoretical frameworks and are candidates for dark matter. Empirical evidence suggests that Born machines can reproduce the marginal distributions and correlations of data sets from Monte Carlo simulations.
    DeePKS+ABACUS as a Bridge between Expensive Quantum Mechanical Models and Machine Learning Potentials. (arXiv:2206.10093v2 [physics.chem-ph] UPDATED)
    Recently, the development of machine learning (ML) potentials has made it possible to perform large-scale and long-time molecular simulations with the accuracy of quantum mechanical (QM) models. However, for high-level QM methods, such as density functional theory (DFT) at the meta-GGA level and/or with exact exchange, quantum Monte Carlo, etc., generating a sufficient amount of data for training an ML potential has remained computationally challenging due to their high cost. In this work, we demonstrate that this issue can be largely alleviated with Deep Kohn-Sham (DeePKS), an ML-based DFT model. DeePKS employs a computationally efficient neural-network-based functional model to construct a correction term added to a cheap DFT model. Upon training, DeePKS yields energies and forces closely matching those of the high-level QM method, but the amount of training data required is orders of magnitude less than that required for training a reliable ML potential. As such, DeePKS can serve as a bridge between expensive QM models and ML potentials: one can generate a decent amount of high-accuracy QM data to train a DeePKS model, and then use the DeePKS model to label a much larger number of configurations to train an ML potential. This scheme for periodic systems is implemented in the DFT package ABACUS, which is open-source and ready for use in various applications.
    Looking For A Match: Self-supervised Clustering For Automatic Doubt Matching In e-learning Platforms. (arXiv:2208.09600v1 [cs.LG])
    Recently, e-learning platforms have grown into places where students can post doubts (as snaps taken with smartphones) and get them resolved in minutes. However, the significant increase in the number of student-posted doubts with high variance in quality on these platforms not only makes it challenging for teachers to navigate and address them, but also increases the resolution time per doubt. Neither is acceptable, as a high doubt-resolution time hinders students' learning progress. This necessitates ways to automatically identify whether a similar doubt already exists in the repository and then serve it to the teacher as a plausible solution to validate and communicate to the student. Supervised learning techniques (like the Siamese architecture) require labels to identify the matches, which is not feasible, as labels are scarce and expensive. In this work, we thus developed a label-agnostic doubt-matching paradigm based on representations learnt via self-supervision. Building on prior theoretical insights of BYOL (Bootstrap Your Own Latent), we propose a custom BYOL that combines domain-specific augmentation with a contrastive objective over a varied set of appropriately constructed data views. Results highlight that custom BYOL improves top-1 matching accuracy by approximately 6% and 5% compared with BYOL and a supervised learning instance, respectively. We further show that both BYOL-based learning instances perform either on par with or better than human labeling.  ( 3 min )
    Critical Batch Size Minimizes Stochastic First-Order Oracle Complexity of Deep Learning Optimizer using Hyperparameters Close to One. (arXiv:2208.09814v1 [cs.LG])
    Practical results have shown that deep learning optimizers using small constant learning rates, hyperparameters close to one, and large batch sizes can find the model parameters of deep neural networks that minimize the loss functions. We first show theoretical evidence that the momentum method (Momentum) and adaptive moment estimation (Adam) perform well in the sense that the upper bound of the theoretical performance measure is small with a small constant learning rate, hyperparameters close to one, and a large batch size. Next, we show that there exists a batch size called the critical batch size minimizing the stochastic first-order oracle (SFO) complexity, which is the stochastic gradient computation cost, and that SFO complexity increases once the batch size exceeds the critical batch size. Finally, we provide numerical results that support our theoretical results. That is, the numerical results indicate that Adam using a small constant learning rate, hyperparameters close to one, and the critical batch size minimizing SFO complexity has faster convergence than Momentum and stochastic gradient descent (SGD).  ( 2 min )
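    The quantity being minimized is concrete: SFO complexity is the number of stochastic gradient computations, i.e., (steps to reach a target) times (batch size). A small numeric illustration with made-up measurements (all numbers are hypothetical, chosen only to show an interior minimum like the paper's critical batch size):

```python
import numpy as np

def critical_batch_size(batch_sizes, steps_to_target):
    """Return the batch size minimizing SFO cost = steps x batch size,
    given measured step counts per batch size."""
    sfo = np.asarray(steps_to_target) * np.asarray(batch_sizes)
    return batch_sizes[int(np.argmin(sfo))], sfo

# Hypothetical measurements (steps to reach a fixed training loss):
# bs    = [32, 64, 128, 256, 512, 1024]
# steps = [40000, 18000, 8500, 4000, 2300, 1800]
# critical_batch_size(bs, steps) -> (256, ...): SFO cost falls until the
# critical batch size, then rises once the batch size exceeds it.
```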
    A Personalized Dialogue Generator with Implicit User Persona Detection. (arXiv:2204.07372v2 [cs.CL] UPDATED)
    Current works in the generation of personalized dialogue primarily contribute to the agent presenting a consistent personality and driving a more informative response. However, we found that the generated responses from most previous models tend to be self-centered, with little care for the user in the dialogue. Moreover, we consider that human-like conversation is essentially built based on inferring information about the persona of the other party. Motivated by this, we propose a novel personalized dialogue generator by detecting an implicit user persona. Because it is hard to collect a large number of detailed personas for each user, we attempted to model the user's potential persona and its representation from dialogue history, with no external knowledge. The perception and fader variables were conceived using conditional variational inference. The two latent variables simulate the process of people being aware of each other's persona and producing a corresponding expression in conversation. Finally, posterior-discriminated regularization was presented to enhance the training procedure. Empirical studies demonstrate that, compared to state-of-the-art methods, our approach is more concerned with the user's persona and achieves a considerable boost across the evaluations.  ( 2 min )
    BYEL : Bootstrap Your Emotion Latent. (arXiv:2207.10003v2 [cs.LG] UPDATED)
    With the improved performance of deep learning, the number of studies trying to apply deep learning to human emotion analysis is increasing rapidly. But even with this trend, it is still difficult to obtain high-quality images and annotations. For this reason, the Learning from Synthetic Data (LSD) challenge, which learns from synthetic images and infers from real images, is one of the most interesting areas. In general, domain adaptation methods are widely used to address the LSD challenge, but they have the limitation that target domains (real images) are still needed. Focusing on this limitation, we propose the framework Bootstrap Your Emotion Latent (BYEL), which uses only synthetic images in training. BYEL is implemented by adding emotion classifiers and emotion vector subtraction to the BYOL framework, which performs well in self-supervised representation learning. We train our framework using synthetic images generated from the Aff-wild2 dataset and evaluate it using real images from the Aff-wild2 dataset. The results show that our framework (0.3084) performs 2.8% better than the baseline (0.3) on the macro F1 score metric.
    Stability of Image-Reconstruction Algorithms. (arXiv:2206.07128v2 [math.OC] UPDATED)
    Robustness and stability of image-reconstruction algorithms have recently come under scrutiny. Their importance to medical imaging cannot be overstated. We review the known results for the topical variational regularization strategies ($\ell_2$ and $\ell_1$ regularization) and present novel stability results for $\ell_p$-regularized linear inverse problems for $p\in(1,\infty)$. Our results guarantee Lipschitz continuity for small $p$ and H\"{o}lder continuity for larger $p$. They generalize well to the $L_p(\Omega)$ function spaces.
    Transferring Chemical and Energetic Knowledge Between Molecular Systems with Machine Learning. (arXiv:2205.03339v2 [physics.chem-ph] UPDATED)
    Predicting the structural and energetic properties of a molecular system is one of the fundamental tasks in molecular simulations, with use cases in chemistry, biology, and medicine. In the past decade, the advent of machine learning algorithms has impacted molecular simulations for various tasks, including property prediction of atomistic systems. In this paper, we propose a novel methodology for transferring knowledge obtained from simple molecular systems to a more complex one possessing a significantly larger number of atoms and degrees of freedom. In particular, we focus on the classification of high and low free-energy states. Our approach relies on utilizing (i) a novel hypergraph representation of molecules, encoding all relevant information for characterizing the potential energy of a conformation, and (ii) novel message passing and pooling layers for processing and making predictions on such hypergraph-structured data. Despite the complexity of the problem, our results show a remarkable AUC of 0.92 for transfer learning from tri-alanine to the deca-alanine system. Moreover, we show that the very same transfer learning approach can be used to group, in an unsupervised way, various secondary structures of deca-alanine into clusters having similar free-energy values. Our study represents a proof of concept that reliable transfer learning models for molecular systems can be designed, paving the way to unexplored routes in the prediction of structural and energetic properties of biologically relevant systems.
    VC Theoretical Explanation of Double Descent. (arXiv:2205.15549v2 [stat.ML] UPDATED)
    There has been growing interest in the generalization performance of large multilayer neural networks that can be trained to achieve zero training error while generalizing well on test data. This regime is known as 'second descent', and it appears to contradict the conventional view that optimal model complexity should reflect an optimal balance between underfitting and overfitting, i.e., the bias-variance trade-off. This paper presents a VC-theoretical analysis of double descent and shows that it can be fully explained by classical VC-generalization bounds. We illustrate an application of analytic VC-bounds for modeling double descent in classification problems, using empirical results for several learning methods, such as SVM, Least Squares, and Multilayer Perceptron classifiers. In addition, we discuss several reasons for the misinterpretation of VC-theoretical results in the Deep Learning community.
    Undersampling is a Minimax Optimal Robustness Intervention in Nonparametric Classification. (arXiv:2205.13094v2 [cs.LG] UPDATED)
    While a broad range of techniques have been proposed to tackle distribution shift, the simple baseline of training on an $\textit{undersampled}$ dataset often achieves close to state-of-the-art accuracy across several popular benchmarks. This is rather surprising, since undersampling algorithms discard excess majority-group data. To understand this phenomenon, we ask if learning is fundamentally constrained by a lack of minority-group samples. We prove that this is indeed the case in the setting of nonparametric binary classification. Our results show that in the worst case, an algorithm cannot outperform undersampling unless there is a high degree of overlap between the train and test distributions (which is unlikely to be the case in real-world datasets), or unless the algorithm leverages additional structure about the distribution shift. In particular, in the case of label shift, we show that there is always an undersampling algorithm that is minimax optimal, while in the case of group-covariate shift we show that an undersampling algorithm is minimax optimal when the overlap between the group distributions is small. We also perform an experimental case study on a label shift dataset and find that, in line with our theory, the test accuracy of robust neural network classifiers is constrained by the number of minority samples.
    Meta Learning for High-dimensional Ising Model Selection Using $\ell_1$-regularized Logistic Regression. (arXiv:2208.09539v1 [cs.LG])
    In this paper, we consider the meta learning problem for estimating the graphs associated with high-dimensional Ising models, using the method of $\ell_1$-regularized logistic regression for neighborhood selection of each node. Our goal is to use the information learned from the auxiliary tasks in the learning of the novel task to reduce its sufficient sample complexity. To this end, we propose a novel generative model as well as an improper estimation method. In our setting, all the tasks are \emph{similar} in their \emph{random} model parameters and supports. By pooling all the samples from the auxiliary tasks to \emph{improperly} estimate a single parameter vector, we can recover the true support union, assumed small in size, with a high probability with a sufficient sample complexity of $\Omega(1) $ per task, for $K = \Omega(d^3 \log p ) $ tasks of Ising models with $p$ nodes and a maximum neighborhood size $d$. Then, with the support for the novel task restricted to the estimated support union, we prove that consistent neighborhood selection for the novel task can be obtained with a reduced sufficient sample complexity of $\Omega(d^3 \log d)$.
    A Boosting Algorithm for Positive-Unlabeled Learning. (arXiv:2205.09485v3 [cs.LG] UPDATED)
    Positive-unlabeled (PU) learning deals with binary classification problems when only positive (P) and unlabeled (U) data are available. Many PU methods based on linear models and neural networks have been proposed; however, there is still a lack of study on boosting algorithms for PU learning, even though a traditional boosting algorithm with simple base learners may perform better than neural networks. We propose a novel boosting algorithm for PU learning, Ada-PU, and compare it against neural networks. Ada-PU follows the general procedure of AdaBoost, while P data are regarded as positive and negative simultaneously. Three distributions of PU data are maintained and updated in Ada-PU, instead of the single distribution used in ordinary supervised (PN) learning. After a weak classifier is learned on the newly updated distribution, the corresponding weight of the classifier for the final ensemble is estimated using only PU data. We demonstrate that the proposed method is guaranteed to keep three theoretical properties of boosting algorithms for a defined set of base classifiers. In experiments, we show that Ada-PU outperforms neural networks on benchmark PU datasets. We also study UNSW-NB15, a real-world dataset in cyber security, and demonstrate that Ada-PU has superior performance for malicious activity detection.
    Byzantines can also Learn from History: Fall of Centered Clipping in Federated Learning. (arXiv:2208.09894v1 [cs.LG])
    The increasing popularity of the federated learning framework, due to its success in a wide range of collaborative learning tasks, also induces certain security concerns regarding the learned model, owing to the possibility of malicious clients participating in the learning process. Hence, the objective is to neutralize the impact of the malicious participants and to ensure the final model is trustworthy. One common observation regarding Byzantine attacks is that the higher the variance among the clients' models/updates, the more space there is for attacks to be hidden. To this end, it has recently been shown that by utilizing momentum, and thus reducing the variance, it is possible to weaken the strength of known Byzantine attacks. The Centered Clipping framework (ICML 2021) further showed that, besides reducing the variance, the momentum term from the previous iteration can be used as a reference point to neutralize Byzantine attacks, demonstrating impressive performance against well-known attacks. In this work, however, we show that the centered clipping framework has certain vulnerabilities, and that existing attacks can be revised to exploit these vulnerabilities and circumvent the centered clipping defense. Hence, we introduce a strategy for designing an attack that circumvents the centered clipping framework and numerically illustrate its effectiveness against centered clipping, as well as other known defense strategies, by reducing test accuracy to 5-40% in the best-case scenarios.
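    For context, the defense under attack aggregates client updates by clipping them around the previous aggregate rather than around zero. A sketch of centered clipping as described by Karimireddy et al. (ICML 2021); the clipping radius and iteration count below are placeholders:

```python
import torch

def centered_clip(updates, v_prev, tau=10.0, iters=3):
    """Aggregate client updates by iteratively clipping their deviation
    from the reference point v_prev (the previous aggregate/momentum):
    v <- v + mean_i clip(u_i - v, tau)."""
    v = v_prev.clone()
    for _ in range(iters):
        clipped = []
        for u in updates:
            diff = u - v
            scale = torch.clamp(tau / (diff.norm() + 1e-12), max=1.0)
            clipped.append(diff * scale)
        v = v + torch.stack(clipped).mean(dim=0)
    return v
```

    The attack described in the abstract exploits exactly this reference point: an adversary who can predict v_prev can craft updates that stay inside the clipping radius while still steering the aggregate.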
    Friendly Noise against Adversarial Noise: A Powerful Defense against Data Poisoning Attacks. (arXiv:2208.10224v1 [cs.CR])
    A powerful category of data poisoning attacks modifies a subset of training examples by small adversarial perturbations to change the prediction of certain test-time data. Existing defense mechanisms are not desirable to deploy in practice, as they often either drastically harm the generalization performance or are attack-specific and prohibitively slow to apply. Here, we propose a simple but highly effective approach that, unlike existing methods, breaks various types of poisoning attacks with only the slightest drop in generalization performance. We make the key observation that attacks exploit sharp loss regions to craft adversarial perturbations, which can substantially alter examples' gradients or representations under small perturbations. To break poisoning attacks, our approach comprises two components: an optimized friendly noise that is generated to maximally perturb examples without degrading the performance, and a randomly varying noise component. The first component takes examples farther away from the sharp loss regions, and the second component smooths out the loss landscape. The combination of both components builds a very lightweight but extremely effective defense against the most powerful triggerless targeted and hidden-trigger backdoor poisoning attacks, including Gradient Matching, Bulls-eye Polytope, and Sleeper Agent. We show that our friendly noise is transferable to other architectures, and that adaptive attacks cannot break our defense due to its random noise component.
    Learning Downstream Task by Selectively Capturing Complementary Knowledge from Multiple Self-supervisedly Learning Pretexts. (arXiv:2204.05248v2 [cs.LG] UPDATED)
    Self-supervised learning (SSL), as a newly emerging unsupervised representation learning paradigm, generally follows a two-stage pipeline: 1) learning invariant and discriminative representations with auto-annotation pretext(s), then 2) transferring the representations to assist downstream task(s). The two stages are usually implemented separately, making the learned representations agnostic to the downstream tasks. Currently, most works are devoted to exploring the first stage, whereas how to learn downstream tasks with limited labeled data using the already-learned representations is much less studied. In particular, it is crucial and challenging to selectively utilize the complementary representations from diverse pretexts for a downstream task. In this paper, we propose a novel solution that leverages the attention mechanism to adaptively squeeze suitable representations for the tasks. Meanwhile, resorting to information theory, we theoretically prove that gathering representations from diverse pretexts is more effective than using a single one. Extensive experiments validate that our scheme significantly exceeds current popular pretext-matching based methods in gathering knowledge and relieving negative transfer in downstream tasks.
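    A hedged sketch of the core idea, under our own assumptions (the scoring network, dimensions, and a shared embedding size across pretexts are placeholders): learn attention weights over the embeddings produced by several frozen pretext encoders and combine them into one task-specific representation.

```python
import torch
import torch.nn as nn

class PretextAttention(nn.Module):
    """Attention-weighted pooling over embeddings from multiple pretext
    encoders, producing a single representation for the downstream head."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)     # one relevance score per pretext

    def forward(self, embeddings):          # embeddings: (B, P, dim)
        alpha = torch.softmax(self.score(embeddings), dim=1)   # (B, P, 1)
        return (alpha * embeddings).sum(dim=1)                 # (B, dim)

# Usage sketch with P frozen pretext encoders producing dim-sized vectors:
# z = torch.stack([enc(x) for enc in pretext_encoders], dim=1)  # (B, P, dim)
# features = PretextAttention(dim=256)(z)
```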
    Study of Novel Sparse Array Design Based on the Maximum Inter-Element Spacing Criterion. (arXiv:2208.09574v1 [cs.LG])
    A novel sparse array (SA) structure is proposed based on the maximum inter-element spacing (IES) constraint (MISC) criterion. Compared with the traditional MISC array, the proposed SA configurations, termed improved MISC (IMISC), have significantly increased uniform degrees of freedom (uDOF) and reduced mutual coupling. In particular, the IMISC arrays are composed of six uniform linear arrays (ULAs), which can be determined by an IES set. The IES set is constrained by two parameters, namely the maximum IES and the number of sensors. The uDOF of the IMISC arrays is derived, and the weight function of the IMISC arrays is analyzed as well. The proposed IMISC arrays have a great advantage in terms of uDOF over existing SAs, while their mutual coupling remains at a low level. Simulations are carried out to demonstrate the advantages of the IMISC arrays.  ( 2 min )
    Attentive Walk-Aggregating Graph Neural Networks. (arXiv:2110.02667v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have been shown to possess strong representation power, which can be exploited for downstream prediction tasks on graph-structured data, such as molecules and social networks. They typically learn representations by aggregating information from the $K$-hop neighborhood of individual vertices or from the enumerated walks in the graph. Prior studies have demonstrated the effectiveness of incorporating weighting schemes into GNNs; however, this has been primarily limited to $K$-hop neighborhood GNNs so far. In this paper, we aim to design an algorithm incorporating weighting schemes into walk-aggregating GNNs and analyze their effect. We propose a novel GNN model, called AWARE, that aggregates information about the walks in the graph using attention schemes. This leads to an end-to-end supervised learning method for graph-level prediction tasks in the standard setting where the input is the adjacency and vertex information of a graph, and the output is a predicted label for the graph. We then perform theoretical, empirical, and interpretability analyses of AWARE. Our theoretical analysis in a simplified setting identifies successful conditions for provable guarantees, demonstrating how the graph information is encoded in the representation, and how the weighting schemes in AWARE affect the representation and learning performance. Our experiments demonstrate the strong performance of AWARE in graph-level prediction tasks in the standard setting in the domains of molecular property prediction and social networks. Lastly, our interpretation study illustrates that AWARE can successfully capture the important substructures of the input graph. The code is available on $\href{https://github.com/mehmetfdemirel/aware}{GitHub}$.
    ImageNet Challenging Classification with the Raspberry Pi: An Incremental Local Stochastic Gradient Descent Algorithm. (arXiv:2203.11853v3 [cs.CV] UPDATED)
    With the rise of powerful, low-cost embedded devices, edge computing has become an increasingly popular choice. In this paper, we propose a new incremental local stochastic gradient descent (SGD) algorithm tailored to the Raspberry Pi to deal with the large ImageNet ILSVRC 2010 dataset, which has 1,261,405 images in 1,000 classes. The local SGD splits a data block into $k$ partitions using the $k$-means algorithm and then learns SGD models in parallel on each data partition to classify the data locally. The incremental local SGD sequentially loads small data blocks of the training dataset to learn the local SGD models. The numerical test results on the ImageNet dataset show that our incremental local SGD algorithm on the Raspberry Pi 4 is faster and more accurate than the state-of-the-art linear SVM run on a PC with an Intel(R) Core i7-4790 CPU, 3.6 GHz, 4 cores.
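    A minimal sketch of the scheme under our assumptions (block iterator, partition count, and choice of scikit-learn estimators are ours, not the paper's exact implementation): stream small data blocks, split each block with k-means, and incrementally update one linear SGD model per partition.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.linear_model import SGDClassifier

def incremental_local_sgd(block_iter, n_partitions, classes, seed=0):
    """block_iter yields (X_block, y_block) small enough to fit in RAM.
    Keeps one local SGD model per k-means partition, updated incrementally."""
    km = MiniBatchKMeans(n_clusters=n_partitions, random_state=seed)
    models = [SGDClassifier(random_state=seed) for _ in range(n_partitions)]
    for X_block, y_block in block_iter:
        part = km.partial_fit(X_block).predict(X_block)   # assign partitions
        for k in range(n_partitions):
            mask = part == k
            if mask.any():
                models[k].partial_fit(X_block[mask], y_block[mask],
                                      classes=classes)
    return km, models

# Prediction would route each sample through km.predict to its local model.
```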
    A structured proof of Kolmogorov's Superposition Theorem. (arXiv:2105.00408v2 [math.FA] UPDATED)
    We present a well-structured detailed exposition of a well-known proof of the following celebrated result solving Hilbert's 13th problem on superpositions. For functions of 2 variables the statement is as follows. Kolmogorov Theorem. There are continuous functions $\varphi_1,\ldots,\varphi_5 : [\,0, 1\,]\to [\,0,1\,]$ such that for any continuous function $f: [\,0,1\,]^2\to\mathbb R$ there is a continuous function $h: [\,0,3\,]\to\mathbb R$ such that for any $x,y\in [\,0, 1\,]$ we have $$f(x,y)=\sum\limits_{k=1}^5 h\left(\varphi_k(x)+\sqrt{2}\,\varphi_k(y)\right).$$ The proof is accessible to non-specialists, in particular, to students familiar with only basic properties of continuous functions.
    To show or not to show: Redacting sensitive text from videos of electronic displays. (arXiv:2208.10270v1 [cs.CV])
    With the increasing prevalence of video recordings, there is a growing need for tools that can maintain the privacy of those recorded. In this paper, we define an approach for redacting personally identifiable text from videos using a combination of optical character recognition (OCR) and natural language processing (NLP) techniques. We examine the relative performance of this approach when used with different OCR models, specifically Tesseract and the OCR system from Google Cloud Vision (GCV). For the proposed approach, the performance of GCV, in both accuracy and speed, is significantly higher than that of Tesseract. Finally, we explore the advantages and disadvantages of both models in real-world applications.
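    A minimal per-frame sketch of the Tesseract side of such a pipeline, with our own simplifications clearly flagged: a regex stands in for the paper's richer NLP step, and only emails/phone-like tokens are flagged.

```python
import re
import cv2
import pytesseract

# Crude stand-in for the NLP stage: emails and phone-number-like strings.
PII = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+|\+?\d[\d\s().-]{7,}\d")

def redact_frame(frame):
    """OCR one video frame, then black out the bounding boxes of any
    tokens that match the PII pattern."""
    data = pytesseract.image_to_data(frame,
                                     output_type=pytesseract.Output.DICT)
    for i, word in enumerate(data["text"]):
        if word and PII.search(word):
            x, y = data["left"][i], data["top"][i]
            w, h = data["width"][i], data["height"][i]
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 0), -1)
    return frame
```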
    Adjacency constraint for efficient hierarchical reinforcement learning. (arXiv:2111.00213v4 [cs.LG] UPDATED)
    Goal-conditioned Hierarchical Reinforcement Learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques. However, it often suffers from training inefficiency as the action space of the high-level, i.e., the goal space, is large. Searching in a large goal space poses difficulty for both high-level subgoal generation and low-level policy learning. In this paper, we show that this problem can be effectively alleviated by restricting the high-level action space from the whole goal space to a $k$-step adjacent region of the current state using an adjacency constraint. We theoretically prove that in a deterministic Markov Decision Process (MDP), the proposed adjacency constraint preserves the optimal hierarchical policy, while in a stochastic MDP the adjacency constraint induces a bounded state-value suboptimality determined by the MDP's transition structure. We further show that this constraint can be practically implemented by training an adjacency network that can discriminate between adjacent and non-adjacent subgoals. Experimental results on discrete and continuous control tasks including challenging simulated robot locomotion and manipulation tasks show that incorporating the adjacency constraint significantly boosts the performance of state-of-the-art goal-conditioned HRL approaches.
    SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video Anomaly Detection. (arXiv:2207.08003v2 [cs.CV] UPDATED)
    A self-supervised multi-task learning (SSMTL) framework for video anomaly detection was recently introduced in the literature. Due to its highly accurate results, the method attracted the attention of many researchers. In this work, we revisit the self-supervised multi-task learning framework, proposing several updates to the original method. First, we study various detection methods, e.g., detecting high-motion regions using optical flow or background subtraction, since we believe the currently used pre-trained YOLOv3 is suboptimal: objects in motion or objects from unknown classes are never detected. Second, we modernize the 3D convolutional backbone by introducing multi-head self-attention modules, inspired by the recent success of vision transformers. As such, we alternatively introduce both 2D and 3D convolutional vision transformer (CvT) blocks. Third, in our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps through knowledge distillation, solving jigsaw puzzles, estimating body pose through knowledge distillation, predicting masked regions (inpainting), and adversarial learning with pseudo-anomalies. We conduct experiments to assess the performance impact of the introduced changes. Upon finding more promising configurations of the framework, dubbed SSMTL++v1 and SSMTL++v2, we extend our preliminary experiments to more data sets, demonstrating that our performance gains are consistent across all data sets. In most cases, our results on Avenue, ShanghaiTech and UBnormal raise the state-of-the-art performance bar to a new level.
    Explainable multiple abnormality classification of chest CT volumes. (arXiv:2111.12215v3 [eess.IV] UPDATED)
    Understanding model predictions is critical in healthcare, to facilitate rapid verification of model correctness and to guard against use of models that exploit confounding variables. We introduce the challenging new task of explainable multiple abnormality classification in volumetric medical images, in which a model must indicate the regions used to predict each abnormality. To solve this task, we propose a multiple instance learning convolutional neural network, AxialNet, that allows identification of top slices for each abnormality. Next we incorporate HiResCAM, an attention mechanism, to identify sub-slice regions. We prove that for AxialNet, HiResCAM explanations are guaranteed to reflect the locations the model used, unlike Grad-CAM which sometimes highlights irrelevant locations. Armed with a model that produces faithful explanations, we then aim to improve the model's learning through a novel mask loss that leverages HiResCAM and 3D allowed regions to encourage the model to predict abnormalities based only on the organs in which those abnormalities appear. The 3D allowed regions are obtained automatically through a new approach, PARTITION, that combines location information extracted from radiology reports with organ segmentation maps obtained through morphological image processing. Overall, we propose the first model for explainable multi-abnormality prediction in volumetric medical images, and then use the mask loss to achieve a 33% improvement in organ localization of multiple abnormalities in the RAD-ChestCT data set of 36,316 scans, representing the state of the art. This work advances the clinical applicability of multiple abnormality modeling in chest CT volumes.
    Game-Theoretic Algorithms for Conditional Moment Matching. (arXiv:2208.09551v1 [cs.GT])
    A variety of problems in econometrics and machine learning, including instrumental variable regression and Bellman residual minimization, can be formulated as satisfying a set of conditional moment restrictions (CMR). We derive a general, game-theoretic strategy for satisfying CMR that scales to nonlinear problems, is amenable to gradient-based optimization, and is able to account for finite sample uncertainty. We recover the approaches of Dikkala et al. and Dai et al. as special cases of our general framework before detailing various extensions and how to efficiently solve the game defined by CMR.
    DBN-Mix: Training Dual Branch Network Using Bilateral Mixup Augmentation for Long-Tailed Visual Recognition. (arXiv:2207.02173v2 [cs.CV] UPDATED)
    There is growing interest in the challenging visual perception task of learning from long-tailed class distributions. The extreme class imbalance in the training dataset biases the model to prefer recognizing majority class data over minority class data. Furthermore, the lack of diversity in minority class samples makes it difficult to find a good representation. In this paper, we propose an effective data augmentation method, referred to as bilateral mixup augmentation, which can improve the performance of long-tailed visual recognition. The bilateral mixup augmentation combines two samples generated by a uniform sampler and a re-balanced sampler and augments the training dataset to enhance the representation learning for minority classes. We also reduce the classifier bias using class-wise temperature scaling, which scales the logits differently per class in the training phase. We apply both ideas to the dual-branch network (DBN) framework, presenting a new model, named dual-branch network with bilateral mixup (DBN-Mix). Experiments on popular long-tailed visual recognition datasets show that DBN-Mix improves performance significantly over baseline and that the proposed method achieves state-of-the-art performance in some categories of benchmarks.
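    As a concrete illustration of the two ingredients (a sketch under assumed details; the paper's exact samplers and schedules may differ), bilateral mixup combines a uniformly sampled batch with a class-rebalanced batch, while class-wise temperature scaling divides logits by per-class temperatures:

```python
# Hedged sketch of bilateral mixup and class-wise temperature scaling.
import numpy as np

def bilateral_mixup(x_uniform, y_uniform, x_balanced, y_balanced, alpha=1.0):
    """Mix a uniform-sampler batch with a re-balanced-sampler batch.
    Targets y_* are one-hot arrays so labels can be mixed as well."""
    lam = np.random.beta(alpha, alpha)
    x = lam * x_uniform + (1.0 - lam) * x_balanced
    y = lam * y_uniform + (1.0 - lam) * y_balanced
    return x, y

def classwise_temperature_scale(logits, temps):
    """Scale logits differently per class during training (temps: shape [C])."""
    return logits / temps
```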
    Robust Graph Meta-learning for Weakly-supervised Few-shot Node Classification. (arXiv:2106.06873v2 [cs.LG] UPDATED)
    Graphs are widely used to model the relational structure of data, and the research of graph machine learning (ML) has a wide spectrum of applications ranging from drug design in molecular graphs to friendship recommendation in social networks. Prevailing approaches for graph ML typically require abundant labeled instances to achieve satisfactory results, which is commonly infeasible in real-world scenarios since labeled data for newly emerged concepts (e.g., new categorizations of nodes) on graphs is limited. Though meta-learning has been applied to different few-shot graph learning problems, most existing efforts predominantly assume that all the data from the seen classes is gold-labeled, and those methods may lose their efficacy when the seen data is weakly labeled with severe label noise. As such, we aim to investigate a novel problem of weakly-supervised graph meta-learning for improving the model robustness in terms of knowledge transfer. To achieve this goal, we propose a new graph meta-learning framework -- Graph Hallucination Networks (Meta-GHN) in this paper. Based on a new robustness-enhanced episodic training, Meta-GHN is meta-learned to hallucinate clean node representations from weakly-labeled data and extracts highly transferable meta-knowledge, which enables the model to quickly adapt to unseen tasks with few labeled instances. Extensive experiments demonstrate the superiority of Meta-GHN over existing graph meta-learning studies on the task of weakly-supervised few-shot node classification.
    Calibration of P-values for calibration and for deviation of a subpopulation from the full population. (arXiv:2202.00100v5 [stat.ME] UPDATED)
    The author's recent research papers, "Cumulative deviation of a subpopulation from the full population" and "A graphical method of cumulative differences between two subpopulations" (both published in volume 8 of Springer's open-access "Journal of Big Data" during 2021), propose graphical methods and summary statistics, without extensively calibrating formal significance tests. The summary metrics and methods can measure the calibration of probabilistic predictions and can assess differences in responses between a subpopulation and the full population while controlling for a covariate or score via conditioning on it. These recently published papers construct significance tests based on the scalar summary statistics, but only sketch how to calibrate the attained significance levels (also known as "P-values") for the tests. The present article reviews and synthesizes work spanning many decades in order to detail how to calibrate the P-values. The present paper presents computationally efficient, easily implemented numerical methods for evaluating properly calibrated P-values, together with rigorous mathematical proofs guaranteeing their accuracy, and illustrates and validates the methods with open-source software and numerical examples.
    Do Differentiable Simulators Give Better Policy Gradients?. (arXiv:2202.00817v2 [cs.LG] UPDATED)
    Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients. However, it is yet unclear what factors decide the performance of the two estimators on complex landscapes that involve long-horizon planning and control on physical systems, despite the crucial relevance of this question for the utility of differentiable simulators. We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator, and analyze this phenomenon through the lens of bias and variance. We additionally propose an $\alpha$-order gradient estimator, with $\alpha \in [0,1]$, which correctly utilizes exact gradients to combine the efficiency of first-order estimates with the robustness of zeroth-order methods. We demonstrate the pitfalls of traditional estimators and the advantages of the $\alpha$-order estimator on some numerical examples.
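    The interpolation can be written as a convex combination of the two estimators; the toy sketch below (my own illustration on a scalar objective, not the paper's implementation) pairs an exact gradient with a Gaussian-smoothing zeroth-order estimate:

```python
# Hedged sketch of an alpha-order gradient estimate, alpha in [0, 1].
import numpy as np

def alpha_order_grad(f, grad_f, x, alpha, sigma=0.1, n_samples=64):
    first_order = grad_f(x)  # exact gradient from the differentiable simulator
    # Zeroth-order (Gaussian-smoothing) estimate of the same gradient.
    eps = np.random.randn(n_samples)
    zeroth_order = np.mean((f(x + sigma * eps) - f(x)) * eps) / sigma
    return alpha * first_order + (1.0 - alpha) * zeroth_order

# Example: f(x) = x^2 has gradient 2x; both estimators agree in expectation.
g = alpha_order_grad(lambda x: x**2, lambda x: 2 * x, x=1.5, alpha=0.7)
```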
    AugShuffleNet: Communicate More, Compute Less. (arXiv:2203.06589v2 [cs.CV] UPDATED)
    As a remarkably compact model, ShuffleNetV2 offers a good example of efficient ConvNet design, but its limitations are rarely examined. In this paper, we rethink the design pattern of ShuffleNetV2 and find that the channel-wise redundancy problem still constrains the efficiency improvement of the Shuffle block in wider ShuffleNetV2 variants. To resolve this issue, we propose an augmented variant of the Shuffle block with a bottleneck-like structure and more implicit short connections. To verify the effectiveness of this building block, we further build a more powerful and efficient model family, termed AugShuffleNets. Evaluated on the CIFAR-10 and CIFAR-100 datasets, AugShuffleNet consistently outperforms ShuffleNetV2 in accuracy, with lower computational cost and fewer parameters.
    Multi-Agent Reinforcement Learning for Network Load Balancing in Data Center. (arXiv:2201.11727v4 [cs.DC] UPDATED)
    This paper presents the network load balancing problem, a challenging real-world task for multi-agent reinforcement learning (MARL) methods. Traditional heuristic solutions like Weighted-Cost Multi-Path (WCMP) and Local Shortest Queue (LSQ) are less flexible to changing workload distributions and arrival rates, with a poor balance among multiple load balancers. The cooperative network load balancing task is formulated as a Dec-POMDP problem, which naturally induces the MARL methods. To bridge the reality gap for applying learning-based methods, all methods are directly trained and evaluated on an emulation system, from moderate scale to large scale. Experiments on realistic testbeds show that the independent and "selfish" load balancing strategies are not necessarily the globally optimal ones, while the proposed MARL solution has a superior performance over different realistic settings. Additionally, the potential difficulties of MARL methods for network load balancing are analysed, which helps to draw the attention of the learning and network communities to such challenges.
    Leveraging Cross Feedback of User and Item Embeddings with Attention for Variational Autoencoder based Collaborative Filtering. (arXiv:2002.09145v3 [cs.LG] UPDATED)
    Matrix factorization (MF) has been widely applied to collaborative filtering in recommendation systems. Its Bayesian variants can derive posterior distributions of user and item embeddings, and are more robust to sparse ratings. However, the Bayesian methods are restricted by their update rules for the posterior parameters due to the conjugacy of the priors and the likelihood. Variational autoencoders (VAE) can address this issue by capturing complex mappings between the posterior parameters and the data. However, current research on VAEs for collaborative filtering only considers the mappings based on the explicit data information while the implicit embedding information is overlooked. In this paper, we first derive evidence lower bounds (ELBO) for Bayesian MF models from two viewpoints: user-oriented and item-oriented. Based on the ELBOs, we propose a VAE-based Bayesian MF framework. It leverages not only the data but also the embedding information to approximate the user-item joint distribution. As suggested by the ELBOs, the approximation is iterative with cross feedback of user and item embeddings into each other's encoders. More specifically, user embeddings sampled at the previous iteration are fed to the item-side encoders to estimate the posterior parameters for the item embeddings at the current iteration, and vice versa. The estimation also attends to the cross-fed embeddings to further exploit useful information. The decoder then reconstructs the data via the matrix factorization over the currently re-sampled user and item embeddings.
    Goal Misgeneralization in Deep Reinforcement Learning. (arXiv:2105.14111v5 [cs.LG] UPDATED)
    We study goal misgeneralization, a type of out-of-distribution generalization failure in reinforcement learning (RL). Goal misgeneralization failures occur when an RL agent retains its capabilities out-of-distribution yet pursues the wrong goal. For instance, an agent might continue to competently avoid obstacles, but navigate to the wrong place. In contrast, previous works have typically focused on capability generalization failures, where an agent fails to do anything sensible at test time. We formalize this distinction between capability and goal generalization, provide the first empirical demonstrations of goal misgeneralization, and present a partial characterization of its causes.
    Colloquium: Advances in automation of quantum dot devices control. (arXiv:2112.09362v2 [quant-ph] UPDATED)
    Arrays of quantum dots (QDs) are a promising candidate system to realize scalable, coupled qubit systems and serve as a fundamental building block for quantum computers. In such semiconductor quantum systems, devices now have tens of individual electrostatic and dynamical voltages that must be carefully set to localize the system into the single-electron regime and to realize good qubit operational performance. The mapping of requisite QD locations and charges to gate voltages presents a challenging classical control problem. With an increasing number of QD qubits, the relevant parameter space grows sufficiently to make heuristic control infeasible. In recent years, there has been a considerable effort to automate device control that combines script-based algorithms with machine learning (ML) techniques. In this colloquium, we present a comprehensive overview of the recent progress in the automation of QD device control, with a particular emphasis on silicon- and GaAs-based QDs formed in two-dimensional electron gases. Combining physics-based modeling with modern numerical optimization and ML has proven quite effective in yielding efficient, scalable control. Further integration of theoretical, computational, and experimental efforts with computer science and ML holds tremendous potential in advancing semiconductor and other platforms for quantum computing.
    Microgrid Optimal Energy Scheduling Considering Neural Network based Battery Degradation. (arXiv:2202.12416v3 [eess.SP] UPDATED)
    Battery energy storage systems (BESS) can effectively mitigate the uncertainty of variable renewable generation. Degradation is unpreventable and hard to model and predict for batteries such as the most popular Lithium-ion battery (LiB). In this paper, we propose a data-driven method to predict the battery degradation for a given scheduled battery operational profile. Particularly, a neural network based battery degradation (NNBD) model is proposed to quantify the battery degradation with inputs of major battery degradation factors. When incorporating the proposed NNBD model into microgrid day-ahead scheduling (MDS), we can establish a battery degradation based MDS (BDMDS) model that can consider the equivalent battery degradation cost precisely with the proposed cycle based battery usage processing (CBUP) method for the NNBD model. Since the proposed NNBD model is highly non-linear and non-convex, BDMDS would be very hard to solve. To address this issue, a neural network and optimization decoupled heuristic (NNODH) algorithm is proposed in this paper to effectively solve this neural network embedded optimization problem. Simulation results demonstrate that the proposed NNODH algorithm is able to obtain the optimal solution with the lowest total cost, including normal operation cost and battery degradation cost.
    e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce. (arXiv:2207.00208v2 [cs.LG] UPDATED)
    Understanding vision and language representations of product content is vital for search and recommendation applications in e-commerce. As a backbone for online shopping platforms and inspired by the recent success in representation learning research, we propose a contrastive learning framework that aligns language and visual models using unlabeled raw product text and images. We present techniques we used to train large-scale representation learning models and share solutions that address domain-specific challenges. We study the performance using our pre-trained model as backbones for diverse downstream tasks, including category classification, attribute extraction, product matching, product clustering, and adult product recognition. Experimental results show that our proposed method outperforms the baseline in each downstream task regarding both single modality and multiple modalities.
    Deep Learning based Coverage and Rate Manifold Estimation in Cellular Networks. (arXiv:2202.06390v2 [cs.NI] UPDATED)
    This article proposes Convolutional Neural Network-based Auto Encoder (CNN-AE) to predict location-dependent rate and coverage probability of a network from its topology. We train the CNN utilising BS location data of India, Brazil, Germany, and the USA and compare its performance with stochastic geometry (SG) based analytical models. In comparison to the best-fitted SG-based model, CNN-AE improves the coverage and rate prediction errors by a margin of as large as $40\%$ and $25\%$ respectively. As an application, we propose a low complexity, provably convergent algorithm that, using trained CNN-AE, can compute locations of new BSs that need to be deployed in a network in order to satisfy pre-defined spatially heterogeneous performance goals.
    Multivariate Boosted Trees and Applications to Forecasting and Control. (arXiv:2003.03835v2 [cs.LG] UPDATED)
    Gradient boosted trees are competition-winning, general-purpose, non-parametric regressors, which exploit sequential model fitting and gradient descent to minimize a specific loss function. The most popular implementations are tailored to univariate regression and classification tasks, precluding the possibility of capturing multivariate target cross-correlations and applying structured penalties to the predictions. In this paper, we present a computationally efficient algorithm for fitting multivariate boosted trees. We show that multivariate trees can outperform their univariate counterpart when the predictions are correlated. Furthermore, the algorithm allows the predictions to be arbitrarily regularized, so that properties like smoothness, consistency and functional relations can be enforced. We present applications and numerical results related to forecasting and control.
    A semantic web approach to uplift decentralized household energy data. (arXiv:2208.10265v1 [cs.AI])
    In a decentralized household energy system comprised of various devices such as home appliances, electric vehicles, and solar panels, end-users are able to dig deeper into the system's details and further achieve energy sustainability if they are presented with data on the electric energy consumption and production at the granularity of the device. However, many databases in this field are siloed from other domains, containing solely information pertaining to energy. This may result in the loss of information (e.g., weather) relevant to each device's energy use. Meanwhile, a large number of these datasets have been extensively used in computational modeling techniques such as machine learning models. While such computational approaches achieve great accuracy and performance by concentrating only on a local view of datasets, model reliability cannot be guaranteed since such models are very vulnerable to data input fluctuations when information omission is taken into account. This article tackles the data isolation issue in the field of smart energy systems by examining Semantic Web methods on top of a household energy system. We offer an ontology-based approach for managing decentralized data at device-level resolution in a system. As a consequence, the scope of the data associated with each device may easily be expanded in an interoperable manner throughout the Web, and additional information, such as weather, can be obtained from the Web, provided that the data is organized according to W3C standards.
    Model-Free Non-Stationary RL: Near-Optimal Regret and Applications in Multi-Agent RL and Inventory Control. (arXiv:2010.03161v4 [cs.LG] UPDATED)
    We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes. Both the reward functions and the state transition functions are allowed to vary arbitrarily over time as long as their cumulative variations do not exceed certain variation budgets. We propose Restarted Q-Learning with Upper Confidence Bounds (RestartQ-UCB), the first model-free algorithm for non-stationary RL, and show that it outperforms existing solutions in terms of dynamic regret. Specifically, RestartQ-UCB with Freedman-type bonus terms achieves a dynamic regret bound of $\widetilde{O}(S^{\frac{1}{3}} A^{\frac{1}{3}} \Delta^{\frac{1}{3}} H T^{\frac{2}{3}})$, where $S$ and $A$ are the numbers of states and actions, respectively, $\Delta>0$ is the variation budget, $H$ is the number of time steps per episode, and $T$ is the total number of time steps. We further present a parameter-free algorithm named Double-Restart Q-UCB that does not require prior knowledge of the variation budget. We show that our algorithms are \emph{nearly optimal} by establishing an information-theoretical lower bound of $\Omega(S^{\frac{1}{3}} A^{\frac{1}{3}} \Delta^{\frac{1}{3}} H^{\frac{2}{3}} T^{\frac{2}{3}})$, the first lower bound in non-stationary RL. Numerical experiments validate the advantages of RestartQ-UCB in terms of both cumulative rewards and computational efficiency. We demonstrate the power of our results in examples of multi-agent RL and inventory control across related products.
    SDBERT: SparseDistilBERT, a faster and smaller BERT model. (arXiv:2208.10246v1 [cs.CL])
    In this work we introduce a new transformer architecture called SparseDistilBERT (SDBERT), which combines sparse attention and knowledge distillation (KD). We implement a sparse attention mechanism to reduce the quadratic dependency on input length to linear, and use KD to shrink the model itself. We were able to reduce the size of the BERT model by 60% while retaining 97% of its performance, and training took only 40% of the time.
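    One common way to obtain the quadratic-to-linear reduction mentioned above is a fixed sliding-window attention mask; the sketch below is an assumed variant, since the abstract does not specify the sparsity pattern, and it still materializes the full score matrix for clarity (a real linear-time kernel would not):

```python
# Hedged sketch of sliding-window sparse attention (pattern assumed).
import torch

def local_attention_mask(seq_len, window):
    """True where attention is allowed: each token sees +/- `window` neighbours."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() <= window

def sparse_attention(q, k, v, window=4):
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    mask = local_attention_mask(q.size(-2), window).to(scores.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```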
    STS-GAN: Can We Synthesize Solid Texture with High Fidelity from Arbitrary Exemplars?. (arXiv:2102.03973v6 [cs.CV] UPDATED)
    Solid texture synthesis (STS), an effective way to extend a 2D exemplar to a 3D solid volume, exhibits advantages in numerous application domains. However, existing methods generally fail to accurately learn arbitrary textures, which may result in the failure to synthesize solid textures with high fidelity. In this paper, we propose a novel generative adversarial nets-based framework (STS-GAN) to hierarchically learn arbitrary solid textures. In STS-GAN, multi-scale discriminators evaluate the similarity between patches from the exemplar and slices from the generated volume, encouraging the generator to synthesize realistic solid textures. Finally, experimental results demonstrate that the proposed method can generate high-fidelity solid textures with similar visual characteristics to the exemplar.
    Complexity of Inexact Proximal Point Algorithm for minimizing convex functions with Holderian Growth. (arXiv:2108.04482v5 [cs.LG] UPDATED)
    Several decades ago the Proximal Point Algorithm (PPA) started to gain long-lasting attention in both the abstract operator theory and the numerical optimization communities. Even in modern applications, researchers still use proximal minimization theory to design scalable algorithms that overcome nonsmoothness. Remarkable works such as [Fer:91, Ber:82constrained, Ber:89parallel, Tom:11] established tight relations between the convergence behaviour of PPA and the regularity of the objective function. In this manuscript we derive the nonasymptotic iteration complexity of exact and inexact PPA for minimizing convex functions under $\gamma$-Holderian growth: $\mathcal{O}(\log(1/\epsilon))$ (for $\gamma \in [1,2]$) and $\mathcal{O}(1/\epsilon^{\gamma - 2})$ (for $\gamma > 2$). In particular, we recover well-known results on PPA: finite convergence for sharp minima and linear convergence for quadratic growth, even in the presence of deterministic noise. Moreover, when a simple Proximal Subgradient Method is recurrently called as an inner routine for computing each inexact PPA (IPPA) iterate, novel computational complexity bounds are obtained for Restarting Inexact PPA. Our numerical tests show improvements over existing restarting versions of the Subgradient Method.
    Distributed Saddle-Point Problems Under Similarity. (arXiv:2107.10706v3 [math.OC] UPDATED)
    We study solution methods for (strongly-)convex-(strongly)-concave Saddle-Point Problems (SPPs) over networks of two types: master/workers (thus centralized) architectures and meshed (thus decentralized) networks. The local functions at each node are assumed to be similar, due to statistical data similarity or otherwise. We establish lower complexity bounds for a fairly general class of algorithms solving the SPP. We show that a given suboptimality $\epsilon>0$ is achieved over master/workers networks in $\Omega\big(\Delta\cdot \delta/\mu\cdot \log (1/\epsilon)\big)$ rounds of communications, where $\delta>0$ measures the degree of similarity of the local functions, $\mu$ is their strong convexity constant, and $\Delta$ is the diameter of the network. The lower communication complexity bound over meshed networks reads $\Omega\big(1/{\sqrt{\rho}} \cdot {\delta}/{\mu}\cdot\log (1/\epsilon)\big)$, where $\rho$ is the (normalized) eigengap of the gossip matrix used for the communication between neighbouring nodes. We then propose algorithms matching the lower bounds over either type of network (up to log-factors). We assess the effectiveness of the proposed algorithms on a robust logistic regression problem.
    Bit-Metric Decoding Rate in Multi-User MIMO Systems: Theory. (arXiv:2203.06271v3 [cs.IT] UPDATED)
    Link adaptation (LA) is one of the most important aspects of wireless communications, where the modulation and coding scheme (MCS) used by the transmitter is adapted to the channel conditions in order to meet a certain target error rate. In a single-user SISO (SU-SISO) system with out-of-cell interference, LA is performed by computing the post-equalization signal-to-interference-noise ratio (SINR) at the receiver. The same technique can be employed in multi-user MIMO (MU-MIMO) receivers that use linear detectors. Another important use of post-equalization SINR is for physical layer (PHY) abstraction, where several PHY blocks like the channel encoder, the detector, and the channel decoder are replaced by an abstraction model in order to speed up system-level simulations. However, for MU-MIMO systems with non-linear receivers, there is no known equivalent of post-equalization SINR, which makes both LA and PHY abstraction extremely challenging. This important issue is addressed in this two-part paper. In this part, a metric called the bit-metric decoding rate (BMDR) of a detector, which is the proposed equivalent of post-equalization SINR, is presented. Since BMDR does not have a closed-form expression that would enable its instantaneous calculation, a machine-learning approach to predict it is presented along with extensive simulation results.
    Graph-Embedded Subspace Support Vector Data Description. (arXiv:2104.14370v2 [cs.LG] UPDATED)
    In this paper, we propose a novel subspace learning framework for one-class classification. The proposed framework presents the problem in the form of graph embedding. It includes the previously proposed subspace one-class techniques as its special cases and provides further insight on what these techniques actually optimize. The framework allows incorporating other meaningful optimization goals via the graph preserving criterion and reveals a spectral solution and a spectral regression-based solution as alternatives to the previously used gradient-based technique. We combine the subspace learning framework iteratively with Support Vector Data Description applied in the subspace to formulate Graph-Embedded Subspace Support Vector Data Description. We experimentally analyze the performance of the newly proposed variants and demonstrate improved performance against the baselines and the recently proposed subspace learning methods for one-class classification.
    On the Theory of Reinforcement Learning with Once-per-Episode Feedback. (arXiv:2105.14363v3 [cs.LG] UPDATED)
    We study a theory of reinforcement learning (RL) in which the learner receives binary feedback only once at the end of an episode. While this is an extreme test case for theory, it is also arguably more representative of real-world applications than the traditional requirement in RL practice that the learner receive feedback at every time step. Indeed, in many real-world applications of reinforcement learning, such as self-driving cars and robotics, it is easier to evaluate whether a learner's complete trajectory was either "good" or "bad," but harder to provide a reward signal at each step. To show that learning is possible in this more challenging setting, we study the case where trajectory labels are generated by an unknown parametric model, and provide a statistically and computationally efficient algorithm that achieves sublinear regret.
    One Model, Any CSP: Graph Neural Networks as Fast Global Search Heuristics for Constraint Satisfaction. (arXiv:2208.10227v1 [cs.AI])
    We propose a universal Graph Neural Network architecture which can be trained as an end-to-end search heuristic for any Constraint Satisfaction Problem (CSP). Our architecture can be trained unsupervised with policy gradient descent to generate problem-specific heuristics for any CSP in a purely data-driven manner. The approach is based on a novel graph representation for CSPs that is both generic and compact and enables us to process every possible CSP instance with one GNN, regardless of constraint arity, relations or domain size. Unlike previous RL-based methods, we operate on a global search action space and allow our GNN to modify any number of variables in every step of the stochastic search. This enables our method to properly leverage the inherent parallelism of GNNs. We perform a thorough empirical evaluation where we learn heuristics for well-known and important CSPs from random data, including graph coloring, MaxCut, 3-SAT and MAX-k-SAT. Our approach outperforms prior approaches for neural combinatorial optimization by a substantial margin. It can compete with, and even improve upon, conventional search heuristics on test instances that are several orders of magnitude larger and structurally more complex than those seen during training.
    Multi-Task Learning for Depression Detection in Dialogs. (arXiv:2208.10250v1 [cs.CL])
    Depression is a serious mental illness that impacts the way people communicate, especially through their emotions, and, allegedly, the way they interact with others. This work examines depression signals in dialogs, a less studied setting that suffers from data sparsity. We hypothesize that depression and emotion can inform each other, and we propose to explore the influence of dialog structure through topic and dialog act prediction. We investigate a Multi-Task Learning (MTL) approach, where all tasks mentioned above are learned jointly with dialog-tailored hierarchical modeling. We experiment on the DAIC and DailyDialog corpora, both of which contain dialogs in English, and show important improvements over the state of the art on depression detection (at best 70.6% F1), which demonstrates the correlation of depression with emotion and dialog organization and the power of MTL to leverage information from different sources.
    Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks. (arXiv:2012.11654v5 [stat.ML] UPDATED)
    A recent line of work has analyzed the theoretical properties of deep neural networks via the Neural Tangent Kernel (NTK). In particular, the smallest eigenvalue of the NTK has been related to the memorization capacity, the global convergence of gradient descent algorithms and the generalization of deep nets. However, existing results either provide bounds in the two-layer setting or assume that the spectrum of the NTK matrices is bounded away from 0 for multi-layer networks. In this paper, we provide tight bounds on the smallest eigenvalue of NTK matrices for deep ReLU nets, both in the limiting case of infinite widths and for finite widths. In the finite-width setting, the network architectures we consider are fairly general: we require the existence of a wide layer with roughly order of $N$ neurons, $N$ being the number of data samples; and the scaling of the remaining layer widths is arbitrary (up to logarithmic factors). To obtain our results, we analyze various quantities of independent interest: we give lower bounds on the smallest singular value of hidden feature matrices, and upper bounds on the Lipschitz constant of input-output feature maps.
    A Simple Unified Framework for Anomaly Detection in Deep Reinforcement Learning. (arXiv:2109.09889v2 [cs.LG] UPDATED)
    Abnormal states in deep reinforcement learning (RL) are states that are beyond the scope of an RL policy. Such states may lead to sub-optimal and unsafe decision making for the RL system, impeding its deployment in real scenarios. In this paper, we propose a simple yet effective anomaly detection framework for deep RL algorithms that simultaneously considers random, adversarial and out-of-distribution (OOD) state outliers. In particular, we attain the class-conditional distributions for each action class under the Gaussian assumption, and rely on these distributions to discriminate between inliers and outliers based on Mahalanobis Distance (MD) and Robust Mahalanobis Distance. We conduct extensive experiments on Atari games that verify the effectiveness of our detection strategies. To the best of our knowledge, we present the first in-detail study of statistical and adversarial anomaly detection in deep RL algorithms. This simple unified anomaly detection paves the way towards deploying safe RL systems in real-world applications.
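    Under the stated Gaussian assumption, the detector reduces to fitting one mean per action class (with a shared covariance, as in the standard Mahalanobis detector) and thresholding the distance to the closest class; a compact sketch under those assumptions:

```python
# Hedged sketch of class-conditional Mahalanobis-distance anomaly scoring.
import numpy as np

def fit_class_gaussians(feats, actions, n_actions):
    # feats: (n, d) feature vectors; actions: (n,) integer action labels.
    # Shared covariance with a small ridge for numerical stability;
    # assumes every action class appears in the data.
    cov = np.cov(feats.T) + 1e-6 * np.eye(feats.shape[1])
    prec = np.linalg.inv(cov)
    means = [feats[actions == a].mean(axis=0) for a in range(n_actions)]
    return means, prec

def anomaly_score(x, means, prec):
    # Distance to the closest class; threshold this to flag outliers.
    return min(float((x - m) @ prec @ (x - m)) for m in means)
```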
    Use-Case-Grounded Simulations for Explanation Evaluation. (arXiv:2206.02256v2 [cs.HC] UPDATED)
    A growing body of research runs human subject evaluations to study whether providing users with explanations of machine learning models can help them with practical real-world use cases. However, running user studies is challenging and costly, and consequently each study typically only evaluates a limited number of different settings, e.g., studies often only evaluate a few arbitrarily selected explanation methods. To address these challenges and aid user study design, we introduce Use-Case-Grounded Simulated Evaluations (SimEvals). SimEvals involve training algorithmic agents that take as input the information content (such as model explanations) that would be presented to each participant in a human subject study, to predict answers to the use case of interest. The algorithmic agent's test set accuracy provides a measure of the predictiveness of the information content for the downstream use case. We run a comprehensive evaluation on three real-world use cases (forward simulation, model debugging, and counterfactual reasoning) to demonstrate that SimEvals can effectively identify which explanation methods will help humans for each use case. These results provide evidence that SimEvals can be used to efficiently screen an important set of user study design decisions, e.g., selecting which explanations should be presented to the user, before running a potentially costly user study.
    Estimating Smooth GLM in Non-interactive Local Differential Privacy Model with Public Unlabeled Data. (arXiv:1910.00482v4 [cs.LG] UPDATED)
    In this paper, we study the problem of estimating smooth Generalized Linear Models (GLMs) in the Non-interactive Local Differential Privacy (NLDP) model. Different from its classical setting, our model allows the server to access some additional public but unlabeled data. In the first part of the paper we focus on GLMs. Specifically, we first consider the case where each data record is i.i.d. sampled from a zero-mean multivariate Gaussian distribution. Motivated by Stein's lemma, we present an $(\epsilon, \delta)$-NLDP algorithm for GLMs. Moreover, the sample complexity of public and private data for the algorithm to achieve an $\ell_2$-norm estimation error of $\alpha$ (with high probability) is ${O}(p \alpha^{-2})$ and $\tilde{O}(p^3\alpha^{-2}\epsilon^{-2})$ respectively, where $p$ is the dimension of the feature vector. This is a significant improvement over the previously known sample complexities for GLMs with no public data, which are exponential or quasi-polynomial in $\alpha^{-1}$, or exponential in $p$. Then we consider a more general setting where each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. Based on a variant of Stein's lemma, we propose an $(\epsilon, \delta)$-NLDP algorithm for GLMs whose sample complexity of public and private data to achieve an $\ell_\infty$-norm estimation error of $\alpha$ is ${O}(p^2\alpha^{-2})$ and $\tilde{O}(p^2\alpha^{-2}\epsilon^{-2})$ respectively, under some mild assumptions and if $\alpha$ is not too small (i.e., $\alpha\geq \Omega(\frac{1}{\sqrt{p}})$). In the second part of the paper, we extend our idea to the problem of estimating non-linear regressions and show similar results as in GLMs for both the multivariate Gaussian and sub-Gaussian cases. Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real-world datasets.
    Confident Learning: Estimating Uncertainty in Dataset Labels. (arXiv:1911.00068v6 [stat.ML] UPDATED)
    Learning exists in the context of data, yet notions of confidence typically focus on model predictions, not label quality. Confident learning (CL) is an alternative approach which focuses instead on label quality by characterizing and identifying label errors in datasets, based on the principles of pruning noisy data, counting with probabilistic thresholds to estimate noise, and ranking examples to train with confidence. Whereas numerous studies have developed these principles independently, here, we combine them, building on the assumption of a class-conditional noise process to directly estimate the joint distribution between noisy (given) labels and uncorrupted (unknown) labels. This results in a generalized CL which is provably consistent and experimentally performant. We present sufficient conditions where CL exactly finds label errors, and show CL performance exceeding seven recent competitive approaches for learning with noisy labels on the CIFAR dataset. Uniquely, the CL framework is not coupled to a specific data modality or model (e.g., we use CL to find several label errors in the presumed error-free MNIST dataset and improve sentiment classification on text data in Amazon Reviews). We also employ CL on ImageNet to quantify ontological class overlap (e.g., estimating 645 "missile" images are mislabeled as their parent class "projectile"), and moderately increase model accuracy (e.g., for ResNet) by cleaning data prior to training. These results are replicable using the open-source cleanlab release.
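    The counting step can be sketched directly: estimate a per-class threshold as the mean self-confidence of examples given that label, then accumulate off-diagonal mass to estimate the joint over noisy and latent labels. This is a simplified rendering of the CL procedure; the released cleanlab package implements the full method:

```python
# Simplified confident-learning joint estimation (see cleanlab for the real thing).
import numpy as np

def confident_joint(noisy_labels, pred_probs):
    # noisy_labels: (n,) given labels; pred_probs: (n, k) out-of-sample probabilities.
    n, k = pred_probs.shape
    # Threshold t_j: average predicted prob of class j among examples labeled j
    # (assumes every class appears among the noisy labels).
    t = np.array([pred_probs[noisy_labels == j, j].mean() for j in range(k)])
    joint = np.zeros((k, k))
    for i in range(n):
        confident = np.where(pred_probs[i] >= t)[0]
        if len(confident):
            j_star = confident[np.argmax(pred_probs[i, confident])]
            joint[noisy_labels[i], j_star] += 1  # (given label, likely true label)
    return joint / joint.sum()

# Off-diagonal entries estimate the rate of label errors per class pair.
```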
    Preventing Oversmoothing in VAE via Generalized Variance Parameterization. (arXiv:2102.08663v2 [cs.LG] UPDATED)
    Variational autoencoders (VAEs) often suffer from posterior collapse, a phenomenon in which the learned latent space becomes uninformative. This is often related to a hyperparameter resembling the data variance. An inappropriate choice of this hyperparameter can be shown to cause oversmoothness in the linearly approximated case, and this can be verified empirically in the general case. Moreover, determining such an appropriate choice becomes infeasible if the data variance is non-uniform or conditional. Therefore, we propose VAE extensions with generalized parameterizations of the data variance and incorporate maximum likelihood estimation into the objective function to adaptively regularize the decoder smoothness. The images generated from the proposed VAE extensions show improved Fr\'echet inception distance (FID) on the MNIST and CelebA datasets.
    On Robustness in Nonconvex Optimization with Application to Defense Planning. (arXiv:2208.09725v1 [math.OC])
    In the context of structured nonconvex optimization, we estimate the increase in minimum value for a decision that is robust to parameter perturbations as compared to the value of a nominal problem. The estimates rely on detailed expressions for subgradients and local Lipschitz moduli of min-value functions in nonconvex robust optimization and require only the solution of the nominal problem. The theoretical results are illustrated by examples from military operations research involving mixed-integer optimization models. Across 54 cases examined, the median error in estimating the increase in minimum value is 12%. Therefore, the derived expressions for subgradients and local Lipschitz moduli may accurately inform analysts about the possibility of obtaining cost-effective, parameter-robust decisions in nonconvex optimization.
    Optimal Client Sampling for Federated Learning. (arXiv:2010.13723v3 [cs.LG] UPDATED)
    It is well understood that client-master communication can be a primary bottleneck in Federated Learning. In this work, we address this issue with a novel client subsampling scheme, where we restrict the number of clients allowed to communicate their updates back to the master node. In each communication round, all participating clients compute their updates, but only the ones with "important" updates communicate back to the master. We show that importance can be measured using only the norm of the update and give a formula for optimal client participation. This formula minimizes the distance between the full update, where all clients participate, and our limited update, where the number of participating clients is restricted. In addition, we provide a simple algorithm that approximates the optimal formula for client participation, which only requires secure aggregation and thus does not compromise client privacy. We show both theoretically and empirically that for Distributed SGD (DSGD) and Federated Averaging (FedAvg), the performance of our approach can be close to full participation and superior to the baseline where participating clients are sampled uniformly. Moreover, our approach is orthogonal to and compatible with existing methods for reducing communication overhead, such as local methods and communication compression methods.
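    A rough rendering of the norm-based rule (the paper derives the exact optimal-probability formula; this sketch substitutes a simple proportional heuristic to convey the mechanism):

```python
# Hedged sketch: clients report with probability tied to their update norms.
import numpy as np

def client_sampling_probs(update_norms, m):
    """Per-client reporting probability with expected participation about m.
    Proportional-to-norm heuristic standing in for the paper's optimal formula."""
    p = np.asarray(update_norms, dtype=float)
    p = m * p / p.sum()
    return np.clip(p, 0.0, 1.0)

norms = np.array([0.1, 2.3, 0.7, 1.5, 0.05])
probs = client_sampling_probs(norms, m=2)
participating = np.random.rand(len(norms)) < probs  # each client decides locally
# For an unbiased aggregate, each sent update would be rescaled by 1 / p_i.
```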
    Quadratic Metric Elicitation for Fairness and Beyond. (arXiv:2011.01516v3 [stat.ML] UPDATED)
    Metric elicitation is a recent framework for eliciting classification performance metrics that best reflect implicit user preferences based on the task and context. However, available elicitation strategies have been limited to linear (or quasi-linear) functions of predictive rates, which can be practically restrictive for many applications including fairness. This paper develops a strategy for eliciting more flexible multiclass metrics defined by quadratic functions of rates, designed to reflect human preferences better. We show its application in eliciting quadratic violation-based group-fair metrics. Our strategy requires only relative preference feedback, is robust to noise, and achieves near-optimal query complexity. We further extend this strategy to eliciting polynomial metrics -- thus broadening the use cases for metric elicitation.
    Practical Vertical Federated Learning with Unsupervised Representation Learning. (arXiv:2208.10278v1 [cs.CR])
    As societal concerns about data privacy increase, we have witnessed data silos among multiple parties in various applications. Federated learning emerges as a new learning paradigm that enables multiple parties to collaboratively train a machine learning model without sharing their raw data. Vertical federated learning, where each party owns different features of the same set of samples and only a single party has the labels, is an important and challenging topic in federated learning. Communication costs among different parties have been a major hurdle for practical vertical federated learning systems. In this paper, we propose a novel communication-efficient vertical federated learning algorithm named FedOnce, which requires only one-shot communication among parties. To improve model accuracy and provide a privacy guarantee, FedOnce features unsupervised representation learning in the federated setting and privacy-preserving techniques based on the moments accountant. Comprehensive experiments on 10 datasets demonstrate that FedOnce achieves performance close to state-of-the-art vertical federated learning algorithms at much lower communication cost. Meanwhile, our privacy-preserving technique significantly outperforms state-of-the-art approaches under the same privacy budget.
    Deterministic Graph-Walking Program Mining. (arXiv:2208.10290v1 [cs.LG])
    Owing to their versatility, graph structures admit representations of intricate relationships between the separate entities comprising the data. We formalise the notion of connection between two vertex sets in terms of edge and vertex features by introducing graph-walking programs. We give two algorithms for mining deterministic graph-walking programs, both of which yield programs in order of increasing length. These programs characterise linear long-distance relationships between the given two vertex sets in the context of the whole graph.
    Using Large Language Models to Simulate Multiple Humans. (arXiv:2208.10264v1 [cs.CL])
    We propose a method for using a large language model, such as GPT-3, to simulate responses of different humans in a given context. We test our method by attempting to reproduce well-established economic, psycholinguistic, and social experiments. The method requires prompt templates for each experiment. Simulations are run by varying the (hypothetical) subject details such as name and analyzing the text generated by the language model. We validate our methodology by using GPT-3, to show that it is possible to simulate responses of different people and that their responses are consistent with prior human studies from the literature. We find that the distributions generated by larger language models better align with prior experimental results, suggesting a trend that future language models may be used for even more faithful simulations of human responses. Our use of a language model for simulation is contrasted with anthropomorphic views of a language model as having its own behavior.
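    The mechanics are essentially prompt templating over hypothetical subjects; below is a sketch with a placeholder query_llm function and an invented survey question, since the model call is provider-specific and the paper's exact templates are not reproduced here:

```python
# Hedged sketch of simulating multiple subjects via prompt templates.
TEMPLATE = (
    "{name} is asked: 'You are offered $10 today or $12 in a week. "
    "Which do you choose?'\n{name} replies:"
)

def query_llm(prompt: str) -> str:
    """Placeholder for a completion call to GPT-3 or a similar model."""
    raise NotImplementedError("wire up your LLM provider here")

def simulate_subjects(names):
    # Vary only the (hypothetical) subject details, then analyze the generations.
    return {name: query_llm(TEMPLATE.format(name=name)) for name in names}

# simulate_subjects(["Alice Chen", "David Okafor", "Maria Lopez"])
```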
    Comparison-based Conversational Recommender System with Relative Bandit Feedback. (arXiv:2208.09837v1 [cs.IR])
    With recent advances in conversational recommendation, the recommender system is able to actively and dynamically elicit user preference via conversational interactions. To achieve this, the system periodically queries users' preference on attributes and collects their feedback. However, most existing conversational recommender systems only enable the user to provide absolute feedback on the attributes. In practice, absolute feedback is usually limited, as users tend to provide biased feedback when expressing their preference. Instead, the user is often more inclined to express comparative preferences, since user preferences are inherently relative. To enable users to provide comparative preferences during conversational interactions, we propose a novel comparison-based conversational recommender system. The relative feedback, though more practical, is not easy to incorporate, since its feedback scale is always mismatched with users' absolute preferences. To effectively collect and understand the relative feedback in an interactive manner, we further propose a new bandit algorithm, which we call RelativeConUCB. The experiments on both synthetic and real-world datasets validate the advantage of our proposed method, compared to the existing bandit algorithms in the conversational recommender systems.
    Defensive Distillation based Adversarial Attacks Mitigation Method for Channel Estimation using Deep Learning Models in Next-Generation Wireless Networks. (arXiv:2208.10279v1 [cs.CR])
    Future wireless networks (5G and beyond) are the vision of forthcoming cellular systems, connecting billions of devices and people. In recent decades, cellular networks have grown dramatically, with advanced telecommunication technologies for high-speed data transmission, high cell capacity, and low latency. The main goal of those technologies is to support a wide range of new applications, such as virtual reality, metaverse, telehealth, online education, autonomous and flying vehicles, smart cities, smart grids, advanced manufacturing, and many more. The key motivation of NextG networks is to meet the high demand for those applications by improving and optimizing network functions. Artificial Intelligence (AI) has a high potential to achieve these requirements by being integrated in applications throughout all layers of the network. However, the security concerns of NextG network functions that use AI-based models, i.e., model poisoning, have not been investigated deeply. Therefore, efficient mitigation techniques and secure solutions need to be designed for NextG networks that use AI-based methods. This paper proposes a comprehensive vulnerability analysis of deep learning (DL)-based channel estimation models, trained with the dataset obtained from MATLAB's 5G toolbox, under adversarial attacks, along with defensive distillation-based mitigation methods. The adversarial attacks produce faulty results by manipulating trained DL-based channel estimation models, while the mitigation methods make the models more robust against such attacks. This paper also presents the performance of the proposed defensive distillation mitigation method for each adversarial attack against the channel estimation model. The results indicate that the proposed mitigation method can defend the DL-based channel estimation models against adversarial attacks in NextG networks.
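    Defensive distillation itself follows a standard recipe: train a teacher at temperature T, then train a student on the teacher's softened outputs at the same T. The generic classification-style sketch below conveys the idea; the paper applies it to channel estimation models, whose loss will differ:

```python
# Generic defensive-distillation loss sketch (details assumed, not the paper's code).
import torch
import torch.nn.functional as F

T = 20.0  # distillation temperature; a high T smooths the decision surface

def distillation_loss(student_logits, teacher_logits):
    # Cross-entropy between softened teacher and softened student distributions.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return -(soft_targets * log_p_student).sum(dim=-1).mean()
```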
    Long-Short History of Gradients is All You Need: Detecting Malicious and Unreliable Clients in Federated Learning. (arXiv:2208.10273v1 [cs.CR])
    Federated learning offers a framework for training a machine learning model in a distributed fashion while preserving the privacy of the participants. As the server cannot govern the clients' actions, nefarious clients may attack the global model by sending malicious local gradients. In the meantime, there could also be unreliable clients that are benign but each hold a portion of low-quality training data (e.g., blurry or low-resolution images), and thus may appear similar to malicious clients. Therefore, a defense mechanism needs to perform a three-fold differentiation, which is much more challenging than the conventional (two-fold) case. This paper introduces MUD-HoG, a novel defense algorithm that addresses this challenge in federated learning using a long-short history of gradients, and treats the detected malicious and unreliable clients differently. Moreover, MUD-HoG can distinguish between targeted and untargeted attacks among malicious clients, unlike most prior works which consider only one type of attack. Specifically, we take into account sign-flipping, additive-noise, label-flipping, and multi-label-flipping attacks, under a non-IID setting. We evaluate MUD-HoG against six state-of-the-art methods on two datasets. The results show that MUD-HoG outperforms all of them in terms of accuracy as well as precision and recall, in the presence of a mixture of multiple (four) types of attackers as well as unreliable clients. Moreover, unlike most prior works which can only tolerate a low population of harmful users, MUD-HoG can work with and successfully detect a wide range of malicious and unreliable clients - up to 47.5% and 10%, respectively, of the total population. Our code is open-sourced at https://github.com/LabSAINT/MUD-HoG_Federated_Learning.
    Collaboration between parallel connected neural networks -- A possible criterion for distinguishing artificial neural networks from natural organs. (arXiv:2208.09983v1 [cs.LG])
    We find experimentally that when artificial neural networks are connected in parallel and trained together, they display the following properties. (i) When the parallel-connected neural network (PNN) is optimized, each sub-network in the connection is not optimized. (ii) The contribution of an inferior sub-network to the whole PNN can be on par with that of the superior sub-network. (iii) The PNN can output the correct result even when all sub-networks give incorrect results. These properties are unlikely for natural biological sense organs. Therefore, they could serve as a simple yet effective criterion for measuring the bionic level of neural networks. With this criterion, we further show that when serving as the activation function, the ReLU function can make an artificial neural network more bionic than the sigmoid and Tanh functions do.
    Predicting the protein-ligand affinity from molecular dynamics trajectories. (arXiv:2208.10230v1 [q-bio.BM])
    Accurate protein-ligand binding affinity prediction is essential in drug design and many other molecular recognition problems. Despite many advances in affinity prediction based on machine learning techniques, these methods remain limited, since protein-ligand binding is determined by the dynamics of atoms and molecules. To this end, we curated an MD dataset containing 3,218 dynamic protein-ligand complexes and further developed Dynaformer, a graph-based deep learning framework. Dynaformer can fully capture the dynamic binding rules by considering various geometric characteristics of the interaction. Our method shows superior performance over the methods hitherto reported. Moreover, we performed virtual screening on heat shock protein 90 (HSP90) by integrating our model with structure-based docking. We benchmarked our performance against other baselines, demonstrating that our method can identify the molecule with the highest experimental potency. We anticipate that large-scale MD datasets and machine learning models will form a new synergy, providing a new route towards accelerated drug discovery and optimization.
    When BERT Fails -- The Limits of EHR Classification. (arXiv:2208.10245v1 [cs.CL])
    Transformers are powerful text representation learners, useful for all kinds of clinical decision support tasks. Although they outperform baselines on readmission prediction, they are not infallible. Here, we look into one such failure case, and report patterns that lead to inferior predictive performance.
    An Exploratory Study of Tweets about the SARS-CoV-2 Omicron Variant: Insights from Sentiment Analysis, Language Interpretation, Source Tracking, Type Classification, and Embedded URL Detection. (arXiv:2208.10252v1 [cs.CL])
    This paper presents the findings of an exploratory study of the continuously generated Big Data on Twitter related to the sharing of information, news, views, opinions, ideas, feedback, and experiences about the COVID-19 pandemic, with a specific focus on the Omicron variant, the globally dominant variant of SARS-CoV-2 at this time. A total of 12,028 tweets about the Omicron variant were studied, and the specific characteristics of tweets that were analyzed include sentiment, language, source, type, and embedded URLs. The findings of this study are manifold. First, from sentiment analysis, it was observed that 50.5% of tweets had a neutral emotion. The other emotions (bad, good, terrible, and great) were found in 15.6%, 14.0%, 12.5%, and 7.5% of the tweets, respectively. Second, the findings of language interpretation showed that 65.9% of the tweets were posted in English, followed by Spanish, French, Italian, and other languages. Third, the findings from source tracking showed that Twitter for Android was associated with 35.2% of tweets, followed by Twitter Web App, Twitter for iPhone, Twitter for iPad, and other sources. Fourth, studying the type of tweets revealed that retweets accounted for 60.8% of the tweets, followed by original tweets and replies that accounted for 19.8% and 19.4% of the tweets, respectively. Fifth, in terms of embedded URL analysis, the most common domain embedded in the tweets was found to be twitter.com, followed by biorxiv.org, nature.com, and other domains. Finally, to support similar research in this field, we have developed a Twitter dataset that comprises more than 500,000 tweets about the SARS-CoV-2 Omicron variant since the first detected case of this variant on November 24, 2021.
    Learning Invariant Representations under General Interventions on the Response. (arXiv:2208.10027v1 [stat.ME])
    It has become increasingly common nowadays to collect observations of feature and response pairs from different environments. As a consequence, one has to apply learned predictors to data with a different distribution due to distribution shifts. One principled approach is to adopt the structural causal models to describe training and test models, following the invariance principle which says that the conditional distribution of the response given its predictors remains the same across environments. However, this principle might be violated in practical settings when the response is intervened. A natural question is whether it is still possible to identify other forms of invariance to facilitate prediction in unseen environments. To shed light on this challenging scenario, we introduce invariant matching property (IMP) which is an explicit relation to capture interventions through an additional feature. This leads to an alternative form of invariance that enables a unified treatment of general interventions on the response. We analyze the asymptotic generalization errors of our method under both the discrete and continuous environment settings, where the continuous case is handled by relating it to the semiparametric varying coefficient models. We present algorithms that show competitive performance compared to existing methods over various experimental settings.
    Meta-Learning Online Control for Linear Dynamical Systems. (arXiv:2208.10259v1 [cs.LG])
    In this paper, we consider the problem of finding a meta-learning online control algorithm that can learn across tasks when faced with a sequence of $N$ (similar) control tasks. Each task involves controlling a linear dynamical system for a finite horizon of $T$ time steps. The cost function and system noise at each time step are adversarial and unknown to the controller before taking the control action. Meta-learning is a broad approach whose goal is to prescribe an online policy for any new unseen task, exploiting information from other tasks and the similarity between tasks. We propose a meta-learning online control algorithm for this setting and characterize its performance by \textit{meta-regret}, the average cumulative regret across the tasks. We show that when the number of tasks is sufficiently large, our proposed approach achieves a meta-regret that is smaller by a factor $D/D^{*}$ compared to an independent-learning online control algorithm that does not learn across tasks, where $D$ is a problem constant and $D^{*}$ is a scalar that decreases as the similarity between tasks increases. Thus, when the sequence of tasks is similar, the regret of the proposed meta-learning online control algorithm is significantly lower than that of naive approaches without meta-learning. We also present experimental results to demonstrate the superior performance achieved by our meta-learning algorithm.
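    For concreteness, one plausible way to formalize the meta-regret described here (an assumed formalization, not taken from the paper) is the average, over the $N$ tasks, of each task's cumulative cost minus that of the best policy in a comparator class $\Pi$:

    ```latex
    \[
    \mathrm{meta\text{-}regret}
      = \frac{1}{N}\sum_{i=1}^{N}
        \left(
          \sum_{t=1}^{T} c_t^{(i)}\!\bigl(x_t^{(i)},u_t^{(i)}\bigr)
          \;-\;
          \min_{\pi\in\Pi}\sum_{t=1}^{T} c_t^{(i)}\!\bigl(x_t^{\pi},u_t^{\pi}\bigr)
        \right),
    \]
    % where x_t, u_t are the states and control actions of task i under the
    % learned policy, and x_t^pi, u_t^pi those under comparator policy pi.
    ```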
    Robust Bayesian Nonnegative Matrix Factorization with Implicit Regularizers. (arXiv:2208.10053v1 [cs.LG])
    We introduce a probabilistic model with implicit norm regularization for learning nonnegative matrix factorization (NMF), which is commonly used for predicting missing values and finding hidden patterns in data; the matrix factors are latent variables associated with each data dimension. The nonnegativity constraint on the latent factors is handled by choosing priors with support on the nonnegative subspace, e.g., an exponential density or a distribution based on the exponential function. A Bayesian inference procedure based on Gibbs sampling is employed. We evaluate the model on several real-world datasets, including Genomics of Drug Sensitivity in Cancer (GDSC $IC_{50}$) and gene body methylation, with different sizes and dimensions, and show that the proposed Bayesian NMF GL$_2^2$ and GL$_\infty$ models lead to robust predictions for different data values and avoid overfitting compared with competitive Bayesian NMF approaches.
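    As a minimal sketch of this kind of sampler (a generic construction, not the paper's GL$_2^2$/GL$_\infty$ models): with a Gaussian likelihood and an exponential prior, the full conditional of each nonnegative factor entry is a normal truncated to $[0,\infty)$, so a Gibbs sweep can be written directly. All names and hyperparameters below are assumptions; the symmetric updates for H are omitted for brevity.

    ```python
    import numpy as np
    from scipy.stats import truncnorm

    def gibbs_sweep(X, W, H, lam=1.0, sigma2=1.0):
        """One Gibbs sweep over W for X ~ N(WH, sigma2) with Exp(lam) priors."""
        n, K = W.shape
        for i in range(n):
            for k in range(K):
                resid = X[i] - W[i] @ H + W[i, k] * H[k]  # leave entry (i,k) out
                tau = H[k] @ H[k] / sigma2                # conditional precision
                mu = (H[k] @ resid / sigma2 - lam) / tau  # prior shifts the mean
                sd = 1.0 / np.sqrt(tau)
                a = (0.0 - mu) / sd                       # standardized lower bound
                W[i, k] = truncnorm.rvs(a, np.inf, loc=mu, scale=sd)
        return W

    rng = np.random.default_rng(0)
    W, H = rng.exponential(size=(20, 3)), rng.exponential(size=(3, 15))
    X = W @ H + 0.1 * rng.standard_normal((20, 15))
    W = gibbs_sweep(X, W.copy(), H)
    ```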
    SoK: Machine Learning with Confidential Computing. (arXiv:2208.10134v1 [cs.CR])
    Privacy and security challenges in Machine Learning (ML) have become a critical topic to address, along with ML's pervasive development and the recent demonstration of large attack surfaces. As a mature system-oriented approach, confidential computing has been increasingly utilized in both academia and industry to improve privacy and security in various ML scenarios. In this paper, we systematize the findings on confidential computing-assisted ML security and privacy techniques for providing i) confidentiality guarantees and ii) integrity assurances. We further identify key challenges and provide dedicated analyses of the limitations in existing Trusted Execution Environment (TEE) systems for ML use cases. We discuss prospective works, including grounded privacy definitions, partitioned ML executions, dedicated TEE designs for ML, TEE-aware ML, and ML full pipeline guarantees. These potential solutions can help achieve much stronger TEE-enabled ML for privacy guarantees without introducing significant computation and system costs.
    Generalized Attention Mechanism and Relative Position for Transformer. (arXiv:2208.10247v1 [cs.CL])
    In this paper, we propose a generalized attention mechanism (GAM) by first suggesting a new interpretation of the self-attention mechanism of Vaswani et al. Following this interpretation, we describe different variants of the attention mechanism, which together form GAM. Further, we propose a new relative position representation within the framework of GAM. This representation can be easily utilized for cases in which elements next to each other in the input sequence can be at random locations in the actual dataset/corpus.
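    The abstract leaves GAM's exact form unspecified; as a point of reference, the sketch below shows standard scaled dot-product attention with an additive relative-position bias, one common way to inject relative positions. The function name and tensor shapes are assumptions.

    ```python
    import torch
    import torch.nn.functional as F

    def attention_with_relative_bias(q, k, v, rel_bias):
        """Scaled dot-product attention with an additive relative-position bias.

        q, k, v:  (batch, heads, seq, dim)
        rel_bias: (heads, seq, seq), e.g. looked up from learned embeddings
                  indexed by (i - j); broadcast over the batch dimension.
        """
        scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
        scores = scores + rel_bias             # relative positions enter additively
        return F.softmax(scores, dim=-1) @ v

    b, h, s, d = 2, 4, 16, 32
    q = k = v = torch.randn(b, h, s, d)
    bias = torch.randn(h, s, s)
    out = attention_with_relative_bias(q, k, v, bias)  # (2, 4, 16, 32)
    ```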
    Do-AIQ: A Design-of-Experiment Approach to Quality Evaluation of AI Mislabel Detection Algorithm. (arXiv:2208.09953v1 [stat.ML])
    The quality of Artificial Intelligence (AI) algorithms is of significant importance for confidently adopting algorithms in various applications such as cybersecurity, healthcare, and autonomous driving. This work presents a principled framework, named Do-AIQ, that uses a design-of-experiments approach to systematically evaluate the quality of AI algorithms. Specifically, we focus on investigating the quality of AI mislabel detection algorithms under data poisoning. The performance of AI algorithms is affected by hyperparameters in the algorithm and by data quality, in particular data mislabeling, class imbalance, and data types. To evaluate the quality of AI algorithms and obtain a trustworthy assessment, we establish a design-of-experiments framework that constructs an efficient space-filling design in a high-dimensional constrained space and develops an effective surrogate model using an additive Gaussian process to emulate the quality of AI algorithms. Both theoretical and numerical studies are conducted to justify the merits of the proposed framework, which can serve as an exemplar for enhancing AI assurance in terms of robustness, reproducibility, and transparency.
    Evaluating and Crafting Datasets Effective for Deep Learning With Data Maps. (arXiv:2208.10033v1 [cs.LG])
    Rapid development in deep learning model construction has prompted an increased need for appropriate training data. The popularity of large datasets - sometimes known as "big data" - has diverted attention from assessing their quality. Training on large datasets often requires excessive system resources and an infeasible amount of time. Furthermore, the supervised machine learning process has yet to be fully automated: for supervised learning, large datasets require more time for manually labeling samples. We propose a method of curating smaller datasets that retain comparable out-of-distribution model accuracy: after an initial training session, samples are classified by how difficult they are for the model to learn from, and a subset with an appropriate distribution of difficulties is selected.
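    A rough sketch of this kind of curation, in the spirit of training-dynamics "data maps" (an assumed instantiation, not necessarily the authors' exact procedure): record per-sample confidence across an initial training run, rank samples by difficulty, and keep a difficulty-balanced subset.

    ```python
    import numpy as np

    def select_subset(probs_per_epoch, labels, keep_frac=0.5):
        """Rank samples by training dynamics and keep a difficulty-balanced subset.

        probs_per_epoch: (epochs, n_samples, n_classes) softmax outputs recorded
                         during an initial training session (assumed available).
        Confidence  = mean prob. of the true label across epochs;
        variability = its std. Low-confidence samples are "hard to learn".
        """
        true_p = probs_per_epoch[:, np.arange(len(labels)), labels]
        confidence = true_p.mean(axis=0)
        variability = true_p.std(axis=0)
        order = np.argsort(confidence)        # hardest first
        n_keep = int(keep_frac * len(labels))
        n_hard = int(0.8 * n_keep)            # mostly hard/ambiguous samples...
        hard = order[:n_hard]
        easy = order[len(order) - (n_keep - n_hard):]  # ...plus some easy ones
        return np.concatenate([hard, easy]), confidence, variability
    ```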
    BRIEF but Powerful: Byzantine-Robust and Privacy-Preserving Federated Learning via Model Segmentation and Secure clustering. (arXiv:2208.10161v1 [cs.CR])
    Byzantine-robust Federated Learning (FL) aims to counter malicious clients and to train an accurate global model while maintaining an extremely low attack success rate. Most existing systems, however, are only robust in honest/semi-honest majority settings. FLTrust (NDSS '21) extends the context to a malicious majority of clients, but with the strong restriction that the server must be provided with an auxiliary dataset before training in order to filter malicious inputs. Private FLAME/FLGUARD (USENIX '22) gives a solution that guarantees both robustness and update confidentiality in the semi-honest majority context. It has so far been impossible to balance the trade-off among the malicious context, robustness, and update confidentiality. To tackle this problem, we propose a novel Byzantine-robust and privacy-preserving FL system, called BRIEF, that handles both malicious-minority and malicious-majority settings on the server and client sides. Specifically, based on the DBSCAN algorithm, we design a new method for clustering via pairwise adjusted cosine similarity to boost the accuracy of the clustering results. To thwart attacks from a malicious majority, we develop an algorithm called Model Segmentation, in which local updates in the same cluster are aggregated together, and the aggregations are sent back to the corresponding clients correctly. We also leverage multiple cryptographic tools to conduct clustering tasks without sacrificing training correctness or update confidentiality. We present a detailed security proof and empirical evaluation along with a convergence analysis for BRIEF. The experimental results demonstrate that the testing accuracy of BRIEF is practically close to the FL baseline (0.8% gap on average), while the attack success rate is around 0%-5%. We further optimize our design so that the communication overhead and runtime can be decreased by 67%-89.17% and 66.05%-68.75%, respectively.
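    As a plaintext sketch of the clustering step only (the cryptographic layer and Model Segmentation details are omitted; the eps/min_samples values are assumptions): adjusted cosine similarity can be computed as cosine similarity of mean-centered updates, with DBSCAN run on the induced distances.

    ```python
    import numpy as np
    from sklearn.cluster import DBSCAN

    def cluster_and_aggregate(updates, eps=0.3, min_samples=2):
        """Cluster flattened client updates, then aggregate within each cluster.

        Adjusted cosine = cosine similarity after centering each update by the
        mean update; DBSCAN runs on the induced distance 1 - similarity.
        """
        centered = updates - updates.mean(axis=0, keepdims=True)
        normed = centered / np.linalg.norm(centered, axis=1, keepdims=True)
        dist = np.clip(1.0 - normed @ normed.T, 0.0, 2.0)
        labels = DBSCAN(eps=eps, min_samples=min_samples,
                        metric="precomputed").fit_predict(dist)
        return {c: updates[labels == c].mean(axis=0)   # per-cluster aggregation
                for c in set(labels) if c != -1}       # -1 = DBSCAN outliers
    ```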
    Heterogeneous Graph Masked Autoencoders. (arXiv:2208.09957v1 [cs.LG])
    Generative self-supervised learning (SSL), especially masked autoencoders, has become one of the most exciting learning paradigms and has shown great potential in handling graph data. However, real-world graphs are always heterogeneous, which poses three critical challenges that existing methods ignore: 1) how to capture complex graph structure? 2) how to incorporate various node attributes? and 3) how to encode different node positions? In light of this, we study the problem of generative SSL on heterogeneous graphs and propose HGMAE, a novel heterogeneous graph masked autoencoder model to address these challenges. HGMAE captures comprehensive graph information via two innovative masking techniques and three unique training strategies. In particular, we first develop metapath masking and adaptive attribute masking with dynamic mask rate to enable effective and stable learning on heterogeneous graphs. We then design several training strategies including metapath-based edge reconstruction to adopt complex structural information, target attribute restoration to incorporate various node attributes, and positional feature prediction to encode node positional information. Extensive experiments demonstrate that HGMAE outperforms both contrastive and generative state-of-the-art baselines on several tasks across multiple datasets.
    Inferring Sensitive Attributes from Model Explanations. (arXiv:2208.09967v1 [cs.CR])
    Model explanations provide transparency into a trained machine learning model's blackbox behavior to a model builder. They indicate the influence of different input attributes on the corresponding model prediction. The dependency of explanations on the input raises privacy concerns for sensitive user data. However, current literature has limited discussion of the privacy risks of model explanations. We focus on the specific privacy risk of attribute inference attacks, wherein an adversary infers sensitive attributes of an input (e.g., race and sex) given its model explanations. We design the first attribute inference attack against model explanations in two threat models where the model builder either (a) includes the sensitive attributes in the training data and input or (b) censors the sensitive attributes by not including them in the training data and input. We evaluate our proposed attack on four benchmark datasets and four state-of-the-art algorithms. We show that an adversary can accurately infer the value of sensitive attributes from explanations in both threat models. Moreover, the attack is successful even when exploiting only the explanations corresponding to the sensitive attributes. These results suggest that our attack is effective against explanations and poses a practical threat to data privacy. When combining the model predictions (an attack surface exploited by prior attacks) with explanations, we note that the attack success does not improve. Additionally, attack success when exploiting model explanations is better than when exploiting only model predictions. These results suggest that model explanations are a strong attack surface for an adversary to exploit.
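    A minimal sketch of this attack surface on synthetic stand-in data: the adversary fits a classifier mapping explanation vectors to the sensitive attribute. The injected leakage channel below is purely illustrative.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    sensitive = rng.integers(0, 2, size=2000)        # hidden attribute to infer
    explanations = rng.normal(size=(2000, 20))       # stand-in attribution vectors
    explanations[:, 0] += 1.5 * sensitive            # assumed leakage channel

    X_tr, X_te, y_tr, y_te = train_test_split(
        explanations, sensitive, test_size=0.25, random_state=0)
    attack = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("attribute inference accuracy:", attack.score(X_te, y_te))
    ```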
    Hierarchical Capsule Prediction Network for Marketing Campaigns Effect. (arXiv:2208.10113v1 [stat.ML])
    Marketing campaigns are a set of strategic activities that can promote a business's goal. Effect prediction for marketing campaigns in a real industrial scenario is very complex and challenging, because prior knowledge is often learned from observational data, without any intervention on the marketing campaign. Furthermore, each subject is always exposed to the interference of several marketing campaigns simultaneously. Therefore, we cannot easily parse and evaluate the effect of a single marketing campaign. To the best of our knowledge, there are currently no effective methodologies to solve such a problem, i.e., modeling an individual-level prediction task based on a hierarchical structure with multiple intertwined events. In this paper, we provide an in-depth analysis of the underlying parse tree-like structure involved in the effect prediction task, and we further establish a Hierarchical Capsule Prediction Network (HapNet) for predicting the effects of marketing campaigns. Extensive results based on both synthetic data and real data demonstrate the superiority of our model over the state-of-the-art methods and show its remarkable practicability in real industrial applications.
    Improving GANs for Long-Tailed Data through Group Spectral Regularization. (arXiv:2208.09932v1 [cs.CV])
    Deep long-tailed learning aims to train useful deep networks on practical, real-world imbalanced distributions, wherein the labels of the tail classes are each associated with only a few samples. There has been a large body of work on training discriminative models for visual recognition on long-tailed distributions. In contrast, we aim to train conditional Generative Adversarial Networks, a class of image generation models, on long-tailed distributions. We find that, similar to recognition, state-of-the-art methods for image generation also suffer from performance degradation on tail classes. This degradation is mainly due to class-specific mode collapse for tail classes, which we observe to be correlated with a spectral explosion of the conditioning parameter matrix. We propose a novel group Spectral Regularizer (gSR) that prevents the spectral explosion, alleviating mode collapse and resulting in diverse and plausible image generation even for tail classes. We find that gSR combines effectively with existing augmentation and regularization techniques, leading to state-of-the-art image generation performance on long-tailed data. Extensive experiments demonstrate the efficacy of our regularizer on long-tailed datasets with different degrees of imbalance.
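    A rough sketch of a grouped spectral penalty (the grouping scheme and coefficient are assumptions; the paper's exact regularizer may differ): penalize the largest singular value of each group of class-conditional parameters and add the penalty to the GAN loss.

    ```python
    import torch

    def group_spectral_penalty(cond_weight, num_groups, coeff=1e-2):
        """Penalize the top singular value of each group of class-conditional
        parameters -- a sketch of grouped spectral regularization.

        cond_weight: (num_classes, dim) conditioning parameter matrix.
        """
        groups = torch.chunk(cond_weight, num_groups, dim=0)
        penalty = sum(torch.linalg.matrix_norm(g, ord=2) for g in groups)
        return coeff * penalty

    W = torch.randn(100, 128, requires_grad=True)   # e.g. class-conditional gains
    loss = group_spectral_penalty(W, num_groups=10)
    loss.backward()                                 # add to the GAN loss in practice
    ```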
    Transfer Ranking in Finance: Applications to Cross-Sectional Momentum with Data Scarcity. (arXiv:2208.09968v1 [q-fin.TR])
    Cross-sectional strategies are a classical and popular trading style, with recent high-performing variants incorporating sophisticated neural architectures. While these strategies have been applied successfully to data-rich settings involving mature assets with long histories, deploying them on instruments with limited samples generally produces over-fitted models with degraded performance. In this paper, we introduce Fused Encoder Networks -- a hybrid parameter-sharing transfer ranking model. The model fuses information extracted by an encoder-attention module operating on a source dataset with a similar but separate module focused on a smaller target dataset of interest. In addition to mitigating the issue of target data scarcity, the model's self-attention mechanism enables interactions among instruments to be accounted for, not just at the loss level during model training, but also at inference time. Focusing on momentum applied to the top ten cryptocurrencies by market capitalisation as a demonstrative use-case, the Fused Encoder Networks model outperforms the reference benchmarks on most performance measures, delivering a three-fold boost in the Sharpe ratio over classical momentum as well as an improvement of approximately 50% against the best benchmark model without transaction costs. It continues to outperform the baselines even after accounting for the high transaction costs associated with trading cryptocurrencies.
    Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization. (arXiv:2208.10025v1 [cs.LG])
    We propose and analyze several stochastic gradient algorithms for finding stationary points or local minima in nonconvex finite-sum and online optimization problems, possibly with a nonsmooth regularizer. First, we propose a simple proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+. We provide a clean and tight analysis of ProxSVRG+, which shows that it outperforms deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, thereby solving an open problem proposed in Reddi et al. (2016b). Also, ProxSVRG+ uses far fewer proximal oracle calls than ProxSVRG (Reddi et al., 2016b) and extends to the online setting by avoiding full gradient computations. Then, we further propose an optimal algorithm, called SSRGD, based on SARAH (Nguyen et al., 2017) and show that SSRGD further improves the gradient complexity of ProxSVRG+ and achieves the optimal upper bound, matching the known lower bound of (Fang et al., 2018; Li et al., 2021). Moreover, we show that both ProxSVRG+ and SSRGD enjoy automatic adaptation to the local structure of the objective function, such as the Polyak-\L{}ojasiewicz (PL) condition for nonconvex functions in the finite-sum case; i.e., we prove that both of them can automatically switch to faster global linear convergence without any restart, unlike the prior work ProxSVRG (Reddi et al., 2016b). Finally, we focus on the more challenging problem of finding an $(\epsilon, \delta)$-local minimum instead of just finding an $\epsilon$-approximate (first-order) stationary point (which may be a bad unstable saddle point). We show that SSRGD can find an $(\epsilon, \delta)$-local minimum by simply adding some random perturbations. Our algorithm is almost as simple as its counterpart for finding stationary points and achieves similar optimal rates.
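    A minimal sketch of a ProxSVRG+-style loop for an L1-regularized problem (the stepsize, batch sizes, and snapshot batch size B below are assumptions): an outer loop takes a (mini)batch gradient snapshot, and an inner loop takes variance-reduced proximal steps; setting B to the full dataset recovers a classic ProxSVRG-style epoch.

    ```python
    import numpy as np

    def prox_l1(x, t):
        """Soft-thresholding: proximal operator of t * ||x||_1."""
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def prox_svrg_plus(grad_i, n, x0, lr=0.1, lam=0.01, B=32, b=8,
                       epochs=30, m=20, seed=0):
        """ProxSVRG+-style loop (a sketch, not the paper's exact method).

        grad_i(x, idx): stochastic gradient of the smooth part on minibatch idx.
        """
        rng = np.random.default_rng(seed)
        x = x0.copy()
        for _ in range(epochs):
            snap = x.copy()
            g_snap = grad_i(snap, rng.choice(n, size=B, replace=False))
            for _ in range(m):
                idx = rng.choice(n, size=b, replace=False)
                v = grad_i(x, idx) - grad_i(snap, idx) + g_snap  # variance reduction
                x = prox_l1(x - lr * v, lr * lam)                # proximal step
        return x
    ```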
    Robust Tests in Online Decision-Making. (arXiv:2208.09819v1 [stat.ML])
    Bandit algorithms are widely used in sequential decision problems to maximize the cumulative reward. One potential application is mobile health, where the goal is to promote the user's health through personalized interventions based on user-specific information acquired through wearable devices. Important considerations include the type of data collected and the frequency with which it is collected (e.g., GPS, or continuous monitoring), as such factors can severely impact app performance and users' adherence. In order to balance the need to collect useful data against the constraint of impacting app performance, one needs to be able to assess the usefulness of variables. Bandit feedback data are sequentially correlated, so traditional testing procedures developed for independent data do not apply. Recently, a statistical testing procedure was developed for the actor-critic bandit algorithm. An actor-critic algorithm maintains two separate models, one for the actor, the action selection policy, and one for the critic, the reward model. The performance of the algorithm and the validity of the test are guaranteed only when the critic model is correctly specified. However, misspecification is frequent in practice due to an incorrect functional form or missing covariates. In this work, we propose a modified actor-critic algorithm that is robust to critic misspecification and derive a novel testing procedure for the actor parameters in this case.
    Alexa, Predict My Flight Delay. (arXiv:2208.09921v1 [cs.LG])
    Airlines are critical today for transporting people and commodities on time. Any delay in their schedules can potentially disrupt the business and travel of thousands of people at any given time. Therefore, precise flight delay prediction is beneficial for the aviation industry and passenger travel. Recent research has focused on using artificial intelligence algorithms to predict the possibility of flight delays. Earlier prediction algorithms were designed for a specific air route or airfield. Many existing flight delay prediction algorithms rely on small samples and are difficult to interpret, leaving little room for practical machine learning deployment. This research study develops a flight delay prediction system by analyzing data from domestic flights inside the United States of America. The proposed models learn about the factors that cause flight delays and cancellations, and the link between departure and arrival delays.
    Multiple Descent in the Multiple Random Feature Model. (arXiv:2208.09897v1 [math.ST])
    Recent works have demonstrated a double descent phenomenon in over-parameterized learning: as the number of model parameters increases, the excess risk has a $\mathsf{U}$-shape at the beginning, then decreases again when the model is highly over-parameterized. Although this phenomenon has been investigated by recent works under different settings such as linear models, random feature models, and kernel methods, it has not been fully understood in theory. In this paper, we consider a double random feature model (DRFM) consisting of two types of random features, and study the excess risk achieved by the DRFM in ridge regression. We calculate the precise limit of the excess risk under the high-dimensional framework where the training sample size, the dimension of the data, and the dimension of the random features tend to infinity proportionally. Based on the calculation, we demonstrate that the risk curves of DRFMs can exhibit triple descent. We then provide an explanation of the triple descent phenomenon, and discuss how the ratio between the random feature dimensions, the regularization parameter, and the signal-to-noise ratio control the shape of the risk curves of DRFMs. Finally, we extend our study to the multiple random feature model (MRFM), and show that MRFMs with $K$ types of random features may exhibit $(K+1)$-fold descent. Our analysis points out that risk curves with a specific number of descents generally exist in random feature based regression. Another interesting finding is that our results can recover the risk peak locations reported in the literature when the learned neural network is in the "neural tangent kernel" regime.
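    For intuition, a generic double random feature predictor consistent with this description (an assumed form, not necessarily the paper's exact parameterization) combines two fixed random feature maps and fits their linear coefficients jointly with ridge regression:

    ```latex
    \[
    f(\mathbf{x}) = \sum_{j=1}^{N_1} a_j\,\sigma_1(\mathbf{w}_j^\top\mathbf{x})
                  + \sum_{l=1}^{N_2} b_l\,\sigma_2(\mathbf{v}_l^\top\mathbf{x}),
    \quad
    (\mathbf{a},\mathbf{b}) = \arg\min\,
      \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(\mathbf{x}_i)\bigr)^2
      + \lambda\bigl(\lVert\mathbf{a}\rVert_2^2 + \lVert\mathbf{b}\rVert_2^2\bigr),
    \]
    % with fixed random weights w_j, v_l and (possibly different)
    % activations sigma_1, sigma_2 defining the two feature types.
    ```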
    AA-Forecast: Anomaly-Aware Forecast for Extreme Events. (arXiv:2208.09933v1 [stat.ML])
    Time series models often deal with extreme events and anomalies, both prevalent in real-world datasets. Such models often need to provide careful probabilistic forecasting, which is vital in risk management for extreme events such as hurricanes and pandemics. However, automatically detecting and learning to use extreme events and anomalies is challenging for large-scale datasets and often requires manual effort. Hence, we propose an anomaly-aware forecast framework that leverages the previously seen effects of anomalies to improve its prediction accuracy during and after the presence of extreme events. Specifically, the framework automatically extracts anomalies and incorporates them through an attention mechanism to increase its accuracy for future extreme events. Moreover, the framework employs a dynamic uncertainty optimization algorithm that reduces the uncertainty of forecasts in an online manner. The proposed framework demonstrates consistently superior accuracy with lower uncertainty than current prediction models on three datasets with different varieties of anomalies.
    Visual Probing: Cognitive Framework for Explaining Self-Supervised Image Representations. (arXiv:2106.11054v3 [cs.CV] UPDATED)
    Recently introduced self-supervised methods for image representation learning provide results on par with or superior to their fully supervised competitors, yet the corresponding efforts to explain the self-supervised approaches lag behind. Motivated by this observation, we introduce a novel visual probing framework for explaining self-supervised models by leveraging probing tasks employed previously in natural language processing. The probing tasks require knowledge about semantic relationships between image parts. Hence, we propose a systematic approach to obtain analogs of natural language in vision, such as visual words, context, and taxonomy. Our proposal is grounded in Marr's computational theory of vision and concerns features like textures, shapes, and lines. We show the effectiveness and applicability of those analogs in the context of explaining self-supervised representations. Our key findings emphasize that relations between language and vision can serve as an effective yet intuitive tool for discovering how machine learning models work, independently of data modality. Our work opens a plethora of research pathways towards more explainable and transparent AI.
    An anomaly detection approach for backdoored neural networks: face recognition as a case study. (arXiv:2208.10231v1 [cs.CV])
    Backdoor attacks allow an attacker to embed hidden functionality that jeopardizes the proper behavior of any algorithm, machine learning or not. This hidden functionality can remain inactive during normal use of the algorithm until activated by the attacker. Given how stealthy backdoor attacks are, their consequences could be disastrous if such networks were deployed for applications as critical as border or access control. In this paper, we propose a novel backdoored network detection method based on the principle of anomaly detection, involving access to the clean part of the training data and the trained network. We highlight its promising potential when considering various triggers, locations, and identity pairs, without the need to make any assumptions on the nature of the backdoor and its setup. We test our method on a novel dataset of backdoored networks and report detectability results with perfect scores.
    Merging of neural networks. (arXiv:2204.09973v2 [cs.LG] UPDATED)
    We propose a simple scheme for merging two neural networks trained with different starting initializations into a single network of the same size as the originals. We do this by carefully selecting channels from each input network. Our procedure might be used as a finalization step after one tries multiple starting seeds to avoid an unlucky one. We also show that training two networks and merging them leads to better performance than training a single network for an extended period of time. Availability: https://github.com/fmfi-compbio/neural-network-merging
    Increasing-Margin Adversarial (IMA) Training to Improve Adversarial Robustness of Neural Networks. (arXiv:2005.09147v9 [cs.CV] UPDATED)
    Deep neural networks (DNNs) are vulnerable to adversarial noise. By adding adversarial noise to training samples, adversarial training can improve a model's robustness against adversarial noise. However, adversarial training samples with excessive noise can harm standard accuracy, which may be unacceptable for many medical image analysis applications. This issue has been termed the trade-off between standard accuracy and adversarial robustness. In this paper, we hypothesize that this issue may be alleviated if the adversarial samples for training are placed right on the decision boundaries. Based on this hypothesis, we design an adaptive adversarial training method, named IMA. For each individual training sample, IMA makes a sample-wise estimation of the upper bound of the adversarial perturbation. In the training process, each of the sample-wise adversarial perturbations is gradually increased to match the margin. Once an equilibrium state is reached, the adversarial perturbations stop increasing. IMA is evaluated on publicly available datasets under two popular adversarial attacks, PGD and IFGSM. The results show that: (1) IMA significantly improves the adversarial robustness of DNN classifiers, achieving state-of-the-art performance; (2) IMA has a minimal reduction in clean accuracy among all competing defense methods; (3) IMA can be applied to pretrained models to reduce time cost; (4) IMA can be applied to state-of-the-art medical image segmentation networks with outstanding performance. We hope our work may help to lift the trade-off between adversarial robustness and clean accuracy and facilitate the development of robust applications in the medical field. The source code will be released when this paper is published.
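    A rough sketch of the per-sample margin-growing idea (the PGD attack, increment delta, and schedule below are assumptions of this illustration, not the paper's exact procedure): each sample's perturbation budget grows until a within-budget attack flips the prediction, then stops.

    ```python
    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps, steps=10):
        """L_inf PGD with per-sample radii eps of shape (batch, 1, 1, 1)."""
        x_adv = x.clone().detach().requires_grad_(True)
        for _ in range(steps):
            loss = F.cross_entropy(model(x_adv), y)
            grad, = torch.autograd.grad(loss, x_adv)
            with torch.no_grad():
                step = (eps / steps) * grad.sign()
                # project back into the per-sample L_inf ball around x
                x_adv = x + torch.max(torch.min(x_adv + step - x, eps), -eps)
            x_adv.requires_grad_(True)
        return x_adv.detach()

    def grow_margins(model, x, y, eps, delta=0.01):
        """Increase each sample's budget until its adversarial example crosses
        the decision boundary, then stop growing (the equilibrium state)."""
        x_adv = pgd_attack(model, x, y, eps)
        with torch.no_grad():
            not_flipped = model(x_adv).argmax(1).eq(y).float().view(-1, 1, 1, 1)
        return eps + delta * not_flipped, x_adv
    ```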
    Provable Adaptivity in Adam. (arXiv:2208.09900v1 [cs.LG])
    The Adaptive Moment Estimation (Adam) optimizer is widely used in deep learning tasks because of its fast convergence properties. However, the convergence of Adam is still not well understood. In particular, the existing analyses of Adam cannot clearly demonstrate its advantage over SGD. We attribute this theoretical embarrassment to the $L$-smooth condition (i.e., assuming the gradient is globally Lipschitz continuous with constant $L$) adopted by the literature, which has been pointed out to often fail in practical neural networks. To tackle this embarrassment, we analyze the convergence of Adam under a relaxed condition called the $(L_0,L_1)$ smoothness condition, which allows the gradient Lipschitz constant to change with the local gradient norm. $(L_0,L_1)$ smoothness is strictly weaker than the $L$-smooth condition, and it has been empirically verified to hold for practical deep neural networks. Under the $(L_0,L_1)$ smoothness condition, we establish convergence for Adam with practical hyperparameters. Specifically, we argue that Adam can adapt to the local smoothness condition, justifying the \emph{adaptivity} of Adam. In contrast, SGD can be arbitrarily slow under this condition. Our result might shed light on the benefit of adaptive gradient methods over non-adaptive ones.
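    For reference, the relaxed condition (introduced for twice-differentiable objectives by Zhang et al., 2020) lets the gradient's effective Lipschitz constant grow with the local gradient norm:

    ```latex
    \[
    \bigl\|\nabla^2 f(x)\bigr\| \;\le\; L_0 + L_1 \bigl\|\nabla f(x)\bigr\|
      \qquad \text{for all } x,
    \]
    % which reduces to standard L-smoothness when L_1 = 0.
    ```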
    Hierarchical Bayesian Modelling for Knowledge Transfer Across Engineering Fleets via Multitask Learning. (arXiv:2204.12404v2 [stat.ML] UPDATED)
    A population-level analysis is proposed to address data sparsity when building predictive models for engineering infrastructure. Utilising an interpretable hierarchical Bayesian approach and operational fleet data, domain expertise is naturally encoded (and appropriately shared) between different sub-groups, representing (i) use-type, (ii) component, or (iii) operating condition. Specifically, domain expertise is exploited to constrain the model via assumptions (and prior distributions) allowing the methodology to automatically share information between similar assets, improving the survival analysis of a truck fleet and power prediction in a wind farm. In each asset management example, a set of correlated functions is learnt over the fleet, in a combined inference, to learn a population model. Parameter estimation is improved when sub-fleets are allowed to share correlated information at different levels in the hierarchy. In turn, groups with incomplete data automatically borrow statistical strength from those that are data-rich. The statistical correlations enable knowledge transfer via Bayesian transfer learning, and the correlations can be inspected to inform which assets share information for which effect (i.e. parameter). Successes in both case studies demonstrate the wide applicability in practical infrastructure monitoring, since the approach is naturally adapted between interpretable fleet models of different in-situ examples.
    Performance, Opaqueness, Consequences, and Assumptions: Simple questions for responsible planning of machine learning solutions. (arXiv:2208.09966v1 [cs.LG])
    The data revolution has generated a huge demand for data-driven solutions. This demand propels a growing number of easy-to-use tools and training for aspiring data scientists that enable the rapid building of predictive models. Today, weapons of math destruction can be easily built and deployed without detailed planning and validation. This rapidly extends the list of AI failures, i.e., deployments that lead to financial losses or even violate democratic values such as equality, freedom, and justice. The lack of planning, rules, and standards around model development leads to the "anarchisation of AI". This problem is reported under different names such as validation debt, reproducibility crisis, and lack of explainability. Post-mortem analyses of AI failures often reveal mistakes made in the early phases of model development or data acquisition. Thus, instead of curing the consequences of deploying harmful models, we should prevent them as early as possible by paying more attention to the initial planning stage. In this paper, we propose a quick and simple framework to support the planning of AI solutions. The POCA framework is based on four pillars: Performance, Opaqueness, Consequences, and Assumptions. It helps to set the expectations and plan the constraints for the AI solution before any model is built and any data is collected. With the help of the POCA method, preliminary requirements can be defined for the model-building process, so that costly model misspecification errors can be identified as soon as possible or even avoided. AI researchers, product owners, and business analysts can use this framework in the initial stages of building AI solutions.
    MetaRF: Differentiable Random Forest for Reaction Yield Prediction with a Few Trails. (arXiv:2208.10083v1 [cs.LG])
    Artificial intelligence has deeply revolutionized the field of medicinal chemistry with many impressive applications, but the success of these applications requires a massive amount of training samples with high-quality annotations, which seriously limits the wide usage of data-driven methods. In this paper, we focus on the reaction yield prediction problem, which assists chemists in selecting high-yield reactions in a new chemical space with only a few experimental trials. To attack this challenge, we first put forth MetaRF, an attention-based differentiable random forest model specially designed for few-shot yield prediction, where the attention weights of the random forest are automatically optimized by a meta-learning framework and can be quickly adapted to predict the performance of new reagents when given a few additional samples. To improve the few-shot learning performance, we further introduce a dimension-reduction based sampling method to determine valuable samples to be experimentally tested and then learned from. Our methodology is evaluated on three different datasets and achieves satisfactory performance on few-shot prediction. On high-throughput experimentation (HTE) datasets, the average yield of our methodology's top 10 high-yield reactions is relatively close to the results of ideal yield selection.
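    A rough sketch of the differentiable-forest idea in isolation (MetaRF's meta-learning wrapper and sampling strategy are omitted; all names, data, and hyperparameters below are assumptions): fit a standard forest, then learn softmax attention weights over the per-tree predictions by gradient descent.

    ```python
    import numpy as np
    import torch
    from sklearn.ensemble import RandomForestRegressor

    # synthetic stand-in data for illustration only
    X = np.random.rand(200, 8)
    y = X[:, 0] * 2 + np.random.randn(200) * 0.1

    rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    tree_preds = torch.tensor(
        np.stack([t.predict(X) for t in rf.estimators_], axis=1),
        dtype=torch.float32)                            # (n_samples, n_trees)
    target = torch.tensor(y, dtype=torch.float32)

    logits = torch.zeros(tree_preds.shape[1], requires_grad=True)
    opt = torch.optim.Adam([logits], lr=0.05)
    for _ in range(200):
        opt.zero_grad()
        w = torch.softmax(logits, dim=0)                # attention over trees
        loss = torch.mean((tree_preds @ w - target) ** 2)
        loss.backward()
        opt.step()
    ```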
    MolGraph: a Python package for the implementation of small molecular graphs and graph neural networks with TensorFlow and Keras. (arXiv:2208.09944v1 [cs.LG])
    Molecular machine learning (ML) has proven important for tackling various molecular problems, including the prediction of protein-drug interactions and blood-brain-barrier permeability. Relatively recently, so-called graph neural networks (GNNs) have been implemented for molecular ML, showing performance comparable or superior to descriptor-based approaches. Although various tools and packages exist to apply GNNs for molecular ML, a new GNN package, named MolGraph (https://github.com/akensert/molgraph), was developed in this work with the motivation to create GNNs highly compatible with the TensorFlow and Keras application programming interface (API). As MolGraph focuses specifically and exclusively on molecular ML, a chemistry module was implemented to accommodate the generation of molecular graphs, which can then be input to the GNNs for molecular ML. To validate the GNNs, they were benchmarked against the datasets of MoleculeNet, as well as three chromatographic retention time datasets. The results of these benchmarks show that the GNNs performed as expected. Additionally, the GNNs proved useful for molecular identification and improved the interpretability of chromatographic retention data.
    An Upper Limit of Decaying Rate with Respect to Frequency in Deep Neural Network. (arXiv:2105.11675v3 [cs.LG] UPDATED)
    Deep neural networks (DNNs) usually learn the target function from low to high frequency, a behavior called the frequency principle or spectral bias. This frequency principle sheds light on a high-frequency curse of DNNs: they find it difficult to learn high-frequency information. Inspired by the frequency principle, a series of works has been devoted to developing algorithms for overcoming the high-frequency curse. A natural question arises: what is the upper limit of the decaying rate w.r.t. frequency when one trains a DNN? In this work, our theory, confirmed by numerical experiments, suggests that there is a critical decaying rate w.r.t. frequency in DNN training. Below the upper limit of the decaying rate, the DNN interpolates the training data with a function of a certain regularity. However, above the upper limit, the DNN interpolates the training data with a trivial function, i.e., a function that is non-zero only at the training data points. Our results indicate that a better way to overcome the high-frequency curse is to design a proper preconditioning approach that shifts high-frequency information to low frequency, which coincides with several previously developed algorithms for fast learning of high-frequency information. More importantly, this work rigorously proves that the high-frequency curse is an intrinsic difficulty of DNNs.
    GAT: Generative Adversarial Training for Adversarial Example Detection and Robust Classification. (arXiv:1905.11475v3 [cs.LG] UPDATED)
    The vulnerability of deep neural networks to adversarial examples has become a significant concern for deploying these models in sensitive domains. Devising a definitive defense against such attacks has proven challenging, and methods relying on detecting adversarial samples are only valid when the attacker is oblivious to the detection mechanism. In this paper we propose a principled adversarial example detection method that can withstand norm-constrained white-box attacks. Inspired by one-versus-the-rest classification, in a K-class classification problem we train K binary classifiers, where the i-th binary classifier is used to distinguish between clean data of class i and adversarially perturbed samples of other classes. At test time, we first use a trained classifier to get the predicted label (say k) of the input, and then use the k-th binary classifier to determine whether the input is a clean sample (of class k) or an adversarially perturbed example (of other classes). We further devise a generative approach to detecting/classifying adversarial examples by interpreting each binary classifier as an unnormalized density model of the class-conditional data. We provide a comprehensive evaluation of the above adversarial example detection/classification methods and demonstrate their competitive performance and compelling properties.
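    The test-time procedure described in the abstract is easy to sketch (the helper names and toy stand-ins below are assumptions):

    ```python
    import numpy as np

    def detect(x, base_classifier, binary_classifiers, threshold=0.5):
        """One-vs-rest detection at test time.

        1) get the predicted label k from the base K-way classifier;
        2) ask the k-th binary classifier whether x looks like clean class-k
           data or an adversarially perturbed sample of another class.
        """
        k = int(base_classifier(x))                 # predicted class
        p_clean = float(binary_classifiers[k](x))   # P(clean sample of class k)
        return k, p_clean >= threshold              # (label, accepted-as-clean?)

    # toy stand-ins for the trained models (K = 3)
    base = lambda x: np.argmax(x[:3])
    binaries = [lambda x, c=c: 1.0 / (1.0 + np.exp(-(x[c] - 0.5)))
                for c in range(3)]
    label, is_clean = detect(np.array([0.9, 0.1, 0.2]), base, binaries)
    ```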
    Survey of NLP in Pharmacology: Methodology, Tasks, Resources, Knowledge, and Tools. (arXiv:2208.10228v1 [cs.CL])
    Natural language processing (NLP) is an area of artificial intelligence that applies information technologies to process human language, understand it to a certain degree, and use it in various applications. This area has rapidly developed in the last few years and now employs modern variants of deep neural networks to extract relevant patterns from large text corpora. The main objective of this work is to survey the recent use of NLP in the field of pharmacology. As our work shows, NLP is a highly relevant information extraction and processing approach for pharmacology. It has been used extensively, from intelligent searches through thousands of medical documents to finding traces of adverse drug interactions in social media. We split our coverage into five categories to survey modern NLP methodology, commonly addressed tasks, relevant textual data, knowledge bases, and useful programming libraries. We split each of the five categories into appropriate subcategories, describe their main properties and ideas, and summarize them in tabular form. The resulting survey presents a comprehensive overview of the area, useful to practitioners and interested observers.
    On the non-efficient PAC learnability of acyclic conjunctive queries. (arXiv:2208.10255v1 [cs.DB])
    This note serves three purposes: (i) we provide a self-contained exposition of the fact that conjunctive queries are not efficiently learnable in the Probably-Approximately-Correct (PAC) model, paying clear attention to the complicating fact that this concept class lacks the polynomial-size fitting property, a property that is tacitly assumed in much of the computational learning theory literature; (ii) we establish a strong negative PAC learnability result that applies to many restricted classes of conjunctive queries (CQs), including acyclic CQs for a wide range of notions of "acyclicity"; (iii) we show that CQs are efficiently PAC learnable with membership queries.
    LTE4G: Long-Tail Experts for Graph Neural Networks. (arXiv:2208.10205v1 [cs.LG])
    Existing Graph Neural Networks (GNNs) usually assume a balanced situation in which both the class distribution and the node degree distribution are balanced. However, in real-world situations we often encounter cases where a few classes (i.e., head classes) dominate other classes (i.e., tail classes), as well as analogous imbalance from the node-degree perspective, and thus naively applying existing GNNs eventually falls short of generalizing to the tail cases. Although recent studies have proposed methods to handle long-tail situations on graphs, they focus only on either the class long-tailedness or the degree long-tailedness. In this paper, we propose a novel framework for training GNNs, called Long-Tail Experts for Graphs (LTE4G), which jointly considers the class long-tailedness and the degree long-tailedness for node classification. The core idea is to assign an expert GNN model to each subset of nodes that are split in a balanced manner considering both the class and degree long-tailedness. After having trained an expert for each balanced subset, we adopt knowledge distillation to obtain two class-wise students, i.e., a head-class student and a tail-class student, which are responsible for classifying nodes in the head classes and tail classes, respectively. We demonstrate that LTE4G outperforms a wide range of state-of-the-art methods in node classification evaluated on both manually and naturally imbalanced graphs. The source code of LTE4G can be found at https://github.com/SukwonYun/LTE4G.
    Bayesian Complementary Kernelized Learning for Multidimensional Spatiotemporal Data. (arXiv:2208.09978v1 [stat.ML])
    Probabilistic modeling of multidimensional spatiotemporal data is critical to many real-world applications. However, real-world spatiotemporal data often exhibits complex dependencies that are nonstationary, i.e., correlation structure varies with location/time, and nonseparable, i.e., dependencies exist between space and time. Developing effective and computationally efficient statistical models to accommodate nonstationary/nonseparable processes containing both long-range and short-scale variations becomes a challenging task, especially for large-scale datasets with various corruption/missing structures. In this paper, we propose a new statistical framework -- Bayesian Complementary Kernelized Learning (BCKL) -- to achieve scalable probabilistic modeling for multidimensional spatiotemporal data. To effectively describe complex dependencies, BCKL integrates kernelized low-rank factorization with short-range spatiotemporal Gaussian processes (GP), in which the two components complement each other. Specifically, we use a multi-linear low-rank factorization component to capture the global/long-range correlations in the data and introduce an additive short-scale GP based on compactly supported kernel functions to characterize the remaining local variabilities. We develop an efficient Markov chain Monte Carlo (MCMC) algorithm for model inference and evaluate the proposed BCKL framework on both synthetic and real-world spatiotemporal datasets. Our results confirm the superior performance of BCKL in providing accurate posterior mean and high-quality uncertainty estimates.
    Socially Fair Center-based and Linear Subspace Clustering. (arXiv:2208.10095v1 [cs.LG])
    Center-based clustering (e.g., $k$-means, $k$-medians) and clustering using linear subspaces are two of the most popular techniques for partitioning real-world data into smaller clusters. However, when the data consist of sensitive demographic groups, significantly different clustering cost per point for different sensitive groups can lead to fairness-related harms (e.g., different quality of service). The goal of socially fair clustering is to minimize the maximum cost of clustering per point over all groups. In this work, we propose a unified framework to solve socially fair center-based clustering and linear subspace clustering, and give practical, efficient approximation algorithms for these problems. We conduct extensive experiments to show that on multiple benchmark datasets our algorithms either closely match or outperform state-of-the-art baselines.
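    In the $k$-means instantiation (an assumed concretization of the objective stated above), socially fair clustering minimizes the worst average per-point cost across demographic groups $P_1,\dots,P_m$:

    ```latex
    \[
    \min_{C:\,|C|=k}\;\max_{g\in[m]}\;
      \frac{1}{|P_g|}\sum_{x\in P_g}\min_{c\in C}\,\lVert x-c\rVert^{2}.
    \]
    % Standard k-means minimizes the average cost over all points; taking the
    % max over groups prevents any one group from bearing a much higher cost.
    ```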
    MLExchange -- A web-based platform enabling exchangeable machine learning workflows. (arXiv:2208.09751v1 [cs.LG])
    Machine learning (ML) algorithms are showing a growing trend in helping scientific communities across different disciplines and institutions address large and diverse data problems. However, many available ML tools are programmatically demanding and computationally costly. The MLExchange project aims to build a collaborative platform equipped with enabling tools that allow scientists and facility users who do not have a profound ML background to use ML and computational resources in scientific discovery. At a high level, we are targeting a full user experience where managing and exchanging ML algorithms, workflows, and data are readily available through web applications. So far, we have built four major components, i.e., the central job manager, the centralized content registry, the user portal, and the search engine, and successfully deployed these components on a testing server. Since each component is an independent container, the whole platform or its individual service(s) can be easily deployed on servers of different scales, ranging from a laptop (usually a single user) to high-performance computing (HPC) clusters accessed (simultaneously) by many users. Thus, MLExchange supports flexible usage scenarios -- users can either access the services and resources from a remote server or run the whole platform or its individual service(s) within their local network.
    DiscrimLoss: A Universal Loss for Hard Samples and Incorrect Samples Discrimination. (arXiv:2208.09884v1 [cs.LG])
    Given data with label noise (i.e., incorrect data), deep neural networks gradually memorize the label noise, impairing model performance. To relieve this issue, curriculum learning has been proposed to improve model performance and generalization by ordering training samples in a meaningful (e.g., easy-to-hard) sequence. Previous work treats incorrect samples as generic hard ones without discriminating between hard samples (i.e., hard samples in correct data) and incorrect samples. Indeed, a model should learn from hard samples to promote generalization rather than overfit to incorrect ones. In this paper, we address this problem by appending a novel loss function, DiscrimLoss, on top of the existing task loss. Its main effect is to automatically and stably estimate the importance of easy samples and difficult samples (including hard and incorrect samples) at the early stages of training to improve model performance. Then, during the following stages, DiscrimLoss is dedicated to discriminating between hard and incorrect samples to improve model generalization. Such a training strategy can be formulated dynamically in a self-supervised manner, effectively mimicking the main principle of curriculum learning. Experiments on image classification, image regression, text sequence regression, and event relation reasoning demonstrate the versatility and effectiveness of our method, particularly in the presence of diversified noise levels.
    NOSMOG: Learning Noise-robust and Structure-aware MLPs on Graphs. (arXiv:2208.10010v1 [cs.LG])
    While Graph Neural Networks (GNNs) have demonstrated their efficacy in dealing with non-Euclidean structural data, they are difficult to deploy in real applications due to the scalability constraint imposed by multi-hop data dependency. Existing methods attempt to address this scalability issue by training multi-layer perceptrons (MLPs) exclusively on node content features, using labels derived from trained GNNs. Even though the performance of MLPs can be significantly improved this way, two issues prevent MLPs from outperforming GNNs and being used in practice: ignorance of graph structural information and sensitivity to node feature noise. In this paper, we propose to learn NOise-robust Structure-aware MLPs On Graphs (NOSMOG) to overcome these challenges. Specifically, we first complement node content with position features to help MLPs capture graph structural information. We then design a novel representational similarity distillation strategy to inject structural node similarities into MLPs. Finally, we introduce adversarial feature augmentation to ensure stable learning against feature noise and further improve performance. Extensive experiments demonstrate that NOSMOG outperforms GNNs and the state-of-the-art method in both transductive and inductive settings across seven datasets, while maintaining competitive inference efficiency.
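    As a sketch of what a representational similarity distillation loss can look like (an assumed instantiation; NOSMOG's exact formulation may differ): match the pairwise cosine-similarity structure of the teacher GNN's and student MLP's embeddings.

    ```python
    import torch
    import torch.nn.functional as F

    def similarity_distillation_loss(h_gnn, h_mlp):
        """Push the student's pairwise similarity matrix toward the teacher's.

        h_gnn, h_mlp: (n_nodes, dim) embeddings for the same batch of nodes.
        """
        s_teacher = F.normalize(h_gnn, dim=1) @ F.normalize(h_gnn, dim=1).T
        s_student = F.normalize(h_mlp, dim=1) @ F.normalize(h_mlp, dim=1).T
        return F.mse_loss(s_student, s_teacher.detach())

    loss = similarity_distillation_loss(torch.randn(64, 128), torch.randn(64, 128))
    ```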
    ProPaLL: Probabilistic Partial Label Learning. (arXiv:2208.09931v1 [cs.LG])
    Partial label learning is a type of weakly supervised learning, where each training instance corresponds to a set of candidate labels, among which only one is true. In this paper, we introduce ProPaLL, a novel probabilistic approach to this problem, which has at least three advantages compared to the existing approaches: it simplifies the training process, improves performance, and can be applied to any deep architecture. Experiments conducted on artificial and real-world datasets indicate that ProPaLL outperforms the existing approaches.  ( 2 min )
    Robust Node Classification on Graphs: Jointly from Bayesian Label Transition and Topology-based Label Propagation. (arXiv:2208.09779v1 [cs.LG])
    Node classification using Graph Neural Networks (GNNs) has been widely applied in various real-world scenarios. However, in recent years, compelling evidence has emerged that the performance of GNN-based node classification may deteriorate substantially under topological perturbation, such as random connections or adversarial attacks. Various solutions, such as topological denoising methods and mechanism design methods, have been proposed to develop robust GNN-based node classifiers, but none of these works can fully address the problems related to topological perturbations. Recently, the Bayesian label transition model was proposed to tackle this issue, but its slow convergence may lead to inferior performance. In this work, we propose a new label inference model, namely LInDT, which integrates both Bayesian label transition and topology-based label propagation to improve the robustness of GNNs against topological perturbations. LInDT is superior to existing label transition methods as it improves the label prediction of uncertain nodes by utilizing neighborhood-based label propagation, leading to better convergence of label inference. Besides, LInDT adopts an asymmetric Dirichlet distribution as a prior, which also helps to improve label inference. Extensive experiments on five graph datasets demonstrate the superiority of LInDT for GNN-based node classification under three scenarios of topological perturbations.  ( 3 min )
    Semantic-enhanced Image Clustering. (arXiv:2208.09849v1 [cs.CV])
    Image clustering is an important and open challenge in computer vision. Although many methods have been proposed to solve the image clustering task, they only explore images and uncover clusters according to image features, and thus are unable to distinguish visually similar but semantically different images. In this paper, we propose to investigate the task of image clustering with the help of a visual-language pre-training model. Different from the zero-shot setting, in which the class names are known, in this setting we only know the number of clusters. Therefore, how to map images to a proper semantic space and how to cluster images from both the image and semantic spaces are two key problems. To solve these problems, we propose a novel image clustering method guided by the visual-language pre-training model CLIP, named Semantic-enhanced Image Clustering (SIC). In this new method, we first propose a way to map the given images to a proper semantic space, together with efficient methods to generate pseudo-labels according to the relationships between images and semantics. Finally, we propose to perform clustering with consistency learning in both the image space and the semantic space, in a self-supervised learning fashion. The theoretical result on convergence analysis shows that our proposed method can converge at a sublinear speed. Theoretical analysis of the expectation risk also shows that we can reduce the expectation risk by improving neighborhood consistency, increasing prediction confidence, or reducing neighborhood imbalance. Experimental results on five benchmark datasets clearly show the superiority of our new method.  ( 3 min )
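    A toy stand-in for the semantic-space step (assuming precomputed vision-language image embeddings, e.g. CLIP features; SIC's actual pseudo-labeling additionally exploits image-semantic relationships and consistency learning):

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import normalize

    def semantic_space_pseudo_labels(image_feats, n_clusters):
        """Cluster L2-normalized embeddings and use assignments as pseudo-labels.

        image_feats: (n_images, dim) precomputed vision-language embeddings.
        Normalizing first makes Euclidean k-means behave like cosine clustering.
        """
        z = normalize(image_feats)
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(z)
        return km.labels_, km.cluster_centers_

    labels, centers = semantic_space_pseudo_labels(np.random.rand(500, 512), 10)
    ```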
    Fuse and Attend: Generalized Embedding Learning for Art and Sketches. (arXiv:2208.09698v1 [cs.CV])
    While deep Embedding Learning approaches have witnessed widespread success in multiple computer vision tasks, the state-of-the-art methods for representing natural images need not necessarily perform well on images from other domains, such as paintings, cartoons, and sketches. This is because of the huge shift in the distribution of data across these domains compared to natural images. Domains like sketch often contain sparse informative pixels. However, recognizing objects in such domains is crucial, given the multiple relevant applications leveraging such data, for instance, sketch-to-image retrieval. Thus, achieving an Embedding Learning model that can perform well across multiple domains is not only challenging, but plays a pivotal role in computer vision. To this end, in this paper we propose a novel Embedding Learning approach with the goal of generalizing across different domains. During training, given a query image from a domain, we employ gated fusion and attention to generate a positive example, which carries a broad notion of the semantics of the query object category (from across multiple domains). By virtue of Contrastive Learning, we pull together the embeddings of the query and the positive in order to learn a representation that is robust across domains. At the same time, to teach the model to be discriminative against examples from different semantic categories (across domains), we also maintain a pool of negative embeddings (from different categories). We show the prowess of our method using the DomainBed framework on the popular PACS (Photo, Art painting, Cartoon, and Sketch) dataset.  ( 3 min )
    Seeing Objects in dark with Continual Contrastive Learning. (arXiv:2112.02891v3 [cs.CV] UPDATED)
    Object detection, a fundamental computer vision problem, is of paramount importance in smart camera systems. However, a truly reliable camera system can be achieved only if the underlying object detection component is robust across varying imaging conditions (or domains), for instance, different times of the day, adverse weather conditions, etc. Toward such a reliable camera system, in this paper we attempt to train a robust detector. Unfortunately, building a well-performing detector across varying imaging conditions would require labeled training images (often in large numbers) from a plethora of corner cases. As manually obtaining such a large labeled dataset may be infeasible, we suggest using synthetic images to mimic different training image domains. We propose a novel contrastive learning method that aligns the latent representations of pairs of real and synthetic images, making the detector robust to the different domains. However, we found that merely contrasting the embeddings can lead to catastrophic forgetting of information essential for object detection. Hence, we employ a continual-learning-based penalty to alleviate forgetting while contrasting the representations. We show that our proposed method outperforms a wide range of alternatives on the extremely challenging, yet under-studied, scenario of object detection at night-time.  ( 3 min )
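    A minimal PyTorch-style sketch of the two ingredients described above: an InfoNCE-style loss that aligns paired real and synthetic embeddings, plus an EWC-like quadratic penalty against forgetting. The temperature, penalty form, and weighting are assumptions for illustration, not the paper's exact losses.

        import torch
        import torch.nn.functional as F

        def align_and_retain_loss(z_real, z_syn, params, old_params, fisher, lam=1.0):
            # Contrastive alignment: the i-th real/synthetic pair is a positive.
            z_real, z_syn = F.normalize(z_real, dim=1), F.normalize(z_syn, dim=1)
            logits = z_real @ z_syn.t() / 0.07
            targets = torch.arange(z_real.size(0), device=z_real.device)
            contrastive = F.cross_entropy(logits, targets)
            # Continual-learning penalty: keep parameters near their old values,
            # weighted by (Fisher-style) importance estimates.
            penalty = sum((f * (p - p0) ** 2).sum()
                          for p, p0, f in zip(params, old_params, fisher))
            return contrastive + lam * penalty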
    A semi-supervised Teacher-Student framework for surgical tool detection and localization. (arXiv:2208.09926v1 [cs.CV])
    Surgical tool detection in minimally invasive surgery is an essential part of computer-assisted interventions. Current approaches are mostly supervised: they require large, fully labeled datasets and suffer from pseudo-label bias caused by class imbalance. However, large image datasets with bounding-box annotations are often scarce. Semi-supervised learning (SSL) has recently emerged as a means of training large models using only a modest amount of annotated data; apart from reducing annotation cost, SSL has also shown promise in producing models that are more robust and generalizable. In this paper, we therefore introduce an SSL framework for surgical tool detection that aims to mitigate training-data scarcity and data imbalance through a knowledge distillation approach. In the proposed work, we train a model with labeled data to initialize Teacher-Student joint learning, where the Student is trained on Teacher-generated pseudo-labels from unlabeled data. We propose a margin-based multi-class distance classification loss in the region-of-interest head of the detector to effectively segregate foreground classes from the background region. Our results on the m2cai16-tool-locations dataset indicate the superiority of our approach across different supervised data settings (1%, 2%, 5%, and 10% of annotated data), where our model achieves overall mAP improvements of 8%, 12%, and 27% (on 1% labeled data) over state-of-the-art SSL methods and a fully supervised baseline. The code is available at https://github.com/Mansoor-at/Semi-supervised-surgical-tool-det  ( 3 min )
    Are discrete units necessary for Spoken Language Modeling?. (arXiv:2203.05936v2 [cs.CL] UPDATED)
    Recent work in spoken language modeling shows the possibility of learning a language from raw audio without any text labels, in a fully unsupervised way. The approach relies first on transforming the audio into a sequence of discrete units (or pseudo-text) and then training a language model directly on such pseudo-text. Is such a discrete bottleneck, which can introduce irreversible errors in the encoding of the speech signal, necessary, or could we learn a language model without discrete units at all? In this work, we study the role of discrete versus continuous representations in spoken language modeling. We show that discretization is indeed essential for good results: it removes linguistically irrelevant information from the continuous features, helping to improve language modeling performance. On the basis of this study, we train a language model on the discrete units of HuBERT features, reaching new state-of-the-art results on the lexical, syntactic, and semantic metrics of the Zero Resource Speech Challenge 2021 (Track 1 - Speech Only).  ( 2 min )
    Multiple Instance Neuroimage Transformer. (arXiv:2208.09567v1 [cs.CV])
    For the first time, we propose a convolution-free transformer model based on multiple instance learning, called the Multiple Instance Neuroimage Transformer (MINiT), for the classification of T1-weighted (T1w) MRIs. We first present several variants of transformer models adapted for neuroimages. These models extract non-overlapping 3D blocks from the input volume and perform multi-headed self-attention on a sequence of their linear projections. MINiT, in contrast, treats each of the non-overlapping 3D blocks of the input MRI as its own instance, splitting it further into non-overlapping 3D patches on which multi-headed self-attention is computed. As a proof of concept, we evaluate the efficacy of our model by training it to identify sex from T1w MRIs of two public datasets: Adolescent Brain Cognitive Development (ABCD) and the National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA). The learned attention maps highlight voxels contributing to identifying sex differences in brain morphometry. The code is available at https://github.com/singlaayush/MINIT.  ( 2 min )
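    The tokenization described above can be sketched in a few lines of NumPy: split the volume into non-overlapping blocks, then split each block into non-overlapping patches that serve as the instances of a bag. The block and patch sizes below are illustrative, not the paper's settings.

        import numpy as np

        def blocks_and_patches(volume, block=32, patch=8):
            """volume: (D, H, W) array with dimensions divisible by `block`."""
            d, h, w = volume.shape
            out = []
            for i in range(0, d - block + 1, block):
                for j in range(0, h - block + 1, block):
                    for k in range(0, w - block + 1, block):
                        blk = volume[i:i+block, j:j+block, k:k+block]
                        # split the block into a bag of flattened patch tokens
                        g = block // patch
                        patches = blk.reshape(g, patch, g, patch, g, patch)
                        patches = patches.transpose(0, 2, 4, 1, 3, 5).reshape(-1, patch**3)
                        out.append(patches)
            return np.stack(out)   # (num_blocks, patches_per_block, patch_voxels)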
    G2{\Phi}net: Relating Genotype and Biomechanical Phenotype of Tissues with Deep Learning. (arXiv:2208.09889v1 [q-bio.TO])
    Many genetic mutations adversely affect the structure and function of load-bearing soft tissues, with clinical sequelae often responsible for disability or death. Parallel advances in genetics and histomechanical characterization provide significant insight into these conditions, but there remains a pressing need to integrate such information. We present a novel genotype-to-biomechanical-phenotype neural network (G2{\Phi}net) for characterizing and classifying biomechanical properties of soft tissues, which serve as important functional readouts of tissue health or disease. We illustrate the utility of our approach by inferring the nonlinear, genotype-dependent constitutive behavior of the aorta for four mouse models involving defects or deficiencies in extracellular constituents. We show that G2{\Phi}net can infer the biomechanical response while simultaneously ascribing the associated genotype correctly by utilizing limited, noisy, and unstructured experimental data. More broadly, G2{\Phi}net provides a powerful method and a paradigm shift for correlating genotype and biomechanical phenotype quantitatively, promising a better understanding of their interplay in biological tissues.  ( 2 min )
    Provably Tightest Linear Approximation for Robustness Verification of Sigmoid-like Neural Networks. (arXiv:2208.09872v1 [cs.LG])
    The robustness of deep neural networks is crucial to modern AI-enabled systems and should be formally verified. Sigmoid-like neural networks have been adopted in a wide range of applications. Due to their non-linearity, Sigmoid-like activation functions are usually over-approximated for efficient verification, which inevitably introduces imprecision. Considerable effort has been devoted to finding so-called tighter approximations that yield more precise verification results. However, existing tightness definitions are heuristic and lack theoretical foundations. We conduct a thorough empirical analysis of existing neuron-wise characterizations of tightness and reveal that they are superior only on specific neural networks. We then introduce the notion of network-wise tightness as a unified tightness definition and show that computing network-wise tightness is a complex non-convex optimization problem. We bypass this complexity from different perspectives via two efficient, provably tightest approximations. The results demonstrate the promising performance of our approaches over the state of the art: (i) up to a 251.28% improvement in certified lower robustness bounds; and (ii) notably more precise verification results on convolutional networks.  ( 2 min )
    Emergence of hierarchical modes from deep learning. (arXiv:2208.09859v1 [cs.LG])
    Large-scale deep neural networks incur expensive training costs, and the training yields weight matrices that are difficult to interpret. Here, we propose a mode decomposition learning that interprets the weight matrices as a hierarchy of latent modes. These modes are akin to the patterns studied in the physics of memory networks. Mode decomposition learning not only saves a significant amount of training cost but also explains network performance through the leading modes. The scheme yields a progressively more compact latent space across the network hierarchy, and the minimal number of modes required grows only logarithmically with the network width. We also study mode decomposition learning in an analytic on-line learning setting, which reveals multi-stage learning dynamics. The proposed mode decomposition learning thus points to a cheap and interpretable route into deep learning.  ( 2 min )
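    The flavor of such a decomposition can be illustrated post hoc with a truncated SVD, which expresses a trained weight matrix as a sum of rank-one modes. Note that the paper learns its modes during training rather than extracting them afterwards, so this NumPy sketch is only an analogy.

        import numpy as np

        def leading_modes(W, n_modes=8):
            U, s, Vt = np.linalg.svd(W, full_matrices=False)
            modes = [s[k] * np.outer(U[:, k], Vt[k]) for k in range(n_modes)]
            W_approx = sum(modes)                              # low-rank reconstruction
            energy = (s[:n_modes] ** 2).sum() / (s ** 2).sum()
            return modes, W_approx, energy                     # modes + captured variance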
    A biologically-inspired evaluation of molecular generative machine learning. (arXiv:2208.09658v1 [cs.LG])
    While generative models have recently become ubiquitous in many scientific areas, less attention has been paid to their evaluation. For molecular generative models, the state of the art examines their output in isolation or in relation to the input; biological and functional properties, such as ligand-target interaction, are not addressed. In this study, we propose a novel biologically-inspired benchmark for the evaluation of molecular generative models. Specifically, we design three diverse reference datasets and introduce a set of metrics directly relevant to the drug discovery process. In particular, we propose a recreation metric and apply drug-target affinity prediction and molecular docking as complementary techniques for evaluating generative outputs. While all three metrics show consistent results across the tested generative models, a more detailed comparison of drug-target affinity and molecular docking scores revealed that unimodal predictors can lead to erroneous conclusions about target binding at the molecular level, so a multi-modal approach is preferable. The key advantage of this framework is that it incorporates prior physico-chemical domain knowledge into the benchmarking process by focusing explicitly on ligand-target interactions, creating a highly efficient tool not only for evaluating molecular generative outputs but also for enriching the drug discovery process in general.  ( 3 min )
    A Unified Analysis of Mixed Sample Data Augmentation: A Loss Function Perspective. (arXiv:2208.09913v1 [cs.LG])
    We propose the first unified theoretical analysis of mixed sample data augmentation (MSDA), covering methods such as Mixup and CutMix. Our theoretical results show that, regardless of the mixing strategy, MSDA behaves as a pixel-level regularization of the underlying training loss and a regularization of the first-layer parameters. They likewise support that MSDA training improves adversarial robustness and generalization compared to vanilla training. Using these results, we provide a high-level understanding of how different design choices of MSDA work differently. For example, we show that the most popular MSDA methods, Mixup and CutMix, behave differently: CutMix regularizes the input gradients according to pixel distances, while Mixup regularizes them regardless of pixel distances. Our theoretical results also show that the optimal MSDA strategy depends on the task, dataset, and model parameters. From these observations, we propose generalized MSDAs: HMix, a hybrid of Mixup and CutMix, and Gaussian Mixup (GMix), both simple extensions of Mixup and CutMix. Our implementation leverages the advantages of Mixup and CutMix while remaining very efficient, with computation cost almost as negligible as that of Mixup and CutMix themselves. Our empirical study shows that HMix and GMix outperform previous state-of-the-art MSDA methods on the CIFAR-100 and ImageNet classification tasks. Source code is available at https://github.com/naver-ai/hmix-gmix  ( 3 min )
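    For intuition, the following NumPy sketch combines the two regularization styles in the spirit of HMix: paste a cut box from the second image (CutMix-style) and linearly blend everything outside the box (Mixup-style), weighting the labels by the surviving fraction of each image. The box sampling and label weight are assumptions; the paper's exact HMix formulation may differ.

        import numpy as np

        def hmix(x1, x2, y1, y2, lam=0.5, box_frac=0.3):
            """x1, x2: (..., H, W) images; y1, y2: one-hot label vectors."""
            h, w = x1.shape[-2:]
            bh, bw = int(h * box_frac), int(w * box_frac)
            top, left = np.random.randint(0, h - bh), np.random.randint(0, w - bw)
            x = lam * x1 + (1 - lam) * x2                    # Mixup everywhere
            x[..., top:top+bh, left:left+bw] = x2[..., top:top+bh, left:left+bw]
            w1 = lam * (1 - bh * bw / (h * w))               # fraction of x1 kept
            return x, w1 * y1 + (1 - w1) * y2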
    Twin Papers: A Simple Framework of Causal Inference for Citations via Coupling. (arXiv:2208.09862v1 [cs.DL])
    The research process involves many decisions, e.g., how to title a paper and where to publish it. In this paper, we introduce a general framework for investigating the effects of such decisions. The main difficulty is that investigating effects requires counterfactual results, which are not available in reality. The key insight of our framework is inspired by existing counterfactual analyses using twins, in which researchers regard twins as counterfactual units. The proposed framework regards a pair of papers that cite each other as twins; such papers tend to be parallel works, on similar topics, and in similar communities. We investigate twin papers that adopted different decisions, observe the progress of the research impact of these studies, and estimate the effect of a decision from the difference in their impacts. We release our code and data, which we believe are highly valuable given the scarcity of datasets for counterfactual studies.  ( 2 min )
    Combating Noisy-Labeled and Imbalanced Data by Two Stage Bi-Dimensional Sample Selection. (arXiv:2208.09833v1 [cs.LG])
    Robust learning on noisy-labeled data is an important task in real applications, because label noise directly leads to poor generalization of deep learning models. Existing label-noise learning methods usually assume that the ground-truth classes of the training data are balanced. However, real-world data are often imbalanced, leading to inconsistency between the observed and intrinsic class distributions under label noise. This distribution inconsistency makes label-noise learning more challenging, because it is hard to distinguish clean samples from noisy samples on the intrinsic tail classes. In this paper, we propose a learning framework for label-noise learning with intrinsically long-tailed data. Specifically, we propose a robust sample-selection method called two-stage bi-dimensional sample selection (TBSS) to better separate clean samples from noisy samples, especially for the tail classes. TBSS consists of two new separation metrics that jointly separate the samples in each class. Extensive experiments on multiple noisy-labeled datasets with intrinsically long-tailed class distributions demonstrate the effectiveness of our method.  ( 2 min )
    Energy-aware Scheduling of Virtualized Base Stations in O-RAN with Online Learning. (arXiv:2208.09956v1 [cs.NI])
    The design of Open Radio Access Network (O-RAN)-compliant systems for configuring virtualized Base Stations (vBSs) is of paramount importance for network operators. This task is challenging, since optimizing the vBS scheduling procedure requires knowledge of parameters that are erratic and difficult to obtain in advance. In this paper, we propose an online learning algorithm for balancing the performance and energy consumption of a vBS. The algorithm provides performance guarantees under unforeseeable conditions, such as non-stationary traffic and network state, and is oblivious to the vBS operation profile. We study the problem in its most general form and prove that the proposed technique achieves sub-linear regret (i.e., zero average optimality gap) even in a fast-changing environment. Using real-world data and various trace-driven evaluations, we find savings of up to 74.3% in the power consumption of a vBS compared with state-of-the-art benchmarks.  ( 2 min )
    The computational complexity of some explainable clustering problems. (arXiv:2208.09643v1 [cs.LG])
    We study the computational complexity of some explainable clustering problems in the framework proposed by [Dasgupta et al., ICML 2020], where explainability is achieved via axis-aligned decision trees. We consider the $k$-means, $k$-medians, $k$-centers and the spacing cost functions. We prove that the first three are hard to optimize while the latter can be optimized in polynomial time.  ( 2 min )
    MentorGNN: Deriving Curriculum for Pre-Training GNNs. (arXiv:2208.09905v1 [cs.LG])
    Graph pre-training strategies have been attracting a surge of attention in the graph mining community, owing to their flexibility in parameterizing graph neural networks (GNNs) without any label information. The key idea is to encode valuable information into the backbone GNNs by predicting masked graph signals extracted from the input graphs. To balance the importance of diverse graph signals (e.g., nodes, edges, subgraphs), existing approaches are mostly hand-engineered, introducing hyperparameters to re-weight the importance of graph signals. However, human intervention with sub-optimal hyperparameters often injects additional bias and deteriorates generalization performance in downstream applications. This paper addresses these limitations from a new perspective: deriving a curriculum for pre-training GNNs. We propose an end-to-end model named MentorGNN that supervises the pre-training process of GNNs across graphs with diverse structures and disparate feature spaces. To comprehend heterogeneous graph signals at different granularities, we propose a curriculum learning paradigm that automatically re-weights graph signals so as to ensure good generalization in the target domain. Moreover, we shed new light on the problem of domain adaptation on relational data (i.e., graphs) by deriving a natural and interpretable upper bound on the generalization error of the pre-trained GNNs. Extensive experiments on a wealth of real graphs validate the performance of MentorGNN.  ( 2 min )
    Composing RNNs and FSTs for Small Data: Recovering Missing Characters in Old Hawaiian Text. (arXiv:2208.10248v1 [cs.CL])
    In contrast to the older writing system of the 19th century, modern Hawaiian orthography employs characters for long vowels and glottal stops. These extra characters account for about one-third of the phonemes in Hawaiian, so including them makes a big difference to reading comprehension and pronunciation. However, transliterating between older and newer texts is a laborious task when performed manually. We introduce two related methods to help solve this transliteration problem automatically, given that there were not enough data to train an end-to-end deep learning model. One method is implemented, end-to-end, using finite state transducers (FSTs). The other is a hybrid deep learning approach that approximately composes an FST with a recurrent neural network (RNN). We find that the hybrid approach outperforms the end-to-end FST by partitioning the original problem into one part that can be modelled by hand using an FST, and another part that is easily solved by an RNN trained on the available data.
    Instability and Local Minima in GAN Training with Kernel Discriminators. (arXiv:2208.09938v1 [cs.LG])
    Generative Adversarial Networks (GANs) are a widely-used tool for generative modeling of complex data. Despite their empirical success, the training of GANs is not fully understood due to the min-max optimization of the generator and discriminator. This paper analyzes these joint dynamics when the true and generated samples are discrete, finite sets and the discriminator is kernel-based. A simple yet expressive framework for analyzing training, called the $\textit{Isolated Points Model}$, is introduced. In this model, the distance between true samples greatly exceeds the kernel width, so each generated point is influenced by at most one true point. The model enables precise characterization of the conditions for convergence to both good and bad minima. In particular, the analysis explains two common failure modes: (i) approximate mode collapse and (ii) divergence. Numerical simulations are provided that replicate these behaviors as predicted.  ( 2 min )
    Representation Learning with Graph Neural Networks for Speech Emotion Recognition. (arXiv:2208.09830v1 [cs.SD])
    Learning expressive representations is crucial in deep learning. In speech emotion recognition (SER), silent regions and noise in the speech interfere with expressive representation learning, and traditional RNN-based models are susceptible to such interference. Recently, Graph Neural Networks (GNNs) have demonstrated their effectiveness for representation learning, and we adopt this framework for SER. In particular, we propose a cosine similarity-based graph as an ideal graph structure for representation learning in SER. We present a Cosine similarity-based Graph Convolutional Network (CoGCN) that is robust to perturbation and noise. Experimental results show that our method outperforms state-of-the-art methods, or provides competitive results with a significant model size reduction, using only 1/30 of the parameters.  ( 2 min )
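    The graph construction itself is simple enough to sketch: connect feature nodes whose cosine similarity exceeds a threshold. The threshold value and the use of dense frame-level features are assumptions for illustration.

        import numpy as np

        def cosine_graph(features, threshold=0.5):
            """features: (n, d) frame-level speech features -> (n, n) adjacency."""
            norm = features / np.linalg.norm(features, axis=1, keepdims=True)
            sim = norm @ norm.T
            adj = (sim > threshold).astype(float)
            np.fill_diagonal(adj, 1.0)          # keep self-loops for the GCN
            return adj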
    The Saddle-Point Accountant for Differential Privacy. (arXiv:2208.09595v1 [cs.CR])
    We introduce a new differential privacy (DP) accountant called the saddle-point accountant (SPA). SPA approximates privacy guarantees for the composition of DP mechanisms accurately and quickly. Our approach is inspired by the saddle-point method, a ubiquitous numerical technique in statistics. We prove rigorous performance guarantees by deriving upper and lower bounds on the approximation error of SPA. The crux of SPA is a combination of large-deviation methods with central limit theorems, which we derive by exponentially tilting the privacy loss random variables corresponding to the DP mechanisms. One key advantage of SPA is that it runs in constant time for the $n$-fold composition of a privacy mechanism. Numerical experiments demonstrate that SPA achieves accuracy comparable to state-of-the-art accounting methods with a faster runtime.  ( 2 min )
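    As a toy illustration of the underlying idea (not SPA itself), the sketch below applies the classical first-order saddle-point approximation to the tail of a sum of i.i.d. privacy losses, given its cumulant generating function $K$: solve $K'(t^*) = \epsilon$, then approximate $P(S \geq \epsilon) \approx e^{K(t^*) - t^*\epsilon} / (t^* \sqrt{2\pi K''(t^*)})$. The Gaussian CGF in the example is a stand-in; SPA's tilting and error bounds are considerably more refined.

        import numpy as np
        from scipy.optimize import brentq

        def saddle_point_tail(K, K1, K2, eps, t_hi=50.0):
            t_star = brentq(lambda t: K1(t) - eps, 1e-8, t_hi)   # solve K'(t*) = eps
            return np.exp(K(t_star) - t_star * eps) / (
                t_star * np.sqrt(2 * np.pi * K2(t_star)))

        # Example: n-fold composition of Gaussian privacy losses N(mu, v).
        n, mu, v, eps = 100, 0.01, 0.02, 2.0
        tail = saddle_point_tail(lambda t: n * (mu * t + 0.5 * v * t * t),
                                 lambda t: n * (mu + v * t),
                                 lambda t: n * v, eps)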
    Machine learning based surrogate models for microchannel heat sink optimization. (arXiv:2208.09683v1 [physics.flu-dyn])
    In this paper, microchannel designs with secondary channels and with ribs are investigated using computational fluid dynamics and are coupled with a multi-objective optimization algorithm to determine and propose optimal solutions based on the observed thermal resistance and pumping power. A workflow combining Latin hypercube sampling, machine-learning-based surrogate modeling, and multi-objective optimization is proposed. Random forests, gradient boosting algorithms, and neural networks were considered in the search for the best surrogate. We demonstrate that tuned neural networks make accurate predictions and yield an acceptable surrogate model. The optimized solutions show a negligible difference in overall performance compared to the conventional optimization approach, while being computed in one-fifth of the time. Generated designs attain temperatures more than 10% lower than a conventional microchannel design under the same pressure limits; when limited by temperature, pressure drops are reduced by more than 25%. Finally, the influence of each design variable on thermal resistance and pumping power was investigated using the SHapley Additive exPlanations technique. Overall, we demonstrate that the proposed framework has merit and can serve as a viable methodology for microchannel heat sink design optimization.  ( 2 min )
    A Multi-Head Model for Continual Learning via Out-of-Distribution Replay. (arXiv:2208.09734v1 [cs.LG])
    This paper studies class incremental learning (CIL), a setting of continual learning (CL). Many approaches have been proposed to deal with catastrophic forgetting (CF) in CIL; most incrementally construct a single classifier for all classes of all tasks in a single-head network. To prevent CF, a popular approach is to memorize a small number of samples from previous tasks and replay them during training of the new task. However, this approach still suffers from serious CF, as the parameters learned for previous tasks are updated or adjusted with only the limited number of saved samples in the memory. This paper proposes an entirely different approach, called MORE, that builds a separate classifier (head) for each task (a multi-head model) using a transformer network. Instead of using the saved samples to update the network for previous tasks/classes, MORE leverages them to build a task-specific classifier (adding a new classification head) without updating the network learned for previous tasks/classes. The model for a new task is trained to learn the classes of that task and also to detect samples that are not from the task's data distribution, i.e., out-of-distribution (OOD) samples. This enables the classifier of the task to which a test instance belongs to produce a high score for the correct class, while the classifiers of other tasks produce low scores because the instance is not from their data distributions. Experimental results show that MORE outperforms state-of-the-art baselines and is also naturally capable of performing OOD detection in the continual learning setting.  ( 3 min )
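    Test-time inference with per-task heads reduces to a confidence comparison, as in this hedged PyTorch sketch: every head scores the instance, and the prediction comes from the most confident head, since heads trained with OOD detection should assign low scores to instances from other tasks. The function names here are illustrative.

        import torch

        @torch.no_grad()
        def predict(x, backbone, heads):
            """x: a single test instance (batch of one)."""
            feats = backbone(x)
            scores = [head(feats) for head in heads]   # one logit set per task head
            best = max(range(len(scores)), key=lambda t: scores[t].max().item())
            return best, scores[best].argmax().item()  # (task id, class id)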
    Predicting Exotic Hadron Masses with Data Augmentation Using Multilayer Perceptron. (arXiv:2208.09538v1 [hep-ph])
    Recently, there have been significant developments in neural networks, and they are now frequently used in the physics literature. This work estimates the masses of exotic hadrons and of doubly charmed and doubly bottomed baryons from the meson and baryon masses using neural networks. The amount of data is then increased using a recently proposed artificial data augmentation technique. We observe that the predictive ability of the neural network increases with augmented data. This study shows that data augmentation techniques play an essential role in improving neural network predictions, and that neural networks can make reasonable predictions for exotic hadrons and doubly charmed and doubly bottomed baryons. The results are also comparable to those of the Gaussian Process and the Constituent Quark Model.  ( 2 min )
    Stop&Hop: Early Classification of Irregular Time Series. (arXiv:2208.09795v1 [cs.LG])
    Early classification algorithms help users react faster to their machine learning models' predictions. Early warning systems in hospitals, for example, let clinicians improve their patients' outcomes by accurately predicting infections. While early classification systems are advancing rapidly, a major gap remains: existing systems do not consider irregular time series, which have uneven and often long gaps between observations. Such series are notoriously pervasive in impactful domains like healthcare. We bridge this gap and study early classification of irregular time series, a new setting for early classifiers that opens doors to more real-world problems. Our solution, Stop&Hop, uses a continuous-time recurrent network to model ongoing irregular time series in real time, while an irregularity-aware halting policy, trained with reinforcement learning, predicts when to stop and classify the streaming series. By taking real-valued step sizes, the halting policy flexibly decides exactly when to stop ongoing series in real time. In this way, Stop&Hop seamlessly integrates the information contained in the timing of observations, a new and vital source for early classification in this setting, with the time series values to provide early classifications for irregular time series. Using four synthetic and three real-world datasets, we demonstrate that Stop&Hop consistently makes earlier and more accurate predictions than state-of-the-art alternatives adapted to this new problem. Our code is publicly available at https://github.com/thartvigsen/StopAndHop.  ( 3 min )
    Transferable Cross-Tokamak Disruption Prediction with Deep Hybrid Neural Network Feature Extractor. (arXiv:2208.09594v1 [physics.plasm-ph])
    Predicting disruptions across different tokamaks is a major obstacle. Future tokamaks can hardly tolerate disruptions during high-performance discharges, yet the few high-performance disruption discharges available can hardly compose an abundant training set, which makes it difficult for current data-driven methods to obtain acceptable results. A machine learning method capable of transferring a disruption prediction model trained on one tokamak to another is required to solve this problem. The key is a disruption prediction model containing a feature extractor able to extract common disruption precursor traces in tokamak diagnostic data, together with a transferable disruption classifier. Based on these concerns, this paper first presents a deep fusion feature extractor designed specifically for extracting disruption precursor features from common diagnostics on tokamaks, according to currently known precursors of disruption, providing a promising foundation for transferable models. The fusion feature extractor is validated by comparison with manual feature extraction on J-TEXT. Based on the feature extractor trained on J-TEXT, the disruption prediction model was transferred to EAST data with a mere 20 discharges from the EAST experiment, and its performance is comparable to that of a model trained with 1896 EAST discharges. Comparison among other model training scenarios shows the potential of transfer learning in predicting disruptions across different tokamaks.  ( 3 min )
    Neural network facilitated ab initio derivation of linear formula: A case study on formulating the relationship between DNA motifs and gene expression. (arXiv:2208.09559v1 [q-bio.QM])
    Developing models with high interpretability, and even deriving formulas to quantify relationships in biological data, is an emerging need. We propose a framework for ab initio derivation of sequence motifs and a linear formula using a new approach based on an interpretable neural network model called the contextual regression model. We show that this linear model can predict gene expression levels from promoter sequences with performance comparable to deep neural network models. We uncovered a list of 300 motifs with important regulatory roles in gene expression and showed that they also contribute significantly to cell-type-specific gene expression across 154 diverse cell types. This work illustrates the possibility of deriving formulas to represent biological laws that may not otherwise be easily elucidated. (https://github.com/Wang-lab-UCSD/Motif_Finding_Contextual_Regression)  ( 2 min )
    Few-Shot Learning of Accurate Folding Landscape for Protein Structure Prediction. (arXiv:2208.09652v1 [cs.LG])
    Data-driven predictive methods that can efficiently and accurately transform protein sequences into biologically active structures are highly valuable for scientific research and therapeutic development. Determining an accurate folding landscape from co-evolutionary information is fundamental to the success of modern protein structure prediction methods. As the state of the art, AlphaFold2 has dramatically raised accuracy without performing explicit co-evolutionary analysis; nevertheless, its performance still depends strongly on the available sequence homologs. We investigated the cause of this dependence and present EvoGen, a meta generative model, to remedy the underperformance of AlphaFold2 on targets with poor MSAs. EvoGen allows us to manipulate the folding landscape either by denoising the searched MSA or by generating a virtual MSA, and it helps AlphaFold2 fold accurately in the low-data regime, even achieving encouraging performance with single-sequence predictions. Being able to make accurate predictions with few-shot MSAs not only generalizes AlphaFold2 better to orphan sequences but also democratizes its use in high-throughput applications. Besides, EvoGen combined with AlphaFold2 yields a probabilistic structure generation method that can explore alternative conformations of protein sequences, and the task-aware differentiable algorithm for sequence generation will benefit other related tasks, including protein design.  ( 3 min )
    TopoDiff: A Performance and Constraint-Guided Diffusion Model for Topology Optimization. (arXiv:2208.09591v1 [cs.LG])
    Structural topology optimization, which aims to find the optimal physical structure that maximizes mechanical performance, is vital in engineering design applications in aerospace, mechanical, and civil engineering. Generative adversarial networks (GANs) have recently emerged as a popular alternative to traditional iterative topology optimization methods. However, these models are often difficult to train, have limited generalizability, and, because they aim to mimic optimal topologies, neglect manufacturability and performance objectives such as mechanical compliance. We propose TopoDiff, a conditional diffusion-model-based architecture for performance-aware and manufacturability-aware topology optimization that overcomes these issues. Our model introduces a surrogate-model-based guidance strategy that actively favors structures with low compliance and good manufacturability. Our method significantly outperforms a state-of-the-art conditional GAN, reducing the average error on physical performance by a factor of eight and producing 11 times fewer infeasible samples. By introducing diffusion models to topology optimization, we show that conditional diffusion models can outperform GANs in engineering design synthesis applications as well. Our work also suggests a general framework for engineering optimization problems using diffusion models with external performance- and constraint-aware guidance.  ( 2 min )
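    The guidance strategy can be sketched as a classifier-guidance-style update: at each reverse-diffusion step, nudge the sample along the negative gradient of a surrogate that scores compliance (manufacturability guidance would enter the same way). The denoiser interface, update form, and guidance scale below are assumptions, not TopoDiff's exact sampler.

        import torch

        def guided_step(x_t, t, denoiser, surrogate, scale=1.0):
            x_t = x_t.detach().requires_grad_(True)
            mean, sigma = denoiser(x_t, t)            # plain reverse-diffusion step
            compliance = surrogate(x_t).sum()         # lower compliance is better
            grad = torch.autograd.grad(compliance, x_t)[0]
            return mean - scale * sigma ** 2 * grad + sigma * torch.randn_like(x_t)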
    From Time Series to Networks in R with the ts2net Package. (arXiv:2208.09660v1 [cs.SI])
    Network science has established itself as a prominent tool for modeling time series and complex systems. This modeling process consists of transforming a set of time series, or a single one, into a network. Nodes may represent complete time series, segments, or single values, while links define associations or similarities between the represented parts. R is one of the main programming languages used in data science, statistics, and machine learning, with many packages available; however, no single package provides the necessary methods to transform time series into networks. This paper presents ts2net, an R package for modeling one or multiple time series as networks. The package provides time series distance functions that can easily be computed in parallel and on supercomputers to process large data sets, along with methods to transform distance matrices into networks. ts2net also provides methods to transform a single time series into a network, such as recurrence networks, visibility graphs, and transition networks. Together with other packages, ts2net makes it possible to use network science and graph mining tools to extract information from time series.  ( 2 min )
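    Since ts2net is an R package, here is a language-neutral Python sketch of one transformation it provides, the natural visibility graph: two time points are linked if the straight line between them passes above every intermediate observation. This illustrates the concept only and does not mirror the package's API.

        import numpy as np

        def natural_visibility_graph(series):
            series = np.asarray(series, dtype=float)
            n = len(series)
            adj = np.zeros((n, n), dtype=bool)
            for a in range(n):
                for b in range(a + 1, n):
                    t = np.arange(a + 1, b)
                    line = series[a] + (series[b] - series[a]) * (t - a) / (b - a)
                    if np.all(series[t] < line):      # unobstructed line of sight
                        adj[a, b] = adj[b, a] = True
            return adj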
    FLIS: Clustered Federated Learning via Inference Similarity for Non-IID Data Distribution. (arXiv:2208.09754v1 [cs.LG])
    Classical federated learning approaches suffer significant performance degradation in the presence of non-IID data distributions across participants. When the distribution of each local dataset is highly different from the global one, the local objective of each client is inconsistent with the global optimum, which incurs a drift in the local updates and strongly impacts the performance of clients. Meanwhile, the primary incentive for clients to participate in federated learning is to obtain better personalized models. To address this issue, we present a new algorithm, FLIS, which groups the client population into clusters with jointly trainable data distributions by leveraging the inference similarity of clients' models. This framework captures settings where different groups of users have their own objectives (learning tasks) but, by aggregating their data with others in the same cluster (same learning task), perform more efficient and personalized federated learning. We present experimental results demonstrating the benefits of FLIS over state-of-the-art benchmarks on the CIFAR-100/10, SVHN, and FMNIST datasets. Our code is available at https://github.com/MMorafah/FLIS.  ( 2 min )
    Near-Optimal $\Phi$-Regret Learning in Extensive-Form Games. (arXiv:2208.09747v1 [cs.GT])
    In this paper, we establish efficient and uncoupled learning dynamics so that, when employed by all players in multiplayer perfect-recall imperfect-information extensive-form games, the \emph{trigger regret} of each player grows as $O(\log T)$ after $T$ repetitions of play. This improves exponentially over the prior best known trigger-regret bound of $O(T^{1/4})$, and settles a recent open question by Bai et al. (2022). As an immediate consequence, we guarantee convergence to the set of \emph{extensive-form correlated equilibria} and \emph{coarse correlated equilibria} at a near-optimal rate of $\frac{\log T}{T}$. Building on prior work, at the heart of our construction lies a more general result regarding fixed points deriving from rational functions with \emph{polynomial degree}, a property that we establish for the fixed points of \emph{(coarse) trigger deviation functions}. Moreover, our construction leverages a refined \textit{regret circuit} for the convex hull, which -- unlike prior guarantees -- preserves the \emph{RVU property} introduced by Syrgkanis et al. (NIPS, 2015); this observation has an independent interest in establishing near-optimal regret under learning dynamics based on a CFR-type decomposition of the regret.  ( 2 min )
    Matrix Completion with Cross-Concentrated Sampling: Bridging Uniform Sampling and CUR Sampling. (arXiv:2208.09723v1 [cs.LG])
    While uniform sampling has been widely studied in the matrix completion literature, CUR sampling approximates a low-rank matrix via row and column samples. Unfortunately, both sampling models lack flexibility for various circumstances in real-world applications. In this work, we propose a novel and easy-to-implement sampling strategy, coined Cross-Concentrated Sampling (CCS). By bridging uniform sampling and CUR sampling, CCS provides extra flexibility that can potentially save sampling costs in applications. In addition, we also provide a sufficient condition for CCS-based matrix completion. Moreover, we propose a highly efficient non-convex algorithm, termed Iterative CUR Completion (ICURC), for the proposed CCS model. Numerical experiments verify the empirical advantages of CCS and ICURC against uniform sampling and its baseline algorithms, on both synthetic and real-world datasets.  ( 2 min )
    Last-Iterate Convergence with Full- and Noisy-Information Feedback in Two-Player Zero-Sum Games. (arXiv:2208.09855v1 [cs.GT])
    The theory of learning in games is prominent in the AI community, motivated by several rising applications, such as multi-agent reinforcement learning and Generative Adversarial Networks. We propose Mutation-driven Multiplicative Weights Update (M2WU) for learning an equilibrium in two-player zero-sum normal-form games and prove that it exhibits last-iterate convergence in both the full- and noisy-information feedback settings. In the full-information setting, players observe the exact gradient vectors of their utility functions; in the noisy-information setting, they observe only noisy gradient vectors. Existing algorithms, including the well-known Multiplicative Weights Update (MWU) and Optimistic MWU (OMWU) algorithms, fail to converge to a Nash equilibrium with noisy-information feedback. In contrast, M2WU exhibits last-iterate convergence to a stationary point near a Nash equilibrium in both feedback settings, and we prove that it converges to an exact Nash equilibrium when the mutation term is adapted iteratively. We empirically confirm that M2WU outperforms MWU and OMWU in exploitability and convergence rate.  ( 2 min )
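    For intuition, the sketch below augments a standard multiplicative weights update with a mutation-style term pulling the strategy toward a reference distribution. The precise form and scheduling of the mutation in M2WU are not reproduced here; the `mu` strength and uniform reference are illustrative choices.

        import numpy as np

        def m2wu_step(x, payoff, ref, eta=0.1, mu=0.05):
            """x: current mixed strategy; payoff: utility of each pure action."""
            perturbed = payoff + mu * (ref / np.maximum(x, 1e-12) - 1.0)
            w = x * np.exp(eta * perturbed)       # multiplicative update
            return w / w.sum()

        x = np.full(3, 1 / 3)                     # start from the uniform strategy
        ref = np.full(3, 1 / 3)                   # mutation pulls back toward uniform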
    Adversarial contamination of networks in the setting of vertex nomination: a new trimming method. (arXiv:2208.09710v1 [stat.ML])
    As graph data becomes more ubiquitous, the need for robust inferential graph algorithms to operate in these complex data domains is crucial. In many cases of interest, inference is further complicated by the presence of adversarial data contamination. The effect of the adversary is frequently to change the data distribution in ways that negatively affect statistical and algorithmic performance. We study this phenomenon in the context of vertex nomination, a semi-supervised information retrieval task for network data. Here, a common suite of methods relies on spectral graph embeddings, which have been shown to provide both good algorithmic performance and flexible settings in which regularization techniques can be implemented to help mitigate the effect of an adversary. Many current regularization methods rely on direct network trimming to effectively excise the adversarial contamination, although this direct trimming often gives rise to complicated dependency structures in the resulting graph. We propose a new trimming method that operates in model space which can address both block structure contamination and white noise contamination (contamination whose distribution is unknown). This model trimming is more amenable to theoretical analysis while also demonstrating superior performance in a number of simulations, compared to direct trimming.  ( 2 min )
    PointDP: Diffusion-driven Purification against Adversarial Attacks on 3D Point Cloud Recognition. (arXiv:2208.09801v1 [cs.CV])
    3D point clouds are becoming a critical data representation in many real-world applications, such as autonomous driving, robotics, and medical imaging. Although the success of deep learning further accelerates the adoption of 3D point clouds in the physical world, deep learning is notorious for its vulnerability to adversarial attacks. In this work, we first identify that the state-of-the-art empirical defense, adversarial training, has a major limitation when applied to 3D point cloud models due to gradient obfuscation. We further propose PointDP, a purification strategy that leverages diffusion models to defend against 3D adversarial attacks. We extensively evaluate PointDP on six representative 3D point cloud architectures and leverage more than 10 strong and adaptive attacks to demonstrate its lower-bound robustness. Our evaluation shows that PointDP achieves significantly better robustness than state-of-the-art purification methods under strong attacks. Results on certified defenses combining randomized smoothing with PointDP will be included in the near future.  ( 2 min )
    Effectiveness of Function Matching in Driving Scene Recognition. (arXiv:2208.09694v1 [cs.CV])
    Knowledge distillation is an effective approach for training the compact recognizers required in autonomous driving. Recent studies on image classification have shown that matching the student and teacher on a wide range of data points is critical for improving performance in distillation. This concept (called function matching) is well suited to driving scene recognition, where an almost unlimited amount of unlabeled data is generally available. In this study, we experimentally investigate the impact of using such a large amount of unlabeled data for distillation on the performance of student models in structured prediction tasks for autonomous driving. Through extensive experiments, we demonstrate that the performance of the compact student model can be improved dramatically, even matching the performance of the large-scale teacher, through knowledge distillation with massive unlabeled data.  ( 2 min )
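    The training signal itself is simple to sketch: on every unlabeled image, the student matches the teacher's output distribution with a temperature-scaled KL loss, with no ground-truth labels involved. The temperature and loss form here are standard distillation defaults, assumed rather than taken from the paper.

        import torch.nn.functional as F

        def function_matching_loss(student_logits, teacher_logits, tau=1.0):
            log_p_s = F.log_softmax(student_logits / tau, dim=1)
            p_t = F.softmax(teacher_logits / tau, dim=1)
            # standard KD objective: KL(teacher || student), scaled by tau^2
            return F.kl_div(log_p_s, p_t, reduction="batchmean") * tau * tau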
    Trigger-free Event Detection via Derangement Reading Comprehension. (arXiv:2208.09659v1 [cs.CL])
    Event detection (ED), which aims to detect events in texts and categorize them, is vital to understanding what actually happens in real life. However, mainstream event detection models require high-quality expert annotations of triggers, which are often costly and thus deter the application of ED to new domains. In this paper, we therefore focus on low-resource ED without triggers and tackle the following formidable challenges: multi-label classification, insufficient clues, and imbalanced event distributions. We propose a novel trigger-free ED method using a Derangement mechanism on a machine Reading Comprehension (DRC) framework. Specifically, we treat the input text as the Context and concatenate it with all event type tokens, which are deemed Answers to an omitted default question; we can then leverage the self-attention of pre-trained language models to absorb the semantic relations between the input text and the event types. Moreover, we design a simple yet effective event derangement module (EDM) to prevent major events from being excessively learned, yielding a more balanced training process. The experimental results show that our trigger-free ED model is remarkably competitive with mainstream trigger-based models, demonstrating strong performance on low-resource event detection.  ( 2 min )
    Adam Can Converge Without Any Modification on Update Rules. (arXiv:2208.09632v1 [cs.LG])
    Ever since Reddi et al. (2018) pointed out the divergence issue of Adam, many new variants have been designed to obtain convergence. However, vanilla Adam remains exceptionally popular and works well in practice. Why is there a gap between theory and practice? We point out a mismatch between the settings of theory and practice: Reddi et al. (2018) pick the problem after picking the hyperparameters of Adam, i.e., $(\beta_1, \beta_2)$, while practical applications often fix the problem first and then tune $(\beta_1, \beta_2)$. Based on this observation, we conjecture that the empirical convergence can be theoretically justified only if the order of picking the problem and the hyperparameters is changed. In this work, we confirm this conjecture. We prove that, when $\beta_2$ is large and $\beta_1 < \sqrt{\beta_2}<1$, Adam converges to a neighborhood of the critical points, whose size is proportional to the variance of the stochastic gradients. Under an extra condition (the strong growth condition), Adam converges to the critical points. As $\beta_2$ increases, our convergence result covers any $\beta_1 \in [0,1)$, including $\beta_1=0.9$, the default setting in deep learning libraries. Our result shows that Adam can converge under a wide range of hyperparameters without any modification of its update rules; to our knowledge, we are the first to prove this without strong assumptions such as bounded gradients. When $\beta_2$ is small, we further identify a large region of $(\beta_1,\beta_2)$ where Adam can diverge to infinity. Our divergence result considers the same setting as our convergence result, indicating a phase transition from divergence to convergence as $\beta_2$ increases. These positive and negative results provide suggestions on how to tune Adam's hyperparameters.  ( 3 min )
    C$^{2}$IMUFS: Complementary and Consensus Learning-based Incomplete Multi-view Unsupervised Feature Selection. (arXiv:2208.09736v1 [cs.LG])
    Multi-view unsupervised feature selection (MUFS) has been demonstrated to be an effective technique for reducing the dimensionality of multi-view unlabeled data. Existing methods assume that all views are complete. However, multi-view data are usually incomplete: some instances are present in some views but not in all of them. Moreover, learning a complete similarity graph, an important promising technique in existing MUFS methods, cannot be achieved when views are missing. In this paper, we propose a complementary and consensus learning-based incomplete multi-view unsupervised feature selection method (C$^{2}$IMUFS) to address these issues. Concretely, C$^{2}$IMUFS integrates feature selection into an extended weighted non-negative matrix factorization model equipped with adaptive learning of view weights and a sparse $\ell_{2,p}$-norm, which offers better adaptability and flexibility. Via sparse linear combinations of multiple similarity matrices derived from different views, a complementary learning-guided similarity-matrix reconstruction model is presented to obtain the complete similarity graph in each view. Furthermore, C$^{2}$IMUFS learns a consensus clustering indicator matrix across different views and embeds it into a spectral graph term to preserve the local geometric structure. Comprehensive experimental results on real-world datasets demonstrate the effectiveness of C$^{2}$IMUFS compared with state-of-the-art methods.  ( 2 min )
    Graph neural networks for materials science and chemistry. (arXiv:2208.09481v1 [physics.chem-ph])
    Machine learning plays an increasingly important role in many areas of chemistry and materials science, e.g. to predict materials properties, to accelerate simulations, to design new materials, and to predict synthesis routes of new materials. Graph neural networks (GNNs) are one of the fastest growing classes of machine learning models. They are of particular relevance for chemistry and materials science, as they directly work on a graph or structural representation of molecules and materials and therefore have full access to all relevant information required to characterize materials. In this review article, we provide an overview of the basic principles of GNNs, widely used datasets, and state-of-the-art architectures, followed by a discussion of a wide range of recent applications of GNNs in chemistry and materials science, and concluding with a road-map for the further development and application of GNNs.  ( 2 min )
    A Novel Hybrid Sampling Framework for Imbalanced Learning. (arXiv:2208.09619v1 [cs.LG])
    Class imbalance is a frequently occurring scenario in classification tasks, and learning from imbalanced data poses a major challenge that has instigated a great deal of research. Data preprocessing using sampling techniques is a standard approach for dealing with imbalance: since standard classification algorithms do not perform well on imbalanced data, the dataset needs to be adequately balanced before training, either by oversampling the minority class or by undersampling the majority class. In this study, a novel hybrid sampling algorithm is proposed. To overcome the limitations of individual sampling techniques while ensuring the quality of the retained sampled dataset, a framework is developed that properly combines three different techniques: the Neighborhood Cleaning rule is first applied to reduce the imbalance, and random undersampling is then strategically coupled with the SMOTE algorithm to obtain an optimal balance. This hybrid methodology, termed "SMOTE-RUS-NC", is compared with other state-of-the-art sampling techniques, and is further incorporated into an ensemble learning framework to obtain a more robust classifier, termed "SRN-BRF". Rigorous experiments were conducted on 26 imbalanced datasets with varying degrees of imbalance. On virtually all datasets, the two proposed algorithms outperformed existing sampling strategies, in many cases by a substantial margin; in highly imbalanced datasets where popular sampling techniques failed utterly, they achieved unparalleled performance. These results demonstrate the efficacy of the proposed models and their potential as powerful sampling algorithms for the imbalanced domain.  ( 3 min )
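    A hedged sketch of the SMOTE-RUS-NC recipe using the imbalanced-learn library, for a binary task: Neighborhood Cleaning first, then random undersampling coupled with SMOTE, feeding a classifier. The sampling ratios below are illustrative; the paper's strategy for balancing the two samplers is more sophisticated.

        from imblearn.pipeline import Pipeline
        from imblearn.under_sampling import NeighbourhoodCleaningRule, RandomUnderSampler
        from imblearn.over_sampling import SMOTE
        from sklearn.ensemble import RandomForestClassifier

        model = Pipeline(steps=[
            ("ncr", NeighbourhoodCleaningRule()),               # clean noisy majority points
            ("rus", RandomUnderSampler(sampling_strategy=0.5)), # shrink the majority class
            ("smote", SMOTE(sampling_strategy=1.0)),            # synthesize minority samples
            ("clf", RandomForestClassifier()),
        ])
        # model.fit(X_train, y_train) rebalances the data before fitting the forest.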
    Are You Comfortable Now: Deep Learning the Temporal Variation in Thermal Comfort in Winters. (arXiv:2208.09628v1 [cs.LG])
    Indoor thermal comfort in smart buildings has a significant impact on the health and performance of occupants, and machine learning (ML) is increasingly used to address related challenges. Temporal variability of thermal comfort perception is an important problem that affects occupant well-being and energy consumption, yet most ML-based thermal comfort studies do not consider temporal aspects such as the time of day, circadian rhythm, and outdoor temperature. This work addresses these problems by investigating the impact of circadian rhythm and outdoor temperature on the prediction accuracy and classification performance of ML models. The data were gathered through month-long field experiments carried out in 14 classrooms of 5 schools, involving 512 primary school students. Four thermal comfort metrics are considered as the outputs of Deep Neural Network and Support Vector Machine models on this dataset. The effect of temporal variability on schoolchildren's comfort is shown through a "time of day" analysis, demonstrating temporal variability of up to 80% in prediction accuracy. Furthermore, we show that outdoor temperature (which varies over time) positively impacts the prediction performance of thermal comfort models by up to 30%. The importance of spatio-temporal context is demonstrated by contrasting micro-level (location-specific) and macro-level (6 locations across a city) performance. The most important finding of this work is that a definitive improvement in prediction accuracy is shown as the time of day and sky illuminance increase, across multiple thermal comfort metrics.  ( 3 min )
    A Dual Modality Approach For (Zero-Shot) Multi-Label Classification. (arXiv:2208.09562v1 [cs.CV])
    In computer vision, multi-label classification, including zero-shot multi-label classification, is an important task with many real-world applications. In this paper, we propose a novel algorithm, Aligned Dual moDality ClaSsifier (ADDS), which includes a Dual-Modal decoder (DM-decoder) with alignment between visual and textual features, for multi-label classification tasks. Moreover, we design a simple yet effective method called Pyramid-Forwarding to enhance performance on high-resolution inputs. Extensive experiments on standard multi-label benchmark datasets, MS-COCO and NUS-WIDE, demonstrate that our approach significantly outperforms previous methods, providing state-of-the-art performance for conventional multi-label classification, zero-shot multi-label classification, and an extreme case we call single-to-multi-label classification, where models trained on single-label datasets (ImageNet-1k, ImageNet-21k) are tested on multi-label ones (MS-COCO and NUS-WIDE). We also analyze how visual-textual alignment contributes to the proposed approach, validate the significance of the DM-decoder, and demonstrate the effectiveness of Pyramid-Forwarding on vision transformers.  ( 2 min )
    Data-Driven Causal Effect Estimation Based on Graphical Causal Modelling: A Survey. (arXiv:2208.09590v1 [cs.AI])
    In many fields of scientific research and real-world applications, unbiased estimation of causal effects from non-experimental data is crucial for understanding the mechanisms underlying the data and for decision-making on effective responses or interventions. A great deal of research has been conducted on this challenging problem from different angles. For causal effect estimation in data, assumptions such as the Markov property, faithfulness, and causal sufficiency are always made, and under these assumptions full knowledge, such as a set of covariates or an underlying causal graph, is still required. A practical challenge is that in many applications no such full knowledge, or only partial knowledge, is available. In recent years, research has emerged that uses search strategies based on graphical causal modelling to discover useful knowledge from data for causal effect estimation, under some mild assumptions, and it has shown promise in tackling this practical challenge. In this survey, we review these methods and focus on the challenges the data-driven methods face. We discuss the assumptions, strengths, and limitations of the data-driven methods, and we hope this review will motivate more researchers to design better data-driven methods based on graphical causal modelling for the challenging problem of causal effect estimation.  ( 2 min )
    DenseShift: Towards Accurate and Transferable Low-Bit Shift Network. (arXiv:2208.09708v1 [cs.CV])
    Deploying deep neural networks on low-resource edge devices is challenging due to their ever-increasing resource requirements. Recent investigations propose multiplication-free neural networks to reduce computation and memory consumption. The shift neural network is one of the most effective tools towards these reductions. However, existing low-bit shift networks are not as accurate as their full-precision counterparts and cannot efficiently transfer to a wide range of tasks due to inherent design flaws. We propose the DenseShift network, which exploits the following novel designs. First, we demonstrate that the zero-weight values in low-bit shift networks neither contribute to model capacity nor simplify model inference. Therefore, we propose a zero-free shifting mechanism that simplifies inference while increasing model capacity. Second, we design a new metric to measure the weight-freezing issue in training low-bit shift networks, and propose a sign-scale decomposition to improve training efficiency. Third, we propose a low-variance random initialization strategy to improve the model's performance in transfer learning scenarios. We run extensive experiments on various computer vision and speech tasks. The experimental results show that the DenseShift network significantly outperforms existing low-bit multiplication-free networks and achieves performance competitive with its full-precision counterpart. It also exhibits strong transfer learning performance with no drop in accuracy.  ( 3 min )
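    As a concrete illustration of the mechanism the abstract describes, the sketch below quantizes weights to signed powers of two so that every multiply becomes a bit shift. This is a generic, hedged example of shift-network quantization, not the authors' exact DenseShift design; the function names and exponent budget are illustrative assumptions.

        import numpy as np

        def quantize_to_shift(w, bits=4):
            # Round weights to signed powers of two: w ~ sign(w) * 2**e.
            # A zero-free scheme (as DenseShift advocates) keeps every weight
            # non-zero by retaining the sign and clipping the exponent range.
            sign = np.where(w >= 0, 1.0, -1.0)        # no zero weights
            mag = np.maximum(np.abs(w), 1e-12)        # avoid log2(0)
            e = np.clip(np.round(np.log2(mag)).astype(int), -(2 ** (bits - 1)), 0)
            return sign, e

        def shift_matvec(sign, e, x):
            # Multiplication-free in spirit: scaling by 2**e is a bit shift in
            # fixed-point hardware; here we emulate it with ldexp (x * 2**e).
            w = sign * np.ldexp(np.ones_like(sign), e)
            return w @ x

        rng = np.random.default_rng(0)
        W = rng.normal(scale=0.1, size=(4, 8))
        s, e = quantize_to_shift(W)
        print(shift_matvec(s, e, rng.normal(size=8)))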
    Recurrent Neural Network-based Anti-jamming Framework for Defense Against Multiple Jamming Policies. (arXiv:2208.09518v1 [cs.LG])
    Conventional anti-jamming methods mainly focus on preventing single-jammer attacks with an invariant jamming policy or jamming attacks from multiple jammers with similar jamming policies. These methods are ineffective against a single jammer following several different jamming policies or multiple jammers with distinct policies. Therefore, this paper proposes an anti-jamming method that can adapt its policy to the current jamming attack. Moreover, for the multiple-jammer scenario, an anti-jamming method is proposed that estimates the future occupied channels from the jammers' occupied channels in previous time slots. In both single- and multiple-jammer scenarios, the interaction between the users and jammers is modeled using recurrent neural networks (RNNs). The performance of the proposed anti-jamming methods is evaluated by calculating the users' successful transmission rate (STR) and ergodic rate (ER), and compared to a Q-learning-based baseline (DQL). Simulation results show that for the single-jammer scenario, all the considered jamming policies are perfectly detected and high STR and ER are maintained. Moreover, when 70% of the spectrum is under jamming attacks from multiple jammers, the proposed method achieves an STR and ER greater than 75% and 80%, respectively. These values rise to 90% when 30% of the spectrum is under jamming attacks. In addition, the proposed anti-jamming methods significantly outperform the DQL method for all the considered cases and jamming scenarios.  ( 3 min )
    An ensemble meta-estimator to predict source code testability. (arXiv:2208.09614v1 [cs.SE])
    Software testing can be a lengthy and costly process, especially if the software under test is not testable. Refactoring techniques may enhance testability by improving the software metrics that affect it. These metrics are identified while building regression models that learn to relate the metrics computed for a source code to its testability. We identified 15 software metrics highly affecting testability while interpreting our testability prediction model. Our experiments with 42 Java classes reveal that refactorings improving these 15 metrics can enhance testability by an average of 15.57%, besides improving some other quality attributes. Our testability prediction model is trained to map source code metrics to test effectiveness and efficiency as two significant ingredients of testable software. Test effectiveness improves as the coverage gained by the test suite increases. On the other hand, test efficiency declines as the size of the test suite increases. This article offers a mathematical model to compute class testability in terms of the size and coverage of the test suite. We use this mathematical model to compute testability as the target of our testability prediction model. The mathematical model requires the execution of the class under test to compute test coverage, while our regression model measures testability statically. Prediction of test results in terms of testability should precede the test to avoid unnecessary costs. Our testability prediction model has been trained and tested on 23,886 Java classes and 262 software metrics. The learned model predicts testability with an R2 of 0.68 and a mean squared error of 0.03.  ( 3 min )
    Sudakov-Fernique post-AMP, and a new proof of the local convexity of the TAP free energy. (arXiv:2208.09550v1 [math.PR])
    In many problems in modern statistics and machine learning, it is often of interest to establish that a first order method on a non-convex risk function eventually enters a region of parameter space in which the risk is locally convex. We derive an asymptotic comparison inequality, which we call the Sudakov-Fernique post-AMP inequality, which, in a certain class of problems involving a GOE matrix, is able to probe properties of an optimization landscape locally around the iterates of an approximate message passing (AMP) algorithm. As an example of its use, we provide a new, and arguably simpler, proof of some of the results of Celentano et al. (2021), which establishes that the so-called TAP free energy in the $\mathbb{Z}_2$-synchronization problem is locally convex in the region to which AMP converges. We further prove a conjecture of El Alaoui et al. (2022) involving the local convexity of a related but distinct TAP free energy, which, as a consequence, confirms that their algorithm efficiently samples from the Sherrington-Kirkpatrick Gibbs measure throughout the "easy" regime.  ( 2 min )
    Blind Image Deblurring with Unknown Kernel Size and Substantial Noise. (arXiv:2208.09483v1 [eess.IV])
    Blind image deblurring (BID) has been extensively studied in computer vision and adjacent fields. Modern methods for BID can be grouped into two categories: single-instance methods that deal with individual instances using statistical inference and numerical optimization, and data-driven methods that train deep-learning models to deblur future instances directly. Data-driven methods can be free from the difficulty in deriving accurate blur models, but are fundamentally limited by the diversity and quality of the training data -- collecting sufficiently expressive and realistic training data is a standing challenge. In this paper, we focus on single-instance methods that remain competitive and indispensable. However, most such methods do not prescribe how to deal with unknown kernel size and substantial noise, precluding practical deployment. Indeed, we show that several state-of-the-art (SOTA) single-instance methods are unstable when the kernel size is overspecified and/or the noise level is high. On the positive side, we propose a practical BID method that is stable against both; to our knowledge, it is the first of its kind. Our method builds on the recent ideas of solving inverse problems by integrating physical models and structured deep neural networks, without extra training data. We introduce several crucial modifications to achieve the desired stability. Extensive empirical tests on standard synthetic datasets, as well as real-world NTIRE2020 and RealBlur datasets, show the superior effectiveness and practicality of our BID method compared to SOTA single-instance as well as data-driven methods. The code of our method is available at: \url{https://github.com/sun-umn/Blind-Image-Deblurring}.  ( 3 min )
    Spectral Decomposition Representation for Reinforcement Learning. (arXiv:2208.09515v1 [cs.LG])
    Representation learning often plays a critical role in reinforcement learning by managing the curse of dimensionality. A representative class of algorithms exploits a spectral decomposition of the stochastic transition dynamics to construct representations that enjoy strong theoretical properties in an idealized setting. However, current spectral methods suffer from limited applicability because they are constructed for state-only aggregation and derived from a policy-dependent transition kernel, without considering the issue of exploration. To address these issues, we propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy, while also balancing the exploration-versus-exploitation trade-off during learning. A theoretical analysis establishes the sample efficiency of the proposed algorithm in both the online and offline settings. In addition, an experimental investigation demonstrates superior performance over current state-of-the-art algorithms across several benchmarks.  ( 2 min )
    Weighted Maximum Entropy Inverse Reinforcement Learning. (arXiv:2208.09611v1 [cs.LG])
    We study inverse reinforcement learning (IRL) and imitation learning (IM), the problems of recovering a reward or policy function from an expert's demonstrated trajectories. We propose a new way to improve the learning process by adding a weight function to the maximum entropy framework, with the motivation of being able to learn and recover the stochasticity (or the bounded rationality) of the expert policy. Our framework and algorithms allow us to learn both a reward (or policy) function and the structure of the entropy terms added to the Markov Decision Processes, thus enhancing the learning procedure. Our numerical experiments using human and simulated demonstrations and with discrete and continuous IRL/IM tasks show that our approach outperforms prior algorithms.  ( 2 min )
    Intersection of Parallels as an Early Stopping Criterion. (arXiv:2208.09529v1 [cs.LG])
    A common way to avoid overfitting in supervised learning is early stopping, where a held-out set is used for iterative evaluation during training to find a sweet spot in the number of training steps that gives maximum generalization. However, such a method requires a disjoint validation set, so part of the labeled data from the training set is usually left out for this purpose, which is not ideal when training data is scarce. Furthermore, when the training labels are noisy, the performance of the model over a validation set may not be an accurate proxy for generalization. In this paper, we propose a method to spot an early stopping point in the training iterations without the need for a validation set. We first show that, in the overparameterized regime, the randomly initialized weights of a linear model converge to the same direction during training. Using this result, we propose to train two parallel instances of a linear model, initialized with different random seeds, and use their intersection as a signal to detect overfitting. To detect this intersection, we use the cosine distance between the weights of the parallel models during training iterations. Noticing that the final layer of a NN is a linear map of penultimate-layer activations to output logits, we build on our criterion for linear models and propose an extension to multi-layer networks, using the new notion of counterfactual weights. We conduct experiments in two areas where early stopping has a noticeable impact on preventing overfitting of a NN: (i) learning from noisy labels; and (ii) learning to rank in IR. Our experiments on four widely used datasets confirm the effectiveness of our method for generalization. For a wide range of learning rates, our method, called Cosine-Distance Criterion (CDC), leads to better generalization on average than all the methods that we compare against in almost all of the tested cases.  ( 3 min )
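    The core of the criterion is easy to state in code. Below is a minimal, hedged sketch for the linear case: two instances of the same linear model are trained from different random seeds, and training stops once the cosine distance between their weight vectors falls below a tolerance. The thresholding rule is a simplification of the paper's intersection detection, and the data, learning rate, and tolerance are illustrative assumptions.

        import numpy as np

        def cosine_distance(u, v):
            return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

        rng = np.random.default_rng(0)
        n, p = 200, 500                              # overparameterized: p > n
        X = rng.normal(size=(n, p))
        y = np.sign(X[:, 0] + 0.5 * rng.normal(size=n))

        # Two parallel instances, identical data, different random inits.
        w1 = 1e-3 * np.random.default_rng(1).normal(size=p)
        w2 = 1e-3 * np.random.default_rng(2).normal(size=p)
        lr, tol = 0.01, 5e-3

        for step in range(10_000):
            for w in (w1, w2):                       # plain GD on squared loss
                w -= lr * (X.T @ (X @ w - y) / n)
            if cosine_distance(w1, w2) < tol:        # weights point the same way:
                print("stopping at step", step)      # treat as the stopping signal
                break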
    Exploring Popularity Bias in Music Recommendation Models and Commercial Streaming Services. (arXiv:2208.09517v1 [cs.IR])
    Popularity bias is the idea that a recommender system will unduly favor popular artists when recommending artists to users. Systems exhibiting this bias may contribute to a winner-take-all marketplace in which a small number of artists receive nearly all of the attention, while similarly meritorious artists are unlikely to be discovered. In this paper, we attempt to measure popularity bias in three state-of-the-art recommender system models (SLIM, Multi-VAE, WRMF) and on three commercial music streaming services (Spotify, Amazon Music, YouTube). We find that the most accurate model (SLIM) also has the most popularity bias, while less accurate models have less popularity bias. We also find no evidence of popularity bias in the commercial recommendations based on a simulated user experiment.  ( 2 min )
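    One simple way to operationalize a popularity-bias measurement of the kind the paper performs is to compare the mean catalog popularity of recommended items to the catalog average. A hedged sketch on synthetic data; the paper's exact metrics may differ.

        import numpy as np

        def avg_rec_popularity(recs, interactions):
            # recs: (n_users, k) recommended item ids;
            # interactions: (n_items,) global play counts.
            pop = interactions / interactions.sum()
            return pop[recs].mean(), pop.mean()

        rng = np.random.default_rng(0)
        counts = rng.zipf(2.0, size=1000).astype(float)   # long-tailed catalog
        top10 = np.argsort(-counts)[:10]                  # most popular items
        recs = np.tile(top10, (5, 1))                     # recommended to 5 users
        rec_pop, cat_pop = avg_rec_popularity(recs, counts)
        print(rec_pop / cat_pop)   # ratio >> 1 indicates popularity bias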
    Calculus on MDPs: Potential Shaping as a Gradient. (arXiv:2208.09570v1 [cs.LG])
    In reinforcement learning, different reward functions can be equivalent in terms of the optimal policies they induce. A particularly well-known and important example is potential shaping, a class of functions that can be added to any reward function without changing the optimal policy set under arbitrary transition dynamics. Potential shaping is conceptually similar to potentials, conservative vector fields and gauge transformations in math and physics, but this connection has not previously been formally explored. We develop a formalism for discrete calculus on graphs that abstract a Markov Decision Process, and show how potential shaping can be formally interpreted as a gradient within this framework. This allows us to strengthen results from Ng et al. (1999) describing conditions under which potential shaping is the only additive reward transformation to always preserve optimal policies. As an additional application of our formalism, we define a rule for picking a single unique reward function from each potential shaping equivalence class.  ( 2 min )
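    For reference, the classical potential shaping transformation from Ng et al. (1999), which the paper reinterprets as a discrete gradient, adds $$ F(s, a, s') = \gamma\,\Phi(s') - \Phi(s) $$ to the reward, where $\Phi$ is any real-valued function of states and $\gamma$ is the discount factor. Telescoping $\sum_t \gamma^t F(s_t, a_t, s_{t+1})$ along a trajectory shows that the return of every policy changes by the same constant $-\Phi(s_0)$ (for bounded $\Phi$), which is why the optimal policy set is preserved under arbitrary transition dynamics.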
    In Silico Prediction of Blood-Brain Barrier Permeability of Chemical Compounds through Molecular Feature Modeling. (arXiv:2208.09484v1 [q-bio.QM])
    The introduction of computational techniques to analyze chemical data has given rise to the analytical study of biological systems, known as "bioinformatics". One facet of bioinformatics is using machine learning (ML) technology to detect multivariable trends in various cases. Amongst the most pressing cases is predicting blood-brain barrier (BBB) permeability. The development of new drugs to treat central nervous system disorders presents unique challenges due to poor penetration efficacy across the blood-brain barrier. In this research, we aim to mitigate this problem through an ML model that analyzes chemical features. To do so: (i) an overview of the relevant biological systems, processes, and the use case is given; (ii) an in-depth literature review of existing computational techniques for detecting BBB permeability is undertaken, from which an aspect unexplored across current techniques is identified and a solution proposed; and (iii) a two-part in silico model to quantify the likelihood of permeability of drugs with defined features across the BBB through passive diffusion is developed, tested, and reflected on. Testing and validation with the dataset determined the predictive logBB model's mean squared error to be around 0.112 units and the neuroinflammation model's mean squared error to be approximately 0.3 units, outperforming all relevant studies found.  ( 3 min )
    Improving Multilayer-Perceptron(MLP)-based Network Anomaly Detection with Birch Clustering on CICIDS-2017 Dataset. (arXiv:2208.09711v1 [cs.CR])
    Machine learning algorithms have been widely used in intrusion detection systems, including the Multi-layer Perceptron (MLP). In this study, we propose a two-stage model that combines the Birch clustering algorithm and an MLP classifier to improve the performance of network anomaly multi-classification. In our proposed method, we first apply Birch or K-means as an unsupervised clustering algorithm to the CICIDS-2017 dataset to pre-group the data. The generated pseudo-label is then added as an additional feature for training the MLP-based classifier. The experimental results show that using Birch and K-means clustering for data pre-grouping can improve intrusion detection system performance. Our method achieves 99.73% accuracy in multi-classification using Birch clustering, which is better than similar studies using a stand-alone MLP model.  ( 2 min )
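    A minimal scikit-learn sketch of the two-stage pipeline the abstract describes, on placeholder data standing in for the CICIDS-2017 features; the hyperparameters (cluster count, hidden sizes) are illustrative assumptions, not the study's settings.

        import numpy as np
        from sklearn.cluster import Birch
        from sklearn.neural_network import MLPClassifier
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        X = StandardScaler().fit_transform(rng.normal(size=(2000, 20)))
        y = rng.integers(0, 5, size=2000)            # placeholder attack classes

        # Stage 1: unsupervised pre-grouping; cluster ids become pseudo-labels.
        pseudo = Birch(n_clusters=8).fit_predict(X)

        # Stage 2: append the pseudo-label as an extra input feature to the MLP.
        X_aug = np.hstack([X, pseudo[:, None].astype(float)])
        clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300,
                            random_state=0).fit(X_aug, y)
        print(clf.score(X_aug, y))

    In a real pipeline the clustering would be fit on the training split only and then applied to the test split.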
    Learning to predict test effectiveness. (arXiv:2208.09623v1 [cs.SE])
    The high cost of testing can be dramatically reduced if coverability, an inherent feature of the code under test, is predictable. This article offers a machine learning model to predict the extent to which a test could cover a class, in terms of a new metric called Coverageability. The prediction model consists of an ensemble of four regression models. The learning samples consist of feature vectors, where features are source code metrics computed for a class. The samples are labeled by the Coverageability values computed for their corresponding classes. We offer a mathematical model to evaluate test effectiveness in terms of the size and coverage of the test suite generated automatically for each class. We extend the size of the feature space by introducing a new approach to defining sub-metrics in terms of existing source code metrics. Using feature importance analysis on the learned prediction models, we sort source code metrics in the order of their impact on test effectiveness. As a result, we found class strict cyclomatic complexity to be the most influential source code metric. Our experiments with the prediction models on a large corpus of Java projects containing about 23,000 classes demonstrate a Mean Absolute Error (MAE) of 0.032, a Mean Squared Error (MSE) of 0.004, and an R2-score of 0.855. Compared with the state-of-the-art coverage prediction models, our models improve MAE, MSE, and R2-score by 5.78%, 2.84%, and 20.71%, respectively.  ( 3 min )
  • Open

    Model-Free Non-Stationary RL: Near-Optimal Regret and Applications in Multi-Agent RL and Inventory Control. (arXiv:2010.03161v4 [cs.LG] UPDATED)
    We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes. Both the reward functions and the state transition functions are allowed to vary arbitrarily over time as long as their cumulative variations do not exceed certain variation budgets. We propose Restarted Q-Learning with Upper Confidence Bounds (RestartQ-UCB), the first model-free algorithm for non-stationary RL, and show that it outperforms existing solutions in terms of dynamic regret. Specifically, RestartQ-UCB with Freedman-type bonus terms achieves a dynamic regret bound of $\widetilde{O}(S^{\frac{1}{3}} A^{\frac{1}{3}} \Delta^{\frac{1}{3}} H T^{\frac{2}{3}})$, where $S$ and $A$ are the numbers of states and actions, respectively, $\Delta>0$ is the variation budget, $H$ is the number of time steps per episode, and $T$ is the total number of time steps. We further present a parameter-free algorithm named Double-Restart Q-UCB that does not require prior knowledge of the variation budget. We show that our algorithms are \emph{nearly optimal} by establishing an information-theoretical lower bound of $\Omega(S^{\frac{1}{3}} A^{\frac{1}{3}} \Delta^{\frac{1}{3}} H^{\frac{2}{3}} T^{\frac{2}{3}})$, the first lower bound in non-stationary RL. Numerical experiments validate the advantages of RestartQ-UCB in terms of both cumulative rewards and computational efficiency. We demonstrate the power of our results in examples of multi-agent RL and inventory control across related products.  ( 3 min )
    Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning. (arXiv:2206.02072v1 [cs.LG] CROSS LISTED)
    The quintessential model-based reinforcement-learning agent iteratively refines its estimates or prior beliefs about the true underlying model of the environment. Recent empirical successes in model-based reinforcement learning with function approximation, however, eschew the true model in favor of a surrogate that, while ignoring various facets of the environment, still facilitates effective planning over behaviors. Recently formalized as the value equivalence principle, this algorithmic technique is perhaps unavoidable as real-world reinforcement learning demands consideration of a simple, computationally-bounded agent interacting with an overwhelmingly complex environment, whose underlying dynamics likely exceed the agent's capacity for representation. In this work, we consider the scenario where agent limitations may entirely preclude identifying an exactly value-equivalent model, immediately giving rise to a trade-off between identifying a model that is simple enough to learn and incurring only bounded sub-optimality. To address this problem, we introduce an algorithm that, using rate-distortion theory, iteratively computes an approximately-value-equivalent, lossy compression of the environment which an agent may feasibly target in lieu of the true model. We prove an information-theoretic, Bayesian regret bound for our algorithm that holds for any finite-horizon, episodic sequential decision-making problem. Crucially, our regret bound can be expressed in one of two possible forms, providing a performance guarantee for finding either the simplest model that achieves a desired sub-optimality gap or, alternatively, the best model given a limit on agent capacity.  ( 3 min )
    Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power. (arXiv:2205.13863v2 [cs.LG] UPDATED)
    It is well-known that modern neural networks are vulnerable to adversarial examples. To mitigate this problem, a series of robust learning algorithms have been proposed. However, although the robust training error can be near zero via some methods, all existing algorithms lead to a high robust generalization error. In this paper, we provide a theoretical understanding of this puzzling phenomenon from the perspective of expressive power for deep neural networks. Specifically, for binary classification problems with well-separated data, we show that, for ReLU networks, while mild over-parameterization is sufficient for high robust training accuracy, there exists a constant robust generalization gap unless the size of the neural network is exponential in the data dimension $d$. Even if the data is linearly separable, which means achieving low clean generalization error is easy, we can still prove an $\exp({\Omega}(d))$ lower bound for robust generalization. In general, our exponential lower bounds hold true for a variety of neural network families and other function classes as well, as long as their VC dimension is at most polynomial in the number of parameters. Moreover, we establish an improved upper bound of $\exp({\mathcal{O}}(k))$ for the network size to achieve low robust generalization error when the data lies on a manifold with intrinsic dimension $k$ ($k \ll d$). Nonetheless, we also have a lower bound that grows exponentially with respect to $k$ -- the curse of dimensionality is inevitable. By demonstrating an exponential separation between the network size for achieving low robust training and generalization error, our results reveal that the hardness of robust generalization may stem from the expressive power of practical models.  ( 3 min )
    Regret Analysis of Certainty Equivalence Policies in Continuous-Time Linear-Quadratic Systems. (arXiv:2206.04434v2 [cs.LG] UPDATED)
    This work theoretically studies a ubiquitous reinforcement learning policy for controlling the canonical model of continuous-time stochastic linear-quadratic systems. We show that the randomized certainty-equivalent policy addresses the exploration-exploitation dilemma in linear control systems that evolve according to unknown stochastic differential equations and whose operating cost is quadratic. More precisely, we establish square-root-of-time regret bounds, indicating that the randomized certainty-equivalent policy learns optimal control actions quickly from a single state trajectory. Further, we show that the regret scales linearly with the number of parameters. The presented analysis introduces novel and useful technical approaches, and sheds light on fundamental challenges of continuous-time reinforcement learning.  ( 2 min )
    Stability of Image-Reconstruction Algorithms. (arXiv:2206.07128v2 [math.OC] UPDATED)
    Robustness and stability of image-reconstruction algorithms have recently come under scrutiny. Their importance to medical imaging cannot be overstated. We review the known results for the topical variational regularization strategies ($\ell_2$ and $\ell_1$ regularization) and present novel stability results for $\ell_p$-regularized linear inverse problems for $p\in(1,\infty)$. Our results guarantee Lipschitz continuity for small $p$ and H\"{o}lder continuity for larger $p$. They generalize well to the $L_p(\Omega)$ function spaces.  ( 2 min )
    Scale invariant process regression. (arXiv:2208.10461v1 [stat.ML])
    Gaussian processes are the leading method for non-parametric regression on small to medium datasets. One main challenge is the choice of kernel and the optimization of hyperparameters. We propose a novel regression method that does not require specification of a kernel, length scale, variance, or prior mean. Its only hyperparameter is the assumed regularity (degree of differentiability) of the true function. We achieve this with a novel non-Gaussian stochastic process that we construct from minimal assumptions of translation and scale invariance. The process can be thought of as a hierarchical Gaussian process model, where the hyperparameters have been incorporated into the process itself. To perform inference with this process we develop the required mathematical tools. It turns out that for interpolation, the posterior is a t-process with a polyharmonic spline as mean. For regression, we state the exact posterior and find its mean (again a polyharmonic spline) and approximate its variance with a sampling method. Experiments show performance equal to that of Gaussian processes with optimized hyperparameters. The most important insight is that it is possible to derive a working machine learning method by assuming nothing but regularity and scale- and translation invariance, without any other model assumptions.  ( 2 min )
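    Since the interpolation posterior mean is a polyharmonic spline, a closely related fit is available off the shelf. The sketch below uses SciPy's thin-plate-spline RBF interpolator as a stand-in (thin-plate splines are the polyharmonic splines of order 2); this is not the paper's method, only a point of reference.

        import numpy as np
        from scipy.interpolate import RBFInterpolator

        rng = np.random.default_rng(0)
        x = rng.uniform(-3, 3, size=(30, 1))
        y = np.sin(x).ravel() + 0.05 * rng.normal(size=30)

        # Polyharmonic (thin-plate) spline fit with a small smoothing term.
        spline = RBFInterpolator(x, y, kernel="thin_plate_spline", smoothing=1e-3)
        y_grid = spline(np.linspace(-3, 3, 200)[:, None])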
    Hierarchical Bayesian Modelling for Knowledge Transfer Across Engineering Fleets via Multitask Learning. (arXiv:2204.12404v2 [stat.ML] UPDATED)
    A population-level analysis is proposed to address data sparsity when building predictive models for engineering infrastructure. Utilising an interpretable hierarchical Bayesian approach and operational fleet data, domain expertise is naturally encoded (and appropriately shared) between different sub-groups, representing (i) use-type, (ii) component, or (iii) operating condition. Specifically, domain expertise is exploited to constrain the model via assumptions (and prior distributions) allowing the methodology to automatically share information between similar assets, improving the survival analysis of a truck fleet and power prediction in a wind farm. In each asset management example, a set of correlated functions is learnt over the fleet, in a combined inference, to learn a population model. Parameter estimation is improved when sub-fleets are allowed to share correlated information at different levels in the hierarchy. In turn, groups with incomplete data automatically borrow statistical strength from those that are data-rich. The statistical correlations enable knowledge transfer via Bayesian transfer learning, and the correlations can be inspected to inform which assets share information for which effect (i.e. parameter). Successes in both case studies demonstrate the wide applicability in practical infrastructure monitoring, since the approach is naturally adapted between interpretable fleet models of different in-situ examples.  ( 3 min )
    Uncertainty-Aware Mixed-Variable Machine Learning for Materials Design. (arXiv:2207.04994v2 [stat.ML] UPDATED)
    Data-driven design shows the promise of accelerating materials discovery but is challenging due to the prohibitive cost of searching the vast design space of chemistry, structure, and synthesis methods. Bayesian Optimization (BO) employs uncertainty-aware machine learning models to select promising designs to evaluate, hence reducing the cost. However, BO with mixed numerical and categorical variables, which is of particular interest in materials design, has not been well studied. In this work, we survey frequentist and Bayesian approaches to uncertainty quantification of machine learning with mixed variables. We then conduct a systematic comparative study of their performances in BO using a popular representative model from each group, the random forest-based Lolo model (frequentist) and the latent variable Gaussian process model (Bayesian). We examine the efficacy of the two models in the optimization of mathematical functions, as well as properties of structural and functional materials, where we observe performance differences as related to problem dimensionality and complexity. By investigating the machine learning models' predictive and uncertainty estimation capabilities, we provide interpretations of the observed performance differences. Our results provide practical guidance on choosing between frequentist and Bayesian uncertainty-aware machine learning models for mixed-variable BO in materials design.  ( 3 min )
    VC Theoretical Explanation of Double Descent. (arXiv:2205.15549v2 [stat.ML] UPDATED)
    There has been growing interest in the generalization performance of large multilayer neural networks that can be trained to achieve zero training error, while generalizing well on test data. This regime is known as 'second descent' and it appears to contradict the conventional view that optimal model complexity should reflect an optimal balance between underfitting and overfitting, i.e., the bias-variance trade-off. This paper presents a VC-theoretical analysis of double descent and shows that it can be fully explained by classical VC-generalization bounds. We illustrate an application of analytic VC-bounds for modeling double descent for classification problems, using empirical results for several learning methods, such as SVM, Least Squares, and Multilayer Perceptron classifiers. In addition, we discuss several reasons for the misinterpretation of VC-theoretical results in the Deep Learning community.  ( 2 min )
    Undersampling is a Minimax Optimal Robustness Intervention in Nonparametric Classification. (arXiv:2205.13094v2 [cs.LG] UPDATED)
    While a broad range of techniques have been proposed to tackle distribution shift, the simple baseline of training on an $\textit{undersampled}$ dataset often achieves close to state-of-the-art accuracy across several popular benchmarks. This is rather surprising, since undersampling algorithms discard excess majority-group data. To understand this phenomenon, we ask if learning is fundamentally constrained by a lack of minority-group samples. We prove that this is indeed the case in the setting of nonparametric binary classification. Our results show that in the worst case, an algorithm cannot outperform undersampling unless there is a high degree of overlap between the train and test distributions (which is unlikely to be the case in real-world datasets), or if the algorithm leverages additional structure about the distribution shift. In particular, in the case of label shift we show that there is always an undersampling algorithm that is minimax optimal, while in the case of group-covariate shift there is an undersampling algorithm that is minimax optimal when the overlap between the group distributions is small. We also perform an experimental case study on a label shift dataset and find that, in line with our theory, the test accuracy of robust neural network classifiers is constrained by the number of minority samples.  ( 3 min )
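    The baseline under study is simple enough to state exactly: drop excess majority-group samples so that every group matches the minority-group size. A hedged numpy sketch:

        import numpy as np

        def undersample(X, y, group, seed=0):
            # Balance groups by subsampling each one down to the smallest group.
            rng = np.random.default_rng(seed)
            m = min(np.sum(group == g) for g in np.unique(group))
            keep = np.concatenate([
                rng.choice(np.where(group == g)[0], size=m, replace=False)
                for g in np.unique(group)
            ])
            return X[keep], y[keep]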
    Distributionally robust risk evaluation with causality constraint and structural information. (arXiv:2203.10571v2 [q-fin.MF] UPDATED)
    This work studies distributionally robust evaluation of expected function values over temporal data. A set of alternative measures is characterized by the causal optimal transport. We prove the strong duality and recast the causality constraint as minimization over an infinite-dimensional test function space. We approximate test functions by neural networks and prove the sample complexity with Rademacher complexity. Moreover, when structural information is available to further restrict the ambiguity set, we prove the dual formulation and provide efficient optimization methods. Empirical analysis on realized volatility and stock indices demonstrates that our framework offers an attractive alternative to the classic optimal transport formulation.  ( 2 min )
    Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks. (arXiv:2012.11654v5 [stat.ML] UPDATED)
    A recent line of work has analyzed the theoretical properties of deep neural networks via the Neural Tangent Kernel (NTK). In particular, the smallest eigenvalue of the NTK has been related to the memorization capacity, the global convergence of gradient descent algorithms and the generalization of deep nets. However, existing results either provide bounds in the two-layer setting or assume that the spectrum of the NTK matrices is bounded away from 0 for multi-layer networks. In this paper, we provide tight bounds on the smallest eigenvalue of NTK matrices for deep ReLU nets, both in the limiting case of infinite widths and for finite widths. In the finite-width setting, the network architectures we consider are fairly general: we require the existence of a wide layer with roughly order of $N$ neurons, $N$ being the number of data samples; and the scaling of the remaining layer widths is arbitrary (up to logarithmic factors). To obtain our results, we analyze various quantities of independent interest: we give lower bounds on the smallest singular value of hidden feature matrices, and upper bounds on the Lipschitz constant of input-output feature maps.  ( 3 min )
    Attentive Walk-Aggregating Graph Neural Networks. (arXiv:2110.02667v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have been shown to possess strong representation power, which can be exploited for downstream prediction tasks on graph-structured data, such as molecules and social networks. They typically learn representations by aggregating information from the $K$-hop neighborhood of individual vertices or from the enumerated walks in the graph. Prior studies have demonstrated the effectiveness of incorporating weighting schemes into GNNs; however, this has been primarily limited to $K$-hop neighborhood GNNs so far. In this paper, we aim to design an algorithm incorporating weighting schemes into walk-aggregating GNNs and analyze their effect. We propose a novel GNN model, called AWARE, that aggregates information about the walks in the graph using attention schemes. This leads to an end-to-end supervised learning method for graph-level prediction tasks in the standard setting where the input is the adjacency and vertex information of a graph, and the output is a predicted label for the graph. We then perform theoretical, empirical, and interpretability analyses of AWARE. Our theoretical analysis in a simplified setting identifies successful conditions for provable guarantees, demonstrating how the graph information is encoded in the representation, and how the weighting schemes in AWARE affect the representation and learning performance. Our experiments demonstrate the strong performance of AWARE in graph-level prediction tasks in the standard setting in the domains of molecular property prediction and social networks. Lastly, our interpretation study illustrates that AWARE can successfully capture the important substructures of the input graph. The code is available on $\href{https://github.com/mehmetfdemirel/aware}{GitHub}$.  ( 3 min )
    On the Theory of Reinforcement Learning with Once-per-Episode Feedback. (arXiv:2105.14363v3 [cs.LG] UPDATED)
    We study a theory of reinforcement learning (RL) in which the learner receives binary feedback only once at the end of an episode. While this is an extreme test case for theory, it is also arguably more representative of real-world applications than the traditional requirement in RL practice that the learner receive feedback at every time step. Indeed, in many real-world applications of reinforcement learning, such as self-driving cars and robotics, it is easier to evaluate whether a learner's complete trajectory was either "good" or "bad," but harder to provide a reward signal at each step. To show that learning is possible in this more challenging setting, we study the case where trajectory labels are generated by an unknown parametric model, and provide a statistically and computationally efficient algorithm that achieves sublinear regret.  ( 2 min )
    Multivariate Boosted Trees and Applications to Forecasting and Control. (arXiv:2003.03835v2 [cs.LG] UPDATED)
    Gradient boosted trees are competition-winning, general-purpose, non-parametric regressors, which exploit sequential model fitting and gradient descent to minimize a specific loss function. The most popular implementations are tailored to univariate regression and classification tasks, precluding the possibility of capturing multivariate target cross-correlations and applying structured penalties to the predictions. In this paper, we present a computationally efficient algorithm for fitting multivariate boosted trees. We show that multivariate trees can outperform their univariate counterparts when the predictions are correlated. Furthermore, the algorithm allows the predictions to be arbitrarily regularized, so that properties like smoothness, consistency, and functional relations can be enforced. We present applications and numerical results related to forecasting and control.  ( 2 min )
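    To make the idea concrete, here is a hedged sketch of boosting with multi-output trees under squared loss (scikit-learn's DecisionTreeRegressor accepts vector targets); the paper's algorithm additionally supports structured penalties, which this sketch omits.

        import numpy as np
        from sklearn.tree import DecisionTreeRegressor

        def fit_multivariate_boosting(X, Y, n_rounds=100, lr=0.1, depth=3):
            # Each round fits one multi-output tree to the current residuals,
            # i.e., the negative gradient of the squared loss.
            base = Y.mean(axis=0)
            pred = np.tile(base, (len(Y), 1))
            trees = []
            for _ in range(n_rounds):
                tree = DecisionTreeRegressor(max_depth=depth).fit(X, Y - pred)
                pred += lr * tree.predict(X)
                trees.append(tree)
            return base, trees

        def predict(base, trees, X, lr=0.1):
            out = np.tile(base, (len(X), 1))
            for tree in trees:
                out += lr * tree.predict(X)
            return out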
    Leveraging Cross Feedback of User and Item Embeddings with Attention for Variational Autoencoder based Collaborative Filtering. (arXiv:2002.09145v3 [cs.LG] UPDATED)
    Matrix factorization (MF) has been widely applied to collaborative filtering in recommendation systems. Its Bayesian variants can derive posterior distributions of user and item embeddings, and are more robust to sparse ratings. However, the Bayesian methods are restricted by their update rules for the posterior parameters due to the conjugacy of the priors and the likelihood. Variational autoencoders (VAE) can address this issue by capturing complex mappings between the posterior parameters and the data. However, current research on VAEs for collaborative filtering only considers mappings based on explicit data information, while implicit embedding information is overlooked. In this paper, we first derive evidence lower bounds (ELBO) for Bayesian MF models from two viewpoints: user-oriented and item-oriented. Based on the ELBOs, we propose a VAE-based Bayesian MF framework. It leverages not only the data but also the embedding information to approximate the user-item joint distribution. As suggested by the ELBOs, the approximation is iterative, with cross feedback of user and item embeddings into each other's encoders. More specifically, user embeddings sampled at the previous iteration are fed to the item-side encoders to estimate the posterior parameters for the item embeddings at the current iteration, and vice versa. The estimation also attends to the cross-fed embeddings to further exploit useful information. The decoder then reconstructs the data via matrix factorization over the currently re-sampled user and item embeddings.  ( 3 min )
    Quadratic Metric Elicitation for Fairness and Beyond. (arXiv:2011.01516v3 [stat.ML] UPDATED)
    Metric elicitation is a recent framework for eliciting classification performance metrics that best reflect implicit user preferences based on the task and context. However, available elicitation strategies have been limited to linear (or quasi-linear) functions of predictive rates, which can be practically restrictive for many applications including fairness. This paper develops a strategy for eliciting more flexible multiclass metrics defined by quadratic functions of rates, designed to reflect human preferences better. We show its application in eliciting quadratic violation-based group-fair metrics. Our strategy requires only relative preference feedback, is robust to noise, and achieves near-optimal query complexity. We further extend this strategy to eliciting polynomial metrics -- thus broadening the use cases for metric elicitation.  ( 2 min )
    Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a Generative Model. (arXiv:2208.10458v1 [cs.LG])
    This paper is concerned with two-player zero-sum Markov games -- arguably the most basic setting in multi-agent reinforcement learning -- with the goal of learning a Nash equilibrium (NE) sample-optimally. All prior results suffer from at least one of the two obstacles: the curse of multiple agents and the barrier of long horizon, regardless of the sampling protocol in use. We take a step towards settling this problem, assuming access to a flexible sampling mechanism: the generative model. Focusing on non-stationary finite-horizon Markov games, we develop a learning algorithm $\mathsf{Nash}\text{-}\mathsf{Q}\text{-}\mathsf{FTRL}$ and an adaptive sampling scheme that leverage the optimism principle in adversarial learning (particularly the Follow-the-Regularized-Leader (FTRL) method), with a delicate design of bonus terms that ensure certain decomposability under the FTRL dynamics. Our algorithm learns an $\varepsilon$-approximate Markov NE policy using $$ \widetilde{O}\bigg( \frac{H^4 S(A+B)}{\varepsilon^2} \bigg) $$ samples, where $S$ is the number of states, $H$ is the horizon, and $A$ (resp.~$B$) denotes the number of actions for the max-player (resp.~min-player). This is nearly un-improvable in a minimax sense. Along the way, we derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities, which might be of independent interest.  ( 2 min )
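    As background for the FTRL component: with a negative-entropy regularizer, Follow-the-Regularized-Leader reduces to exponential weights, sketched below for a single player with full-information losses. This is only the textbook update; the paper's Nash-Q-FTRL adds Q-value estimation and carefully designed bonus terms on top.

        import numpy as np

        def ftrl_entropy(losses, eta=0.1):
            # FTRL with negative entropy: pi_t proportional to
            # exp(-eta * cumulative loss), a.k.a. Hedge / exponential weights.
            n_rounds, n_actions = losses.shape
            cum = np.zeros(n_actions)
            policies = np.empty_like(losses, dtype=float)
            for t in range(n_rounds):
                z = -eta * cum
                pi = np.exp(z - z.max())             # numerically stable softmax
                policies[t] = pi / pi.sum()
                cum += losses[t]
            return policies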
    Confident Learning: Estimating Uncertainty in Dataset Labels. (arXiv:1911.00068v6 [stat.ML] UPDATED)
    Learning exists in the context of data, yet notions of confidence typically focus on model predictions, not label quality. Confident learning (CL) is an alternative approach which focuses instead on label quality by characterizing and identifying label errors in datasets, based on the principles of pruning noisy data, counting with probabilistic thresholds to estimate noise, and ranking examples to train with confidence. Whereas numerous studies have developed these principles independently, here, we combine them, building on the assumption of a class-conditional noise process to directly estimate the joint distribution between noisy (given) labels and uncorrupted (unknown) labels. This results in a generalized CL which is provably consistent and experimentally performant. We present sufficient conditions where CL exactly finds label errors, and show CL performance exceeding seven recent competitive approaches for learning with noisy labels on the CIFAR dataset. Uniquely, the CL framework is not coupled to a specific data modality or model (e.g., we use CL to find several label errors in the presumed error-free MNIST dataset and improve sentiment classification on text data in Amazon Reviews). We also employ CL on ImageNet to quantify ontological class overlap (e.g., estimating 645 "missile" images are mislabeled as their parent class "projectile"), and moderately increase model accuracy (e.g., for ResNet) by cleaning data prior to training. These results are replicable using the open-source cleanlab release.  ( 3 min )
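    The "counting with probabilistic thresholds" step admits a compact sketch: per-class thresholds are the mean self-confidence, and off-diagonal counts of the resulting "confident joint" flag likely label errors. This is a simplification of what the open-source cleanlab release implements, not its exact code.

        import numpy as np

        def confident_joint(labels, probs):
            # labels: (n,) given labels; probs: (n, k) out-of-sample predicted
            # probabilities. t[j] is the mean confidence on examples labeled j.
            n, k = probs.shape
            t = np.array([probs[labels == j, j].mean() for j in range(k)])
            cj = np.zeros((k, k), dtype=int)
            for i in range(n):
                above = np.where(probs[i] >= t)[0]
                if above.size:                 # count under the most likely class
                    j = above[np.argmax(probs[i, above])]
                    cj[labels[i], j] += 1      # off-diagonal: likely label error
            return cj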
    Group selection and shrinkage: Structured sparsity for semiparametric models. (arXiv:2105.12081v2 [stat.ME] UPDATED)
    Sparse regression and classification estimators that respect group structures have application to an assortment of statistical and machine learning problems, from multitask learning to sparse additive modeling to hierarchical selection. This work introduces structured sparse estimators that combine group subset selection with shrinkage. To accommodate sophisticated structures, our estimators allow for arbitrary overlap between groups. We develop an optimization framework for fitting the nonconvex regularization surface and present finite-sample error bounds for estimation of the regression function. As an application requiring structure, we study sparse semiparametric modeling, a procedure that allows the effect of each predictor to be zero, linear, or nonlinear. For this task, the new estimators improve across several metrics on synthetic data compared to alternatives. Finally, we demonstrate their efficacy in modeling supermarket foot traffic and economic recessions using many predictors. These demonstrations suggest sparse semiparametric models, fit using the new estimators, are an excellent compromise between fully linear and fully nonparametric alternatives. All of our algorithms are made available in the scalable implementation grpsel.  ( 2 min )
    GAT: Generative Adversarial Training for Adversarial Example Detection and Robust Classification. (arXiv:1905.11475v3 [cs.LG] UPDATED)
    The vulnerabilities of deep neural networks against adversarial examples have become a significant concern for deploying these models in sensitive domains. Devising a definitive defense against such attacks has proven challenging, and methods relying on detecting adversarial samples are only valid when the attacker is oblivious to the detection mechanism. In this paper we propose a principled adversarial example detection method that can withstand norm-constrained white-box attacks. Inspired by one-versus-the-rest classification, in a K-class classification problem, we train K binary classifiers where the i-th binary classifier is used to distinguish between clean data of class i and adversarially perturbed samples of other classes. At test time, we first use a trained classifier to get the predicted label (say k) of the input, and then use the k-th binary classifier to determine whether the input is a clean sample (of class k) or an adversarially perturbed example (of other classes). We further devise a generative approach to detecting/classifying adversarial examples by interpreting each binary classifier as an unnormalized density model of the class-conditional data. We provide a comprehensive evaluation of the above adversarial example detection/classification methods, and demonstrate their competitive performances and compelling properties.  ( 3 min )
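    The test-time rule is straightforward; below is a hedged sketch with generic scikit-learn-style interfaces (the classifier objects and the threshold are illustrative assumptions, not the paper's trained models).

        import numpy as np

        def detect(x, multiclass_clf, binary_clfs, threshold=0.5):
            # Step 1: predicted label k from the ordinary multiclass model.
            k = int(multiclass_clf.predict(x[None])[0])
            # Step 2: the k-th binary classifier scores whether x looks like
            # clean class-k data versus adversarially perturbed other classes.
            p_clean = binary_clfs[k].predict_proba(x[None])[0, 1]
            return ("clean" if p_clean >= threshold else "adversarial"), k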
    Estimating Smooth GLM in Non-interactive Local Differential Privacy Model with Public Unlabeled Data. (arXiv:1910.00482v4 [cs.LG] UPDATED)
    In this paper, we study the problem of estimating smooth Generalized Linear Models (GLMs) in the Non-interactive Local Differential Privacy (NLDP) model. Different from its classical setting, our model allows the server to access some additional public but unlabeled data. In the first part of the paper we focus on GLMs. Specifically, we first consider the case where each data record is i.i.d. sampled from a zero-mean multivariate Gaussian distribution. Motivated by the Stein's lemma, we present an $(\epsilon, \delta)$-NLDP algorithm for GLMs. Moreover, the sample complexity of public and private data for the algorithm to achieve an $\ell_2$-norm estimation error of $\alpha$ (with high probability) is ${O}(p \alpha^{-2})$ and $\tilde{O}(p^3\alpha^{-2}\epsilon^{-2})$ respectively, where $p$ is the dimension of the feature vector. This is a significant improvement over the previously known exponential or quasi-polynomial in $\alpha^{-1}$, or exponential in $p$ sample complexities of GLMs with no public data. Then we consider a more general setting where each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. Based on a variant of Stein's lemma, we propose an $(\epsilon, \delta)$-NLDP algorithm for GLMs whose sample complexity of public and private data to achieve an $\ell_\infty$-norm estimation error of $\alpha$ is ${O}(p^2\alpha^{-2})$ and $\tilde{O}(p^2\alpha^{-2}\epsilon^{-2})$ respectively, under some mild assumptions and if $\alpha$ is not too small ({\em i.e.,} $\alpha\geq \Omega(\frac{1}{\sqrt{p}})$). In the second part of the paper, we extend our idea to the problem of estimating non-linear regressions and show similar results as in GLMs for both multivariate Gaussian and sub-Gaussian cases. Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real-world datasets.  ( 3 min )
    Statistical Aspects of SHAP: Functional ANOVA for Model Interpretation. (arXiv:2208.09970v1 [stat.ME])
    SHAP is a popular method for measuring variable importance in machine learning models. In this paper, we study the algorithm used to estimate SHAP scores and show that it is a transformation of the functional ANOVA decomposition. We use this connection to show that challenges in SHAP approximations largely relate to the choice of a feature distribution and the number of ANOVA terms (up to $2^p$) being estimated. We argue that the connection between machine learning explainability and sensitivity analysis is illuminating in this case, but the immediate practical consequences are not obvious since the two fields face a different set of constraints. Machine learning explainability concerns models which are inexpensive to evaluate but often have hundreds, if not thousands, of features. Sensitivity analysis typically deals with models from physics or engineering which may be very time-consuming to run, but operate on a comparatively small space of inputs.  ( 2 min )
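    To see where the $2^p$ terms come from, here is a hedged sketch of exact Shapley values by subset enumeration, with absent features imputed by a background point (one particular choice of feature distribution, which is exactly the modeling decision the paper highlights); it is only viable for small $p$.

        import itertools, math
        import numpy as np

        def shapley_values(f, x, background, p):
            # v(S): model output with features outside S set to the background.
            def v(S):
                z = background.copy()
                idx = list(S)
                z[idx] = x[idx]
                return f(z[None])[0]

            phi = np.zeros(p)
            for i in range(p):
                rest = [j for j in range(p) if j != i]
                for r in range(p):
                    for S in itertools.combinations(rest, r):
                        w = (math.factorial(r) * math.factorial(p - r - 1)
                             / math.factorial(p))
                        phi[i] += w * (v(set(S) | {i}) - v(S))
            return phi

        beta = np.array([1.0, -2.0, 0.5])
        f = lambda Z: Z @ beta                       # toy linear model
        print(shapley_values(f, np.ones(3), np.zeros(3), p=3))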
    Learning Correlated Equilibria in Mean-Field Games. (arXiv:2208.10138v1 [cs.GT])
    The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game-theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have circumvented this limitation by instead considering Mean-Field games, an approximation of anonymous $N$-player games, where the number of players is infinite and the population's state distribution, instead of every individual player's state, is the object of interest. The practical computability of Mean-Field Nash equilibria, the most studied Mean-Field equilibrium to date, however, typically depends on beneficial non-generic structural properties such as monotonicity or contraction properties, which are required for known algorithms to converge. In this work, we provide an alternative route for studying Mean-Field games, by developing the concepts of Mean-Field correlated and coarse-correlated equilibria. We show that they can be efficiently learnt in \emph{all games}, without requiring any additional assumption on the structure of the game, using three classical algorithms. Furthermore, we establish correspondences between our notions and those already present in the literature, derive optimality bounds for the Mean-Field-to-$N$-player transition, and empirically demonstrate the convergence of these algorithms on simple games.  ( 2 min )
    Invariant Inference via Residual Randomization. (arXiv:1908.04218v2 [stat.ME] UPDATED)
    The dominant paradigm in statistical inference relies on a structure of i.i.d. data from a hypothetical infinite population. Despite its success, this framework is inflexible under complex data structures, even in those cases where it is clear what the infinite population represents. In this paper, we explore an alternative framework whereby the basis of inference is only an invariance assumption on the model errors, such as exchangeability or sign symmetry. As a general method to address this problem of invariant inference, we propose a randomization-based procedure. We prove general conditions for asymptotic validity of this procedure, and illustrate in many data structures, including clustered errors in one-way and two-way layouts. We find that invariant inference via residual randomization has three appealing properties: (1) It is valid under weak and interpretable conditions, allowing for problems with heavy-tailed data, finite clustering, and even some high-dimensional settings. (2) It is robust in finite samples as it does not rely on the regularity conditions needed for classical asymptotics. (3) It addresses the problem of inference in a unified way that adapts to the data structure. Classical procedures like OLS or bootstrap, on the other hand, presuppose the i.i.d. structure, and need to be modified whenever the actual problem structure is different. This mismatch in the classical framework has led to a multitude of robust error techniques and bootstrap variants, which frequently confounds applied research. We corroborate these findings with extensive empirical evaluations. Residual randomization performs favorably against many alternatives, including robust error methods, bootstrap variants, and hierarchical models.  ( 3 min )
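    A minimal sketch of the recipe under a sign-symmetry assumption on the errors, for testing a single regression coefficient; this follows the general residual-randomization template rather than the paper's exact algorithms.

        import numpy as np

        def residual_randomization_pvalue(X, y, j, n_draws=2000, seed=0):
            rng = np.random.default_rng(seed)
            beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

            X0 = np.delete(X, j, axis=1)               # refit under H0: beta_j = 0
            b0 = np.linalg.lstsq(X0, y, rcond=None)[0]
            fitted0 = X0 @ b0
            resid0 = y - fitted0

            stats = np.empty(n_draws)
            for b in range(n_draws):
                signs = rng.choice([-1.0, 1.0], size=len(y))
                y_star = fitted0 + signs * resid0      # invariant under sign symmetry
                stats[b] = np.linalg.lstsq(X, y_star, rcond=None)[0][j]
            return float((np.abs(stats) >= abs(beta_hat[j])).mean())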
    Minimax AUC Fairness: Efficient Algorithm with Provable Convergence. (arXiv:2208.10451v1 [cs.LG])
    The use of machine learning models in consequential decision making often exacerbates societal inequity, in particular yielding disparate impact on members of marginalized groups defined by race and gender. The area under the ROC curve (AUC) is widely used to evaluate the performance of a scoring function in machine learning, but is studied in algorithmic fairness less than other performance metrics. Due to the pairwise nature of the AUC, defining an AUC-based group fairness metric is pairwise-dependent and may involve both \emph{intra-group} and \emph{inter-group} AUCs. Importantly, considering only one category of AUCs is not sufficient to mitigate unfairness in AUC optimization. In this paper, we propose a minimax learning and bias mitigation framework that incorporates both intra-group and inter-group AUCs while maintaining utility. Based on this Rawlsian framework, we design an efficient stochastic optimization algorithm and prove its convergence to the minimum group-level AUC. We conduct numerical experiments on both synthetic and real-world datasets to validate the effectiveness of the minimax framework and the proposed optimization algorithm.  ( 2 min )
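    The intra- and inter-group AUCs the framework balances can be computed directly from pairwise comparisons; a hedged numpy sketch (the minimax optimization algorithm itself is not shown):

        import numpy as np

        def group_auc(scores, labels, group, a, b):
            # AUC over pairs (positive from group a, negative from group b):
            # the probability that a positive from a outranks a negative from b.
            # a == b gives an intra-group AUC; a != b an inter-group AUC.
            pos = scores[(group == a) & (labels == 1)]
            neg = scores[(group == b) & (labels == 0)]
            diff = pos[:, None] - neg[None, :]
            return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))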
    A Graphical Model for Fusing Diverse Microbiome Data. (arXiv:2208.09934v1 [stat.ME])
    This paper develops a Bayesian graphical model for fusing disparate types of count data. The motivating application is the study of bacterial communities from diverse high-dimensional features, in this case transcripts, collected under different treatments. In such datasets, there are no explicit correspondences between the communities, and each corresponds to a different factor, making data fusion challenging. We introduce a flexible multinomial-Gaussian generative model for jointly modeling such count data. This latent variable model jointly characterizes the observed data through a common multivariate Gaussian latent space that parameterizes the set of multinomial probabilities of the transcriptome counts. The covariance matrix of the latent variables induces a covariance matrix of co-dependencies between all the transcripts, effectively fusing multiple data sources. We present a computationally scalable variational Expectation-Maximization (EM) algorithm for inferring the latent variables and the parameters of the model. The inferred latent variables provide a common dimensionality reduction for visualizing the data, and the inferred parameters provide a predictive posterior distribution. In addition to simulation studies that demonstrate the variational EM procedure, we apply our model to a bacterial microbiome dataset.  ( 2 min )
    Hierarchical Capsule Prediction Network for Marketing Campaigns Effect. (arXiv:2208.10113v1 [stat.ML])
    Marketing campaigns are a set of strategic activities that can promote a business's goal. The effect prediction for marketing campaigns in a real industrial scenario is very complex and challenging due to the fact that prior knowledge is often learned from observation data, without any intervention for the marketing campaign. Furthermore, each subject is always under the interference of several marketing campaigns simultaneously. Therefore, we cannot easily parse and evaluate the effect of a single marketing campaign. To the best of our knowledge, there are currently no effective methodologies to solve such a problem, i.e., modeling an individual-level prediction task based on a hierarchical structure with multiple intertwined events. In this paper, we provide an in-depth analysis of the underlying parse tree-like structure involved in the effect prediction task and we further establish a Hierarchical Capsule Prediction Network (HapNet) for predicting the effects of marketing campaigns. Extensive results based on both the synthetic data and real data demonstrate the superiority of our model over the state-of-the-art methods and show remarkable practicability in real industrial applications.  ( 2 min )
    Do-AIQ: A Design-of-Experiment Approach to Quality Evaluation of AI Mislabel Detection Algorithm. (arXiv:2208.09953v1 [stat.ML])
    The quality of Artificial Intelligence (AI) algorithms is of significant importance for confidently adopting algorithms in various applications such as cybersecurity, healthcare, and autonomous driving. This work presents a principled framework, named Do-AIQ, that uses a design-of-experiments approach to systematically evaluate the quality of AI algorithms. Specifically, we focus on investigating the quality of AI mislabel detection algorithms under data poisoning. The performance of AI algorithms is affected by hyperparameters in the algorithm and data quality, particularly data mislabeling, class imbalance, and data types. To evaluate the quality of AI algorithms and obtain a trustworthy assessment of their quality, we establish a design-of-experiment framework to construct an efficient space-filling design in a high-dimensional constraint space and develop an effective surrogate model using an additive Gaussian process to enable the emulation of the quality of AI algorithms. Both theoretical and numerical studies are conducted to justify the merits of the proposed framework. The proposed framework can serve as an exemplar for enhancing the assurance of AI algorithms with respect to robustness, reproducibility, and transparency.  ( 2 min )
    Bayesian Complementary Kernelized Learning for Multidimensional Spatiotemporal Data. (arXiv:2208.09978v1 [stat.ML])
    Probabilistic modeling of multidimensional spatiotemporal data is critical to many real-world applications. However, real-world spatiotemporal data often exhibits complex dependencies that are nonstationary, i.e., correlation structure varies with location/time, and nonseparable, i.e., dependencies exist between space and time. Developing effective and computationally efficient statistical models to accommodate nonstationary/nonseparable processes containing both long-range and short-scale variations becomes a challenging task, especially for large-scale datasets with various corruption/missing structures. In this paper, we propose a new statistical framework -- Bayesian Complementary Kernelized Learning (BCKL) -- to achieve scalable probabilistic modeling for multidimensional spatiotemporal data. To effectively describe complex dependencies, BCKL integrates kernelized low-rank factorization with short-range spatiotemporal Gaussian processes (GP), in which the two components complement each other. Specifically, we use a multi-linear low-rank factorization component to capture the global/long-range correlations in the data and introduce an additive short-scale GP based on compactly supported kernel functions to characterize the remaining local variabilities. We develop an efficient Markov chain Monte Carlo (MCMC) algorithm for model inference and evaluate the proposed BCKL framework on both synthetic and real-world spatiotemporal datasets. Our results confirm the superior performance of BCKL in providing accurate posterior mean and high-quality uncertainty estimates.  ( 2 min )
    On regression analysis with Pad\'e approximants. (arXiv:2208.09945v1 [stat.ME])
    The advantages and difficulties of applying Pad\'e approximants to two-dimensional regression analysis are discussed. A new formulation of residuals is suggested within the method of least squares; it leads to a system of linear equations in the case of rational functions. The possibility of using the Tikhonov regularization technique to avoid overfitting is demonstrated in this approach. To illustrate the efficiency of the suggested method, several practical cases from physics and reliability theory are considered.  ( 2 min )
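    To make the linear-equations point concrete, here is a minimal numpy sketch of the linearized least-squares fit of a rational function P(x)/Q(x), with the denominator's constant term fixed to 1 (Tikhonov regularization omitted; function and variable names are illustrative, not the paper's implementation):
    ```python
    import numpy as np

    def fit_pade(x, y, m=2, n=2):
        """Fit y ~ P(x)/Q(x) with deg P = m, deg Q = n, Q(x) = 1 + q1*x + ...
        Linearizing the residual y*Q(x) - P(x) makes the problem linear in
        the coefficients, so ordinary least squares applies."""
        Vp = np.vander(x, m + 1, increasing=True)         # columns 1, x, ..., x^m
        Vq = np.vander(x, n + 1, increasing=True)[:, 1:]  # columns x, ..., x^n
        A = np.hstack([Vp, -(y[:, None] * Vq)])           # A @ [p; q_rest] = y
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return coef[:m + 1], np.concatenate([[1.0], coef[m + 1:]])

    x = np.linspace(0.0, 1.0, 50)
    p, q = fit_pade(x, 1.0 / (1.0 + x), m=1, n=1)  # recovers P = 1, Q = 1 + x
    ```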
    AA-Forecast: Anomaly-Aware Forecast for Extreme Events. (arXiv:2208.09933v1 [stat.ML])
    Time series models often deal with extreme events and anomalies, both prevalent in real-world datasets. Such models often need to provide careful probabilistic forecasting, which is vital in risk management for extreme events such as hurricanes and pandemics. However, it is challenging to automatically detect and learn to use extreme events and anomalies in large-scale datasets, which often requires manual effort. Hence, we propose an anomaly-aware forecast framework that leverages the previously seen effects of anomalies to improve its prediction accuracy during and after the presence of extreme events. Specifically, the framework automatically extracts anomalies and incorporates them through an attention mechanism to increase its accuracy for future extreme events. Moreover, the framework employs a dynamic uncertainty optimization algorithm that reduces the uncertainty of forecasts in an online manner. The proposed framework demonstrated consistently superior accuracy with less uncertainty than current prediction models on three datasets with different varieties of anomalies.  ( 2 min )
    Multiple Descent in the Multiple Random Feature Model. (arXiv:2208.09897v1 [math.ST])
    Recent works have demonstrated a double descent phenomenon in over-parameterized learning: as the number of model parameters increases, the excess risk has a $\mathsf{U}$-shape at the beginning, then decreases again when the model is highly over-parameterized. Although this phenomenon has been investigated by recent works under different settings such as linear models, random feature models and kernel methods, it has not been fully understood in theory. In this paper, we consider a double random feature model (DRFM) consisting of two types of random features, and study the excess risk achieved by the DRFM in ridge regression. We calculate the precise limit of the excess risk under the high dimensional framework where the training sample size, the dimension of data, and the dimension of random features tend to infinity proportionally. Based on the calculation, we demonstrate that the risk curves of DRFMs can exhibit triple descent. We then provide an explanation of the triple descent phenomenon, and discuss how the ratio between random feature dimensions, the regularization parameter and the signal-to-noise ratio control the shape of the risk curves of DRFMs. At last, we extend our study to the multiple random feature model (MRFM), and show that MRFMs with $K$ types of random features may exhibit $(K+1)$-fold descent. Our analysis points out that risk curves with a specific number of descents generally exist in random feature based regression. Another interesting finding is that our results can recover the risk peak locations reported in the literature when the learned neural networks are in the "neural tangent kernel" regime.  ( 3 min )
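    As a rough illustration of the object under study, the sketch below fits ridge regression on two concatenated families of random features; sweeping the two widths and plotting the test risk is how multiple-descent curves would be traced empirically. The ReLU/cosine activation choices and scalings are illustrative assumptions, not the paper's exact setup:
    ```python
    import numpy as np

    def drfm_test_risk(X_tr, y_tr, X_te, y_te, w1, w2, lam=1e-4, seed=0):
        """Test risk of ridge regression on two types of random features."""
        rng = np.random.default_rng(seed)
        d = X_tr.shape[1]
        W1 = rng.normal(size=(d, w1)) / np.sqrt(d)
        W2 = rng.normal(size=(d, w2)) / np.sqrt(d)
        feats = lambda X: np.hstack([np.maximum(X @ W1, 0.0), np.cos(X @ W2)])
        Z_tr, Z_te = feats(X_tr), feats(X_te)
        theta = np.linalg.solve(Z_tr.T @ Z_tr + lam * np.eye(w1 + w2),
                                Z_tr.T @ y_tr)
        return np.mean((Z_te @ theta - y_te) ** 2)
    ```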
    Near-optimal fitting of ellipsoids to random points. (arXiv:2208.09493v1 [cs.DS])
    Given independent standard Gaussian points $v_1, \ldots, v_n$ in dimension $d$, for what values of $(n, d)$ does there exist with high probability an origin-symmetric ellipsoid that simultaneously passes through all of the points? This basic problem of fitting an ellipsoid to random points has connections to low-rank matrix decompositions, independent component analysis, and principal component analysis. Based on strong numerical evidence, Saunderson, Parrilo, and Willsky [Proc. of Conference on Decision and Control, pp. 6031-6036, 2013] conjecture that the ellipsoid fitting problem transitions from feasible to infeasible as the number of points $n$ increases, with a sharp threshold at $n \sim d^2/4$. We resolve this conjecture up to logarithmic factors by constructing a fitting ellipsoid for some $n = \Omega( \, d^2/\log^5(d) \,)$, improving prior work of Ghosh et al. [Proc. of Symposium on Foundations of Computer Science, pp. 954-965, 2020] that requires $n = o(d^{3/2})$. Our proof demonstrates feasibility of the least squares construction of Saunderson et al. using a careful analysis of the eigenvectors and eigenvalues of a certain non-standard random matrix.  ( 2 min )
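    In the spirit of the least squares construction analyzed here, one can look for a symmetric matrix A satisfying v_i^T A v_i = 1 for every point; each constraint is linear in the entries of A, so numpy's minimum-norm least squares solver applies directly. A quick sketch (illustrative only, not the paper's proof construction):
    ```python
    import numpy as np

    def least_squares_ellipsoid(V):
        """Search for symmetric A with v^T A v = 1 for each row v of V.
        The fit defines a valid origin-symmetric ellipsoid {x : x^T A x = 1}
        when A is (near) positive semidefinite."""
        n, d = V.shape
        M = np.einsum('ni,nj->nij', V, V).reshape(n, d * d)  # row i = vec(v_i v_i^T)
        a, *_ = np.linalg.lstsq(M, np.ones(n), rcond=None)   # min-norm solution
        A = a.reshape(d, d)
        return 0.5 * (A + A.T)

    V = np.random.randn(60, 20)            # n = 60 Gaussian points in d = 20
    A = least_squares_ellipsoid(V)
    print(np.linalg.eigvalsh(A).min())     # >= 0 (approx.) => valid ellipsoid
    ```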
    Byzantines can also Learn from History: Fall of Centered Clipping in Federated Learning. (arXiv:2208.09894v1 [cs.LG])
    The increasing popularity of the federated learning framework due to its success in a wide range of collaborative learning tasks also induces certain security concerns regarding the learned model due to the possibility of malicious clients participating in the learning process. Hence, the objective is to neutralize the impact of the malicious participants and to ensure the final model is trustworthy. One common observation regarding Byzantine attacks is that the higher the variance among the clients' models/updates, the more space there is for attacks to be hidden. To this end, it has been recently shown that by utilizing momentum, thus reducing the variance, it is possible to weaken the strength of known Byzantine attacks. The Centered Clipping framework (ICML 2021) has further shown that, besides reducing the variance, the momentum term from the previous iteration can be used as a reference point to neutralize Byzantine attacks, and it shows impressive performance against well-known attacks. However, in the scope of this work, we show that the centered clipping framework has certain vulnerabilities, and existing attacks can be revised based on these vulnerabilities to circumvent the centered clipping defense. Hence, we introduce a strategy to design an attack to circumvent the centered clipping framework and numerically illustrate its effectiveness against centered clipping as well as other known defense strategies, reducing test accuracy to 5-40% in best-case scenarios.  ( 3 min )
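    For reference, the centered clipping rule under attack here is simple to state: the server keeps a reference point v (e.g., the previous round's momentum) and repeatedly moves it by the average of clipped client deviations, v <- v + mean_i clip(x_i - v, tau). A minimal numpy sketch of the defense, with an illustrative clipping radius:
    ```python
    import numpy as np

    def centered_clipping(updates, ref, tau=1.0, iters=3):
        """Aggregate client updates (n_clients, dim) by clipping around `ref`."""
        v = ref.copy()
        for _ in range(iters):
            diffs = updates - v
            norms = np.linalg.norm(diffs, axis=1, keepdims=True)
            scale = np.minimum(1.0, tau / np.maximum(norms, 1e-12))  # radius tau
            v = v + (diffs * scale).mean(axis=0)
        return v
    ```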
    FastCPH: Efficient Survival Analysis for Neural Networks. (arXiv:2208.09793v1 [stat.ML])
    The Cox proportional hazards model is a canonical method in survival analysis for prediction of the life expectancy of a patient given clinical or genetic covariates -- it is a linear model in its original form. In recent years, several methods have been proposed to generalize the Cox model to neural networks, but none of these are both numerically correct and computationally efficient. We propose FastCPH, a new method that runs in linear time and supports both the standard Breslow and Efron methods for tied events. We also demonstrate the performance of FastCPH combined with LassoNet, a neural network that provides interpretability through feature sparsity, on survival datasets. The final procedure is efficient, selects useful covariates and outperforms existing CoxPH approaches.  ( 2 min )
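    For readers who want the computational core, the Breslow-style negative partial log-likelihood reduces to a cumulative logsumexp after sorting by time, which is what makes a near-linear-time implementation possible. A PyTorch sketch under simplifying assumptions (exactly tied times share a risk set only approximately here; FastCPH's actual implementation may differ):
    ```python
    import torch

    def cox_breslow_nll(eta, time, event):
        """eta: (n,) predicted risk scores; time: (n,) observed times;
        event: (n,) 1.0 for an observed event, 0.0 for censoring."""
        order = torch.argsort(time, descending=True)  # latest times first
        eta, event = eta[order], event[order]
        # With this ordering the risk set at each event is a prefix, so the
        # log-sum over the risk set is a cumulative logsumexp.
        log_risk = torch.logcumsumexp(eta, dim=0)
        return -((eta - log_risk) * event).sum() / event.sum().clamp(min=1.0)
    ```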
    Sharp Analysis of Sketch-and-Project Methods via a Connection to Randomized Singular Value Decomposition. (arXiv:2208.09585v1 [math.OC])
    Sketch-and-project is a framework which unifies many known iterative methods for solving linear systems and their variants, as well as further extensions to non-linear optimization problems. It includes popular methods such as randomized Kaczmarz, coordinate descent, variants of the Newton method in convex optimization, and others. In this paper, we obtain sharp guarantees for the convergence rate of sketch-and-project methods via new tight spectral bounds for the expected sketched projection matrix. Our estimates reveal a connection between the sketch-and-project convergence rate and the approximation error of another well-known but seemingly unrelated family of algorithms, which use sketching to accelerate popular matrix factorizations such as QR and SVD. This connection brings us closer to precisely quantifying how the performance of sketch-and-project solvers depends on their sketch size. Our analysis covers not only Gaussian and sub-gaussian sketching matrices, but also a family of efficient sparse sketching methods known as LESS embeddings. Our experiments back up the theory and demonstrate that even extremely sparse sketches show the same convergence properties in practice.  ( 2 min )
    Adversarial contamination of networks in the setting of vertex nomination: a new trimming method. (arXiv:2208.09710v1 [stat.ML])
    As graph data becomes more ubiquitous, the need for robust inferential graph algorithms to operate in these complex data domains is crucial. In many cases of interest, inference is further complicated by the presence of adversarial data contamination. The effect of the adversary is frequently to change the data distribution in ways that negatively affect statistical and algorithmic performance. We study this phenomenon in the context of vertex nomination, a semi-supervised information retrieval task for network data. Here, a common suite of methods relies on spectral graph embeddings, which have been shown to provide both good algorithmic performance and flexible settings in which regularization techniques can be implemented to help mitigate the effect of an adversary. Many current regularization methods rely on direct network trimming to effectively excise the adversarial contamination, although this direct trimming often gives rise to complicated dependency structures in the resulting graph. We propose a new trimming method that operates in model space which can address both block structure contamination and white noise contamination (contamination whose distribution is unknown). This model trimming is more amenable to theoretical analysis while also demonstrating superior performance in a number of simulations, compared to direct trimming.  ( 2 min )
    Robust Tests in Online Decision-Making. (arXiv:2208.09819v1 [stat.ML])
    Bandit algorithms are widely used in sequential decision problems to maximize the cumulative reward. One potential application is mobile health, where the goal is to promote the user's health through personalized interventions based on user specific information acquired through wearable devices. Important considerations include the type of, and frequency with which data is collected (e.g. GPS, or continuous monitoring), as such factors can severely impact app performance and users' adherence. In order to balance the need to collect data that is useful with the constraint of impacting app performance, one needs to be able to assess the usefulness of variables. Bandit feedback data are sequentially correlated, so traditional testing procedures developed for independent data cannot apply. Recently, a statistical testing procedure was developed for the actor-critic bandit algorithm. An actor-critic algorithm maintains two separate models, one for the actor, the action selection policy, and the other for the critic, the reward model. The performance of the algorithm as well as the validity of the test are guaranteed only when the critic model is correctly specified. However, misspecification is frequent in practice due to incorrect functional form or missing covariates. In this work, we propose a modified actor-critic algorithm which is robust to critic misspecification and derive a novel testing procedure for the actor parameters in this case.  ( 3 min )
    Spectral Decomposition Representation for Reinforcement Learning. (arXiv:2208.09515v1 [cs.LG])
    Representation learning often plays a critical role in reinforcement learning by managing the curse of dimensionality. A representative class of algorithms exploits a spectral decomposition of the stochastic transition dynamics to construct representations that enjoy strong theoretical properties in an idealized setting. However, current spectral methods suffer from limited applicability because they are constructed for state-only aggregation and derived from a policy-dependent transition kernel, without considering the issue of exploration. To address these issues, we propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy, while also balancing the exploration-versus-exploitation trade-off during learning. A theoretical analysis establishes the sample efficiency of the proposed algorithm in both the online and offline settings. In addition, an experimental investigation demonstrates superior performance over current state-of-the-art algorithms across several benchmarks.  ( 2 min )

  • Open

    [P] How do I use zero shot learning to check if two sentences agree with one another.
    Basically, I have a dataframe with 3 columns: a complaint, a reason (both text), and a label which is 1 if the complaint and reason agree with each other, 0 otherwise. Now, in the training set only positive samples, that is, samples with label 1, are present. How do I deal with this? P.S. - Until now I have used Sentence-BERT in conjunction with cosine similarity to get the similarity between complaint and reason. But are there any better methods? It is for an assignment so I cannot exactly just show the 10-line program of using pretrained BERT with cosine similarity. submitted by /u/enkrish258 [link] [comments]  ( 89 min )
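    One option beyond cosine similarity is to treat agreement as textual entailment and score pairs zero-shot with an off-the-shelf NLI model: the complaint is the premise, the reason is the hypothesis, and the entailment probability serves as the agreement score (no negative training samples needed). A minimal sketch; the checkpoint name is just one public MNLI-trained example, not a specific recommendation:
    ```python
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
    model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

    def agreement(complaint: str, reason: str) -> dict:
        inputs = tok(complaint, reason, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs = model(**inputs).logits.softmax(-1)[0]
        # id2label maps class indices to CONTRADICTION / NEUTRAL / ENTAILMENT
        return {model.config.id2label[i]: p.item() for i, p in enumerate(probs)}

    print(agreement("The delivery arrived late.", "The courier was delayed."))
    ```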
    [P] Tutorial on using RLlib for deep hierarchical multiagent reinforcement learning
    I go over how to use RLlib for your project. It is particularly good for multiagent systems. I also briefly summarize 120 years of reinforcement learning. There is a link to the GitHub code to follow along. https://deumbra.com/2022/08/rllib-for-deep-hierarchical-multiagent-reinforcement-learning/ submitted by /u/jmugan [link] [comments]  ( 88 min )
    [D] How do you describe variable importance in a ML model?
    What are some ways that you explain variable importance when using black box models like neural networks or gradient boosted trees? I've been using a combination of Shapley additive explanations and correlation plots to narrow down the results. Granted I know SHAP isn't inferential by any means, and there must be a better method out there. Would it be prudent to use another more inferential model alongside your neural net to help with variable importance? Or would that not be helpful at all? submitted by /u/ethantenison [link] [comments]  ( 89 min )
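    For concreteness, a minimal sketch of the SHAP workflow mentioned above, with a toy gradient boosted model standing in for the actual model and data (all names illustrative). A complementary check with a more classical flavor is sklearn's permutation_importance, which measures the drop in a held-out score when each feature is shuffled:
    ```python
    import shap
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = make_classification(n_samples=500, n_features=8, random_state=0)
    model = GradientBoostingClassifier(random_state=0).fit(X, y)

    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)  # per-sample, per-feature attributions
    shap.summary_plot(shap_values, X)       # beeswarm: global ranking + direction
    ```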
    [D] Methods for Evaluating Model Calibration
    I am currently working on a problem where we have an extreme class imbalance and I am comparing different evaluation metrics. The rather interesting observation is that some models yield a smaller Brier score yet produce very, very low probabilities for the positive class. Are there alternative metrics that could be better suited for this problem? submitted by /u/martin1285 [link] [comments]  ( 91 min )
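    One thing worth checking explicitly: under extreme imbalance the Brier score is dominated by the majority class, so a model that predicts near-zero probabilities everywhere can still score well. Log loss, PR-AUC, and a quantile-binned reliability curve give a fuller picture. A minimal sklearn sketch on synthetic, illustrative data:
    ```python
    import numpy as np
    from sklearn.calibration import calibration_curve
    from sklearn.metrics import average_precision_score, brier_score_loss, log_loss

    rng = np.random.default_rng(0)
    y_true = rng.binomial(1, 0.02, size=10_000)                  # ~2% positives
    y_prob = np.clip(rng.beta(1, 40, size=10_000), 1e-6, 1 - 1e-6)

    print("Brier:", brier_score_loss(y_true, y_prob))            # majority-dominated
    print("Log loss:", log_loss(y_true, y_prob))
    print("PR-AUC:", average_precision_score(y_true, y_prob))    # focuses on positives
    frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10,
                                            strategy="quantile")
    ```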
    [D] Can you code a Neural Network using only high school mathematics?
    Recently, I was trying to explain how neural networks work to my mother but I couldn't go beyond the usual trope of inputs, hidden layers, and outputs. It led me to question my understanding of the topic despite reading about it umpteen times. So, I decided to code the entire network using only Numpy w/o using any fancy frameworks such as Keras, Tensorflow, or PyTorch. It took me a while to figure out the maths for backpropagation but the exercise as a whole was rewarding. You can read about the entire experiment in the medium article here: https://towardsdatascience.com/can-you-code-a-neural-network-using-only-high-school-mathematics-ac9ad80f52f7 The notebook is present at https://www.kaggle.com/code/prashantmudgal/mnist-neural-network-from-scratch Please feel free to reach out in case of any questions. submitted by /u/prashantmdgl9 [link] [comments]  ( 89 min )
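    For readers who don't want to click through, the core of such an exercise really does fit in a few lines: a forward pass, the chain rule, and a gradient step. A self-contained illustrative sketch (not the notebook's code) of a two-layer network on a toy task:
    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(float)[:, None]    # XOR-like labels

    W1, b1 = 0.5 * rng.normal(size=(2, 16)), np.zeros(16)
    W2, b2 = 0.5 * rng.normal(size=(16, 1)), np.zeros(1)

    for step in range(2000):
        h = np.tanh(X @ W1 + b1)                  # forward pass
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid output
        dlogits = (p - y) / len(X)                # grad of BCE w.r.t. logits
        dW2, db2 = h.T @ dlogits, dlogits.sum(0)
        dh = (dlogits @ W2.T) * (1.0 - h ** 2)    # chain rule through tanh
        dW1, db1 = X.T @ dh, dh.sum(0)
        for P, G in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
            P -= 1.0 * G                          # plain gradient descent
    ```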
    [P] Real-time AI Processing at the edge
    This tech talk focuses on AI-powered processing at the edge. By running models on edge devices, you can reduce latency, increase security, while improving processing speeds. Learn how you can get started running AI models on thousands of devices from a central location using Modzy Edge. We demonstrate how you can run a computer vision model on an NVIDIA Jetson Nano to process video in real-time, and share an example of an ML model analyzing data from atmospheric sensors. submitted by /u/modzykirsten [link] [comments]  ( 89 min )
    [D] My cookbook for data creation
    I don't think we talk enough about how to create good data. Models and even deployments are more sexy than data creation, let's change this! Here's my conceptual framework for data creation:
    - define success in business terms
    - map the data with stakeholders for buy-in
    - rapid prototype from data to deployment
    - iterate on dataset creation
    Techniques to get more data bang for buck:
    - weak supervision
    - active learning
    Considerations around dataset creation:
    - in-house vs crowdsourcing
    - tool recommendations
    My cookbook for data creation. Defining success in business terms: A good dataset is one that creates business value. Making sure stakeholders, domain experts, and engineers are on the same page is hard. I find sharing a mock of the model and data can really help move st…  ( 93 min )
    [D] EMNLP 2022 Review Day !!! Rebuttal
    EMNLP 2022 reviews will be out soon. Good luck!!! Discussion submitted by /u/errohan400 [link] [comments]  ( 88 min )
    [D] StableDiffusion v1.4 is entirely public. What do you think about Stability.ai ?
    In case you haven't noticed, stability.ai just open-sourced their latest version of StableDiffusion to the public. Here is the link: https://stability.ai/blog/stable-diffusion-public-release It is so fast and has such a small memory footprint that it can run on consumer-grade GPUs. I just generated my first "astronaut riding a horse on mars" on my local RTX 3090. Astronaut riding a horse on mars So what is your opinion on open-sourcing such powerful models? And what do you think about stability.ai as an organisation? Do you feel they can potentially be the next OpenAI? submitted by /u/dasayan05 [link] [comments]  ( 91 min )
    [D] I'm looking for a mentor that will give me tasks to complete a project that detects objects, then lets me choose one of them to track
    The project interests me, but I don't know how to start it; eventually we could deploy it with a drone. My background in computer vision and machine learning is acceptable. I've trained a model to detect objects using YOLOv5 before, but I don't know how to start the project... if you're interested, please contact me. submitted by /u/meltingicecreem [link] [comments]  ( 89 min )
    Has anyone heard of the "Desperation Index" for evaluating loan applications? [D]
    Hi everyone, a few years ago I passed the DP-100: Designing and Implementing a Data Science Solution on Azure. It’s expired and recently I've decided that I will take the exam again. I went through my old emails to find the study materials which helped me pass the first time. One of the links in my email was a YouTube video. In the video there is a reference to something called the "Desperation Index". Apparently, this speaker worked with a team, in a consulting role, that had created a model that allowed them to qualify a loan in 4 minutes. I guess the client was a lender. The client then told him that they have their own method to evaluate risk which they call the Desperation Index and they wanted it to be incorporated into the model. The Desperation Index basically measures how desperate someone is for a loan. It tracks how often you call to check the status of a loan application or how often you check the website in a 24-hour period. Checking the status too often, is a red flag to the lender and can have your loan denied. This doesn’t seem ethical, fair, or scientific. How can you distinguish between excitement and desperation? How would they build this into a machine learning model? Just a supervised learning model with an extra feature with a count of how often you contacted the lender? Also, does anyone know which lenders use machine learning models with the desperation index or something similar? Edit: Whoops, I forgot to include the video. Skip to 17:28 https://www.youtube.com/watch?v=mM5o14i_BCM ​ Thanks, submitted by /u/Hadrami1 [link] [comments]  ( 95 min )
    [D] How are you learning/deciding on your end-to-end ML architecture
    We find content online is very much centered around model building but does not focus on the before/after, especially when it comes to productionizing. The content that is out there is very fragmented, theoretical, and also not representative of the tooling used in companies today. Also, there is no content on the typical problems companies have run into while implementing their solutions. How have you and your team been designing/implementing your architecture? What are you using as a framework? We have started collecting content from friends at large companies trying to address the problems that others might run into. It would be great to get some feedback. You can check out the content here (Use the code: RFG7E29 to get to it for free). submitted by /u/mike157_za [link] [comments]  ( 108 min )
    [D] Private dataset stealing via model distillation?
    Are there any papers around the ethics/research of distillation on models trained on private datasets? (Or known occurrences where datasets have been stolen?) It would seem that if you have a proprietary dataset and you train a model A on it, you might be able to "steal" the dataset by training a model B on model A's predictions. I could imagine this happening in industry, with the proliferation of proprietary models. submitted by /u/unsolved_integral [link] [comments]  ( 91 min )
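    A note on terminology: what the post describes is usually called model extraction or model stealing, and it recovers an approximation of the teacher's function rather than the private dataset itself. A toy sketch under the stated assumptions (attacker can query model A freely; all data here is synthetic and illustrative):
    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression

    # Model A is trained on "private" data the attacker never sees.
    X_priv, y_priv = make_classification(n_samples=2000, n_features=10, random_state=0)
    teacher = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_priv, y_priv)

    # The attacker only sends queries and records the teacher's answers...
    X_query = np.random.default_rng(1).normal(size=(5000, 10))
    pseudo_labels = teacher.predict(X_query)   # predict_proba gives soft targets

    # ...then distills them into a student (model B).
    student = LogisticRegression(max_iter=1000).fit(X_query, pseudo_labels)
    ```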
    [News] Excited to announce the NeurIPS 2022 Gaze Meets ML Workshop, bridging human and machine attention. We've got a fantastic lineup of speakers and PC members spanning machine learning, stats, neuroscience, etc. Links and details below.
    Webpage: https://gaze-meets-ml.github.io/ Submission site: https://openreview.net/group?id=NeurIPS.cc/2022/Workshop/GMML Submission deadline: September 22nd, 2022 Date: December 3rd, 2022 Location: New Orleans Convention Center, New Orleans, LA. Eye gaze has proven to be a cost-efficient way to collect large-scale physiological data that can reveal the underlying human attentional patterns in real-life workflows, and thus has long been explored as a signal to directly measure human-related cognition in various domains. Physiological data (including but not limited to eye gaze) offer new perception capabilities, which could be used in several ML domains, e.g., egocentric perception, embodied AI, NLP, etc. They can help infer human perception, intentions, beliefs, goals and other cognition properties that are much needed for human-AI interactions and agent coordination. In addition, large collections of eye-tracking data have enabled data-driven modeling of human visual attention mechanisms, both for saliency and scanpath prediction, with twofold advantages: from the neuroscientific perspective, to understand biological mechanisms better; from the AI perspective, to equip agents with the ability to mimic or predict human behavior and improve interpretability and interactions. With the emergence of immersive technologies, now more than ever there is a need for experts of various backgrounds (e.g., the machine learning, vision, and neuroscience communities) to share expertise and contribute to a deeper understanding of the intricacies of cost-efficient human supervision signals (e.g., eye gaze) and their utilization towards bridging human cognition and AI in machine learning research and development. The goal of this workshop is to bring together an active research community to collectively drive progress in defining and addressing core problems in gaze-assisted machine learning. submitted by /u/hawkeyedesi [link] [comments]  ( 114 min )
    [P] Imitation Learning (+RL) in Super Smash Bros Melee for Humanlike Agents
    Thread: https://twitter.com/otter_collapse/status/1561445156246753280?cxt=HHwWgMC8pZiGr6srAAAA Post: https://bycn.github.io/2022/08/19/project-nabla-writeup.html (technical writeup coming soon!) Project Nabla is an AI trained with deep neural networks using behavioral cloning and deep reinforcement learning self-play, similar to AlphaStar. It is enabled by the recent launch of a suite of software tools for the game known as "Slippi", which allows us to save human replays. We train on a subset of ~100k tournament games. It is similar to the older Phillip project, which did not have the benefit of Slippi when it was created (and doesn't use any human replays). On the research side, many are concerned that enforcing a slower, human reaction time is necessary for the AI to be "fair". Here we find that the prior of the imitation policy and a light amount of RL fine-tuning make for convincing human-like AIs even with no delays. Of course, it remains a difficult question how to preserve this as you make the agent stronger with more RL. An older paper by Vlad Firoiu attacked this problem with some success: https://arxiv.org/abs/1810.07286 https://www.youtube.com/watch?v=zHtgqxRxqYg There is much possible future work, and Melee as a game for research is exciting! For those who are already fans of the game, you can play against it now at https://twitch.tv/rakkob submitted by /u/otter_collapse [link] [comments]  ( 111 min )
    [D] What are some of the conferences for real-life/business ML usage?
    I would love to hear about conferences where they emphasize the productization of ML models and real-life problems. submitted by /u/gabegabe6 [link] [comments]  ( 90 min )
    [Project] I made a conversational AI app that tutors you in math, science, history and computer science!
    submitted by /u/landongarrison [link] [comments]  ( 88 min )
  • Open

    build a web demo for stable diffusion in google colab in python
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 91 min )
    The False Prophecy of “AGI”: A Quick TL;DR
    submitted by /u/spincycle27 [link] [comments]  ( 90 min )
    Created with Stable Diffusion on pixelz.ai
    submitted by /u/PixelzJ [link] [comments]  ( 90 min )
    Real-time AI Processing at the Edge
    This tech talk focuses on AI-powered processing at the edge. Learn how you can get started running AI models on thousands of devices from a central location using Modzy Edge. By running models on edge devices, you can reduce latency, increase security, while improving processing speeds. We demonstrate how you can run a computer vision model on an NVIDIA Jetson Nano to process video in real-time, and share an example of an ML model analyzing data from atmospheric sensors. submitted by /u/modzykirsten [link] [comments]  ( 87 min )
    Stable Diffusion now available on Pixelz AI. Details in comments
    submitted by /u/pixelz_ai [link] [comments]  ( 87 min )
    I got Stable Diffusion Public Release working on an AMD GPU!
    submitted by /u/yahma [link] [comments]  ( 87 min )
    Creating and Illustrating a children’s book with Dall-E in less than a week
    10 years ago I had the idea to write a children’s book about the joy and difficulties of building something; That idea was there but the hassle and cost of illustration always prevented me from finishing the book. Recently, Dall-E was launched and I excitedly used it for my children's book illustrations. You can see the result and also issues I faced such as maintaining a consistent art style and face removal. https://medium.com/@rwanghacker/creating-and-illustrating-a-childrens-book-with-dall-e-in-less-than-a-week-813ee85f2225 submitted by /u/nychacker [link] [comments]  ( 87 min )
    I'm an experimental media art student, specialized in traditional illustration but with a great interest in technological implementations in art. For my latest animation project I recompose, mix and embed various AI-generated images combined with my own hand drawings. I hope you'll enjoy them.
    submitted by /u/bobitazia [link] [comments]  ( 92 min )
    AI scientists are studying the “emergent” abilities of large language models
    submitted by /u/bendee983 [link] [comments]  ( 87 min )
    Help finding a very specific AI Personal Assistant / conversational chatbot (or making one)
    I'd like to have a virtual personal assistant/companion/chatbot for my PC/Studio that:
    - can perform at least some of the kinds of tasks that Siri etc. can (Google searches, music, reminders, etc.)
    - can hold an actual decently convincing conversation, the more personality the better, and can remember details between chats
    - will allow me to add a custom command to pull a random item from a list in a text document and read it out, with a custom voice command set to activate this
    - will allow me to name/refer to it however I like
    - can do voice recognition and TTS
    - is uncensored and able to discuss and research adult content
    Happy to pay, or if this doesn't already exist together in one bot, how hard/costly would it be for me to either hire someone to make it or take it on myself as a long project and learn to build it? Could folks please point me in a starting direction? Edit: it looks like I may be able to achieve what I want with UltraHal once I've learned it better. Thanks so much for your time! submitted by /u/ZytaLyonne [link] [comments]  ( 96 min )
    What would your ideal future look like?
    Imagine if we somehow managed to create an aligned super intelligence and had a genuine shot at Utopia. What would you want that world to look like? I'm working on a Utopiaography project, and I'd love to hear what you guys think. submitted by /u/LondonIsButOneCity [link] [comments]  ( 89 min )
    Min3Flow: A Multistage Text to Image Framework. Built using an inference-stripped subset of min-dalle, glid-3-xl, and SwinIR.
    submitted by /u/BiasedVariance [link] [comments]  ( 87 min )
    Walking on latent impressionist landscapes [Études #1, Mt.Sinai]
    submitted by /u/evangelart [link] [comments]  ( 88 min )
    I tried to recreate a comic book using DALL-E and Alan Moore’s script
    submitted by /u/RubiksCodeNMZ [link] [comments]  ( 87 min )
    Selling Dall E 2 Account With Email Access For Low Price (Paypal Accepted) -DM me
    submitted by /u/Sqchinthq [link] [comments]  ( 87 min )
    I upscaled a low res image and then found a high res image of the same, here's a side-by-side comparison.
    submitted by /u/TheZephyr2003 [link] [comments]  ( 88 min )
    5 Key Components of Conversational AI & its benefits
    submitted by /u/PerformanceHopeful15 [link] [comments]  ( 87 min )
  • Open

    The benefits of reducing critic variance
    A lot of work in RL goes into reducing the variance of the critic, which I take primarily to mean that the sample complexity of computing the expected value of the critic under the policy distribution is lower. My question is this: I understand why lowering the critic’s variance gives higher quality policy gradients, in the sense that the update will be more likely to improve the value estimated by the critic, but does having lower variance mean anything regarding whether these “higher quality” gradients are more accurate in the first place? Mostly, I’m wondering if having lower variance (assuming that the critic we are using can learn a model of the true value function that has negligible bias) actually helps the critic learn faster? submitted by /u/vandelay_inds [link] [comments]  ( 88 min )
    Storing latent states in Dyna-like MBRL?
    Hi, I am interested in model-based reinforcement learning and have a general question concerning Dyna-like MBRL algorithms (Dyna, MBPO, SimPLe, ...). Each of these algorithms uses a world model to generate samples that replace interactions with the real environment. The samples are then stored in a replay buffer and later used to update a model-free policy (SAC, PPO, ...). These algorithms generally use an encoder-decoder world model, i.e., a model that learns to encode observations to a compact latent state and later recovers the information with the decoder. These latent states are typically of lower dimensionality and therefore often better suited as inputs to the agent. Why do the algorithms typically store the decoded observations in the replay buffer? Couldn't we store the latent states and use them as inputs to the agent? What would be the trade-offs? submitted by /u/Internal-Brush4929 [link] [comments]  ( 107 min )
    Elo ratings for rewards?
    Is there any study on the use of Elo ratings as rewards in RL? I am looking at this kind of situation: I want to train an agent to play Pokémon. For simplicity, let's assume that the battles are 1v1 (no swaps or whatever). Naturally, some Pokémon are going to be stronger than others on average: a Mewtwo is far stronger than a Magikarp, for example. Given that, I want to assign different rewards to the winner: if Mewtwo wins against Magikarp, that is no big deal. However, if the opposite happens, something went horribly wrong and should be avoided. But how do I model that without knowing who is stronger beforehand? That's what I want to represent with Elo --- at first, everyone would start with the same scores, but over the course of training things should fall into place (ideally). submitted by /u/burning-ship [link] [comments]  ( 105 min )
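    For reference, the standard Elo machinery is easy to drop into a reward function: the surprise term (actual score minus expected score) is large exactly when a weak-rated agent beats a strong-rated one, which matches the Magikarp-beats-Mewtwo intuition. Using it as a shaped reward is one possible design, not an established recipe. A minimal sketch:
    ```python
    def elo_expected(r_a: float, r_b: float) -> float:
        """Expected score of A against B under the Elo model."""
        return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

    def elo_update(r_a, r_b, score_a, k=32.0):
        """score_a: 1.0 if A won, 0.0 if A lost, 0.5 for a draw.
        The surprise (score_a - e_a) could double as a shaped reward."""
        e_a = elo_expected(r_a, r_b)
        delta = k * (score_a - e_a)
        return r_a + delta, r_b - delta
    ```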
    Rage Against the Machine. An analysis of AI models displacing artists and programmers.
    submitted by /u/HowlHouse123 [link] [comments]  ( 106 min )
    The documentation for Gym, the RL library, has been moved to a new address
    Due to domain issues, the up-to-date documentation for Gym is now hosted at https://gymlibrary.dev The documentation is maintained by the Farama Foundation on GitHub, and contributions are always welcome! The best way to get in touch with the team is on the Discord server submitted by /u/RedTachyon [link] [comments]  ( 87 min )
    Any website that gives paper recommendations from search history?
    I want to find cool new papers, but I'm not sure where to look. Is there a website that could recommend me new papers to read? If not, how do you all find rl papers? submitted by /u/himty [link] [comments]  ( 88 min )
    Are Py-Bullet and MuJoCo states equivalent?
    Hey, if I have a state in a MuJoCo environment, would the equivalent state in the Py-Bullet environment be represented the same way value-wise? submitted by /u/Dragonrooster [link] [comments]  ( 87 min )
    Where to place assets for registered custom Gym env?
    Hi all, I've built a custom env that I have registered as per Gym documentation, so I can now create an instance of it using its ID and the gym.make() function. I use assets for the env (meshes of various 3D objects) and I'm wondering where I am supposed to store such assets in relation to the registered env? I can't find any examples online or anything in the documentation... can someone help me out? 🙏🏽 I'm getting a "segmentation fault (core dumped)" and I'm wondering if it's due to this. Cheers! submitted by /u/leozinho2r [link] [comments]  ( 88 min )
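    One common cause of exactly this kind of failure is resolving asset paths relative to the current working directory, which changes depending on where gym.make() is called from. A minimal sketch of the usual fix (file and class names are hypothetical), resolving paths relative to the env module itself; if the assert passes and the segfault persists, the problem is more likely in the physics backend loading the mesh:
    ```python
    import os
    import gym

    # Resolve assets relative to this module, not the working directory.
    ASSET_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets")

    class MeshEnv(gym.Env):
        def __init__(self):
            self.mesh_path = os.path.join(ASSET_DIR, "object.obj")  # hypothetical mesh
            # Fail fast with a readable error instead of a downstream segfault.
            assert os.path.isfile(self.mesh_path), f"missing asset: {self.mesh_path}"
    ```
    If the env ships as an installable package, the assets directory also needs to be declared (e.g., via package_data in setup.py) so it gets installed alongside the module.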
  • Open

    Can this subreddit access this article and let me know the rating out of 10? Much appreciated
    submitted by /u/JoshuaDaD [link] [comments]  ( 87 min )
  • Open

    Blockchain Technology: The Potential to Change the Future
    Blockchain appears to be establishing the foundation of a new economic system, though significant challenges remain. The post Blockchain Technology: The Potential to Change the Future appeared first on Data Science Central.  ( 20 min )
    Top 10 Projects for Data Science and Machine Learning
    Building machine learning projects can give you a much more comprehensive education about how they work. The post Top 10 Projects for Data Science and Machine Learning appeared first on Data Science Central.  ( 26 min )
    How Knowledge Management Can Reshape Business Operations
    Knowledge management is refactoring the way that organizations work. The post How Knowledge Management Can Reshape Business Operations appeared first on Data Science Central.  ( 19 min )
    Application Refactoring: All You Need to Know
    Application refactoring can often be radically simplified by using cloud components. The post Application Refactoring: All You Need to Know appeared first on Data Science Central.  ( 18 min )
    From Knowledge Graphs To Knowledge Portals
    While Knowledge Graph hype is nowhere near as loud as AI hype, there is no question that more and more organizations are turning to knowledge graphs to solve real-world problems. The post From Knowledge Graphs To Knowledge Portals appeared first on Data Science Central.  ( 22 min )
    Five AI applications That Are Changing Our World
    Artificial intelligence is concerned with finding ways for computers to perform the same functions as humans. However, computers cannot make decisions on their own, i.e., they cannot make decisions without human intervention. The ability of computers to make decisions with their own intelligence would make machines operate and think like humans, which is what, in general… Read More »Five AI applications That Are Changing Our World The post Five AI applications That Are Changing Our World appeared first on Data Science Central.  ( 20 min )
    Watching the Shift Towards More Symbolic AGI
    For a long time, Gary Marcus was a lonely voice in favour of symbolic (more specifically, hybrid) AI models. Recent events seem to have emboldened his view. The debate is significant because it has implications for the future of AI. Introduction: From the dawn of AI, the debate between symbolic vs neural network approaches of AI… Read More »Watching the Shift Towards More Symbolic AGI The post Watching the Shift Towards More Symbolic AGI appeared first on Data Science Central.  ( 18 min )
    A Guide To How Text-To-Speech Works
    The ability to convert text to speech in meaningful, realistic ways is transforming how we interact with computer systems and one another. The post A Guide To How Text-To-Speech Works appeared first on Data Science Central.  ( 19 min )
    Is God an Economist?
    Every now and then, a conversation with my kids goes off the rails.  And this was one of those conversations.  We tend to have unusual conversations ranging from “Is Big Foot from outer space (like Predator)?” to “If man evolved from monkeys, then why are there still monkeys?” And here is our latest conversation: Is… Read More »Is God an Economist? The post Is God an Economist? appeared first on Data Science Central.  ( 21 min )
    Capturing Methane
    Europe is facing one of the worst droughts in recent history. More than 60 percent of the EU and UK are trying to fight this climatic event, the effects of which are far-reaching. The post Capturing Methane appeared first on Data Science Central.  ( 19 min )
  • Open

    Intelligently search Alfresco content using Amazon Kendra
    Amazon Kendra is an intelligent search service powered by machine learning (ML). With Amazon Kendra, you can easily aggregate content from a variety of content repositories into a centralized index that lets you quickly search all your enterprise data and find the most accurate answer. Many organizations use the content management platform Alfresco to store […]  ( 5 min )
  • Open

    Artificial intelligence model can detect Parkinson’s from breathing patterns
    An MIT-developed device with the appearance of a Wi-Fi router uses a neural network to discern the presence and severity of one of the fastest-growing neurological diseases in the world.  ( 6 min )
  • Open

    Using Edge Biometrics For Better AI Security System Development
    Workspace security can be a fiddly money drain, especially for corporations that deal with sensitive information, or run multiple offices…  ( 19 min )
  • Open

    An AI-Enabled Drone Could Soon Become Every Rhino Poacher’s… Horn Enemy
    Watching out for the nearly-extinct two-ton beasts may be the ultimate example of a job best done remotely. The post An AI-Enabled Drone Could Soon Become Every Rhino Poacher’s… Horn Enemy appeared first on NVIDIA Blog.  ( 7 min )
  • Open

    GAETS: A Graph Autoencoder Time Series Approach Towards Battery Parameter Estimation. (arXiv:2111.09314v2 [cs.LG] UPDATED)
    Lithium-ion batteries are powering the ongoing transportation electrification revolution. Lithium-ion batteries possess higher energy density and favourable electrochemical properties which make them a preferable energy source for electric vehicles. Precise estimation of battery parameters (charge capacity, voltage, etc.) is vital to estimate the available range in an electric vehicle. Graph-based estimation techniques enable us to understand the variable dependencies underpinning them to improve estimates. In this paper we employ Graph Neural Networks for battery parameter estimation, introducing a unique graph autoencoder time series estimation approach. Variables in battery measurements are known to have an underlying relationship with each other, with a certain correlation among the variables of interest. We use a graph autoencoder based on a non-linear version of NOTEARS, as this allows us to perform gradient descent when learning the structure (instead of treating it as a combinatorial optimisation problem). The proposed architecture outperforms the state-of-the-art Graph Time Series (GTS) architecture for battery parameter estimation. We call our method GAETS (Graph AutoEncoder Time Series).  ( 2 min )
    Deep Learning for Choice Modeling. (arXiv:2208.09325v1 [stat.ML])
    Choice modeling has been a central topic in the study of individual preference or utility across many fields including economics, marketing, operations research, and psychology. While the vast majority of the literature on choice models has been devoted to the analytical properties that lead to managerial and policy-making insights, the existing methods to learn a choice model from empirical data are often either computationally intractable or sample inefficient. In this paper, we develop deep learning-based choice models under two settings of choice modeling: (i) feature-free and (ii) feature-based. Our model captures both the intrinsic utility for each candidate choice and the effect that the assortment has on the choice probability. Synthetic and real data experiments demonstrate the performances of proposed models in terms of the recovery of the existing choice models, sample complexity, assortment effect, architecture design, and model interpretation.  ( 2 min )
    A Physics-based Domain Adaptation framework for modelling and forecasting building energy systems. (arXiv:2208.09456v1 [cs.LG])
    State-of-the-art machine-learning based models are a popular choice for modelling and forecasting energy behaviour in buildings because given enough data, they are good at finding spatiotemporal patterns and structures even in scenarios where the complexity prohibits analytical descriptions. However, machine-learning based models for building energy forecasting have difficulty generalizing to out-of-sample scenarios that are not represented in the data because their architecture typically does not hold physical correspondence to mechanistic structures linked with governing phenomena of energy transfer. Thus, their ability to forecast for unseen initial conditions and boundary conditions wholly depends on the representativeness of the data, which is not guaranteed in building measurement data. Consequently, these limitations impede their application to real-world engineering applications such as energy management in Digital Twins. In response, we present a Domain Adaptation framework that aims to leverage well-known understanding of the phenomena governing energy behavior in buildings to forecast for out-of-sample scenarios beyond building measurement data. More specifically, we represent mechanistic knowledge of energy behavior using low-rank linear time-invariant state space models and subsequently leverage their governing structure to forecast for a target energy system for which only building measurement data is available. We achieve this by aligning the Physics-derived subspace that governs global state space behavior closer towards the target subspace derived from the measurement data. In this initial exploration we focus on linear energy systems; we test the subspace-based DA framework on a 1D heat conduction scenario by varying the thermophysical properties of the source and target systems to demonstrate the transferability of mechanistic models from Physics to measurement data.  ( 3 min )
    SGDE: Secure Generative Data Exchange for Cross-Silo Federated Learning. (arXiv:2109.12062v2 [cs.LG] UPDATED)
    Privacy regulation laws, such as GDPR, impose transparency and security as design pillars for data processing algorithms. In this context, federated learning is one of the most influential frameworks for privacy-preserving distributed machine learning, achieving astounding results in many natural language processing and computer vision tasks. Several federated learning frameworks employ differential privacy to prevent private data leakage to unauthorized parties and malicious attackers. Many studies, however, highlight the vulnerabilities of standard federated learning to poisoning and inference, thus, raising concerns about potential risks for sensitive data. To address this issue, we present SGDE, a generative data exchange protocol that improves user security and machine learning performance in a cross-silo federation. The core of SGDE is to share data generators with strong differential privacy guarantees trained on private data instead of communicating explicit gradient information. These generators synthesize an arbitrarily large amount of data that retain the distinctive features of private samples but differ substantially. We show how the inclusion of SGDE into a cross-silo federated network improves resilience to the most influential attacks to federated learning. We test our approach on images and tabular datasets, exploiting beta-variational autoencoders as data generators and highlighting fairness and performance improvements over local and federated learning on non-generated data.  ( 3 min )
    Suboptimal Performance of the Bayes Optimal Algorithm in Frequentist Best Arm Identification. (arXiv:2202.05193v2 [stat.ML] UPDATED)
    We consider the fixed-budget best-arm identification problem with Normal reward distributions. In this problem, the forecaster is given $K$ arms (or treatments) and $T$ time steps. The forecaster attempts to find the best arm, defined as the one with the largest mean, via an adaptive experiment conducted using an algorithm. The algorithm's performance is measured by the simple regret, that is, the quality of the estimated best arm. The frequentist simple regret can be exponentially small in $T$, whereas the Bayesian simple regret is only polynomially small in $T$. This paper demonstrates that the Bayes optimal algorithm, which minimizes the Bayesian simple regret, does not achieve an exponentially small simple regret for some parameters, a finding that contrasts with the many results indicating the asymptotic equivalence of Bayesian and frequentist algorithms in the context of fixed sampling regimes. While the Bayes optimal algorithm is described in terms of a recursive equation that is virtually impossible to compute exactly, we establish the foundations for further analysis by introducing a key quantity that we call the expected Bellman improvement.  ( 2 min )
    Federated Select: A Primitive for Communication- and Memory-Efficient Federated Learning. (arXiv:2208.09432v1 [cs.LG])
    Federated learning (FL) is a framework for machine learning across heterogeneous client devices in a privacy-preserving fashion. To date, most FL algorithms learn a "global" server model across multiple rounds. At each round, the same server model is broadcast to all participating clients, updated locally, and then aggregated across clients. In this work, we propose a more general procedure in which clients "select" what values are sent to them. Notably, this allows clients to operate on smaller, data-dependent slices. In order to make this practical, we outline a primitive, federated select, which enables client-specific selection in realistic FL systems. We discuss how to use federated select for model training and show that it can lead to drastic reductions in communication and client memory usage, potentially enabling the training of models too large to fit on-device. We also discuss the implications of federated select on privacy and trust, which in turn affect possible system constraints and design. Finally, we discuss open questions concerning model architectures, privacy-preserving technologies, and practical FL systems.
    Landslide Susceptibility Modeling by Interpretable Neural Network. (arXiv:2201.06837v2 [cs.LG] UPDATED)
    Landslides are notoriously difficult to predict because numerous spatially and temporally varying factors contribute to slope stability. Artificial neural networks (ANN) have been shown to improve prediction accuracy. However, traditional ANNs are uninterpretable, complex black box models. This makes it difficult to extract mechanistic information about landslide controls in the modeled region or trust the outcome in this high-stakes application. Herein we present the first application of an interpretable additive neural network to landslide susceptibility modeling. We introduce a new additive ANN optimization framework, as well as new dataset division and outcome interpretation techniques uniquely suitable for modeling applications with spatially dependent data structures such as landslide susceptibility. We refer to our approach which features full interpretability, high accuracy, high generalizability and low model complexity as superposable neural network (SNN) optimization. We validate our approach by training models to assess landslide susceptibility in three different regions of the easternmost Himalaya that are highly susceptible to landslides. The interpretable neural network models generated by the SNN outperform physically-based stability and statistical models and achieve similar performance to state-of-the-art deep neural networks while offering insight regarding the relative importance of landslide control factors. The SNN models found the product of slope and precipitation and hillslope aspect to be important primary contributors to high landslide susceptibility in the studied regions. These identified controls suggest that strong slope-climate couplings, along with microclimates, play dominant roles in landslide occurrences of the easternmost Himalaya.
    Gender Bias and Universal Substitution Adversarial Attacks on Grammatical Error Correction Systems for Automated Assessment. (arXiv:2208.09466v1 [cs.CL])
    Grammatical Error Correction (GEC) systems perform a sequence-to-sequence task, where an input word sequence containing grammatical errors is corrected for these errors by the GEC system to output a grammatically correct word sequence. With the advent of deep learning methods, automated GEC systems have become increasingly popular. For example, GEC systems are often used on speech transcriptions of English learners as a form of assessment and feedback - these powerful GEC systems can be used to automatically measure an aspect of a candidate's fluency. The count of \textit{edits} from a candidate's input sentence (or essay) to a GEC system's grammatically corrected output sentence is indicative of a candidate's language ability, where fewer edits suggest better fluency. The count of edits can thus be viewed as a \textit{fluency score} with zero implying perfect fluency. However, although deep learning based GEC systems are extremely powerful and accurate, they are susceptible to adversarial attacks: an adversary can introduce a small, specific change at the input of a system that causes a large, undesired change at the output. When considering the application of GEC systems to automated language assessment, the aim of an adversary could be to cheat by making a small change to a grammatically incorrect input sentence that conceals the errors from a GEC system, such that no edits are found and the candidate is unjustly awarded a perfect fluency score. This work examines a simple universal substitution adversarial attack that non-native speakers of English could realistically employ to deceive GEC systems used for assessment.
    Learning based Age of Information Minimization in UAV-relayed IoT Networks. (arXiv:2203.04227v2 [cs.IT] UPDATED)
    Unmanned Aerial Vehicles (UAVs) are used as aerial base-stations to relay time-sensitive packets from IoT devices to the nearby terrestrial base-station (TBS). Scheduling packets in such UAV-relayed IoT networks so that the IoT devices' packets arrive fresh (up-to-date) at the TBS is a challenging problem, as it involves two simultaneous steps: (i) sampling of packets generated at IoT devices by the UAVs [hop-1] and (ii) updating of sampled packets from UAVs to the TBS [hop-2]. To address this, we propose Age-of-Information (AoI) scheduling algorithms for two-hop UAV-relayed IoT-networks. First, we propose a low-complexity AoI scheduler, termed MAF-MAD, that employs a Maximum AoI First (MAF) policy for sampling IoT devices at the UAV (hop-1) and a Maximum AoI Difference (MAD) policy for updating sampled packets from the UAV to the TBS (hop-2). We prove that MAF-MAD is the optimal AoI scheduler under ideal conditions (lossless wireless channels and generate-at-will traffic generation at IoT devices). For general conditions (lossy channel conditions and varying periodic traffic generation at IoT devices), we instead propose a deep reinforcement learning scheduler based on Proximal Policy Optimization (PPO). Simulation results show that the proposed PPO-based scheduler outperforms other schedulers like MAF-MAD, MAF, and round-robin in all considered general scenarios.
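    A hedged sketch of the two scheduling rules as described in the abstract (our toy encoding of MAF and MAD on illustrative AoI vectors, not the authors' implementation):

    ```python
    import numpy as np

    def maf(age_at_uav):
        """Hop-1 (Maximum AoI First): sample the IoT device whose packet
        held at the UAV is stalest, i.e., has the largest age."""
        return int(np.argmax(age_at_uav))

    def mad(age_at_uav, age_at_tbs):
        """Hop-2 (Maximum AoI Difference): forward the packet whose TBS
        copy lags its fresher UAV copy by the largest AoI gap."""
        return int(np.argmax(np.asarray(age_at_tbs) - np.asarray(age_at_uav)))

    age_uav = [3, 9, 1]            # illustrative per-device AoI at the UAV
    age_tbs = [5, 12, 8]           # corresponding AoI at the TBS
    print(maf(age_uav))            # -> 1: stalest packet at the UAV
    print(mad(age_uav, age_tbs))   # -> 2: largest UAV-to-TBS age gap
    ```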
    Domain Adversarial Spatial-Temporal Network: A Transferable Framework for Short-term Traffic Forecasting across Cities. (arXiv:2202.03630v2 [cs.LG] UPDATED)
    Accurate real-time traffic forecast is critical for intelligent transportation systems (ITS) and it serves as the cornerstone of various smart mobility applications. Though this research area is dominated by deep learning, recent studies indicate that the accuracy improvement by developing new model structures is becoming marginal. Instead, we envision that the improvement can be achieved by transferring the "forecasting-related knowledge" across cities with different data distributions and network topologies. To this end, this paper aims to propose a novel transferable traffic forecasting framework: Domain Adversarial Spatial-Temporal Network (DASTNet). DASTNet is pre-trained on multiple source networks and fine-tuned with the target network's traffic data. Specifically, we leverage the graph representation learning and adversarial domain adaptation techniques to learn the domain-invariant node embeddings, which are further incorporated to model the temporal traffic data. To the best of our knowledge, we are the first to employ adversarial multi-domain adaptation for network-wide traffic forecasting problems. DASTNet consistently outperforms all state-of-the-art baseline methods on three benchmark datasets. The trained DASTNet is applied to Hong Kong's new traffic detectors, and accurate traffic predictions can be delivered immediately (within one day) when the detector is available. Overall, this study suggests an alternative to enhance the traffic forecasting methods and provides practical implications for cities lacking historical traffic data.
    Bayesian Active Learning for Scanning Probe Microscopy: from Gaussian Processes to Hypothesis Learning. (arXiv:2205.15458v2 [cond-mat.mtrl-sci] UPDATED)
    Recent progress in machine learning methods, and the emerging availability of programmable interfaces for scanning probe microscopes (SPMs), have propelled automated and autonomous microscopies to the forefront of attention of the scientific community. However, enabling automated microscopy requires the development of task-specific machine learning methods, understanding the interplay between physics discovery and machine learning, and fully defined discovery workflows. This, in turn, requires balancing the physical intuition and prior knowledge of the domain scientist with rewards that define experimental goals and machine learning algorithms that can translate these to specific experimental protocols. Here, we discuss the basic principles of Bayesian active learning and illustrate its applications for SPM. We progress from the Gaussian Process as a simple data-driven method and Bayesian inference for physical models as an extension of physics-based functional fits to more complex deep kernel learning methods, structured Gaussian Processes, and hypothesis learning. These frameworks allow for the use of prior data, the discovery of specific functionalities as encoded in spectral data, and exploration of physical laws manifesting during the experiment. The discussed framework can be universally applied to all techniques combining imaging and spectroscopy, SPM methods, nanoindentation, electron microscopy and spectroscopy, and chemical imaging methods, and can be particularly impactful for destructive or irreversible measurements.
    Discovery and density estimation of latent confounders in Bayesian networks with evidence lower bound. (arXiv:2206.05490v3 [cs.LG] UPDATED)
    Discovering and parameterising latent confounders represent important and challenging problems in causal structure learning and density estimation, respectively. In this paper, we focus on both discovering and learning the distribution of latent confounders. This task requires solutions that come from different areas of statistics and machine learning. We combine elements of variational Bayesian methods, expectation-maximisation, hill-climbing search, and structure learning under the assumption of causal insufficiency. We propose two learning strategies: one that maximises model selection accuracy, and another that improves computational efficiency in exchange for a minor reduction in accuracy. The former strategy is suitable for small networks and the latter for moderately sized networks. Both learning strategies perform well relative to existing solutions.
    Estimating a potential without the agony of the partition function. (arXiv:2208.09433v1 [cs.LG])
    Estimating a Gibbs density function given a sample is an important problem in computational statistics and statistical learning. Although the well-established maximum likelihood method is commonly used, it requires the computation of the partition function (i.e., the normalization of the density). This function can be easily calculated for simple low-dimensional problems but its computation is difficult or even intractable for general densities and high-dimensional problems. In this paper we propose an alternative approach based on Maximum A-Posteriori (MAP) estimators, which we name Maximum Recovery MAP (MR-MAP), to derive estimators that do not require the computation of the partition function, and we reformulate the problem as an optimization problem. We further propose a least-action type potential that allows us to quickly solve the optimization problem as a feed-forward hyperbolic neural network. We demonstrate the effectiveness of our methods on some standard data sets.
    On the Surprising Behaviour of node2vec. (arXiv:2206.08252v2 [cs.LG] UPDATED)
    Graph embedding techniques are a staple of modern graph learning research. When using embeddings for downstream tasks such as classification, information about their stability and robustness, i.e., their susceptibility to sources of noise, stochastic effects, or specific parameter choices, becomes increasingly important. As one of the most prominent graph embedding schemes, we focus on node2vec and analyse its embedding quality from multiple perspectives. Our findings indicate that embedding quality is unstable with respect to parameter choices, and we propose strategies to remedy this in practice.
    Arrhythmia Classification using CGAN-augmented ECG Signals. (arXiv:2202.00569v2 [eess.SP] UPDATED)
    ECG databases are usually highly imbalanced due to the abundance of Normal ECGs and the scarcity of abnormal cases. As such, deep learning classifiers trained on imbalanced datasets usually perform poorly, especially on minor classes. One solution is to generate realistic synthetic ECG signals using Generative Adversarial Networks (GAN) to augment imbalanced datasets. In this study, we combined conditional GAN with WGAN-GP and developed AC-WGAN-GP in 1D form for the first time, applying it to the MIT-BIH Arrhythmia dataset. We investigated the impact of data augmentation on arrhythmia classification. We employed two models for ECG generation: (i) unconditional GAN, where a Wasserstein GAN with gradient penalty (WGAN-GP) is trained on each class individually; and (ii) conditional GAN, where one Auxiliary Classifier WGAN-GP (AC-WGAN-GP) model is trained on all classes and then used to generate synthetic beats for all classes. Two scenarios are defined for each case: (a) unscreened, where all the generated synthetic beats are used, and (b) screened, where only a portion of generated beats is selected and used, based on their Dynamic Time Warping (DTW) distance to a designated template. A state-of-the-art ResNet classifier (EcgResNet34) is trained on each of the augmented datasets and the performance metrics (precision/recall/F1-Score micro- and macro-averaged, confusion matrices, multiclass precision-recall curves) were compared with those of the unaugmented imbalanced case. We also used a simple metric, Net Improvement. All three metrics consistently show that the unconditional GAN with raw (unscreened) generated data yields the best net improvement, both in total and on the minor classes.
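    A hedged sketch of the screening step: keep only synthetic beats whose DTW distance to a designated class template falls below a threshold. The DTW implementation, template, and threshold here are our illustrative choices, not the study's exact settings:

    ```python
    import numpy as np

    def dtw_distance(a, b):
        """Classic O(len(a)*len(b)) dynamic time warping distance
        between two 1-D sequences."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    def screen_beats(generated, template, threshold):
        """Keep only synthetic beats close to the class template."""
        return [g for g in generated if dtw_distance(g, template) < threshold]

    rng = np.random.default_rng(0)
    template = np.sin(np.linspace(0, 2 * np.pi, 64))
    beats = [template + rng.normal(0, s, 64) for s in (0.05, 0.5, 1.0)]
    print(len(screen_beats(beats, template, threshold=10.0)))
    ```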
    Deletion and Insertion Tests in Regression Models. (arXiv:2205.12423v2 [cs.LG] UPDATED)
    A basic task in explainable AI (XAI) is to identify the most important features behind a prediction made by a black box function $f$. The insertion and deletion tests of Petsiuk et al. (2018) are used to judge the quality of algorithms that rank pixels from most to least important for a classification. Motivated by regression problems we establish a formula for their area under the curve (AUC) criteria in terms of certain main effects and interactions in an anchored decomposition of $f$. We find an expression for the expected value of the AUC under a random ordering of inputs to $f$ and propose an alternative area above a straight line for the regression setting. We use this criterion to compare feature importances computed by integrated gradients (IG) to those computed by Kernel SHAP (KS) as well as LIME, DeepLIFT, vanilla gradient and input$\times$gradient methods. KS has the best overall performance in two datasets we consider but it is very expensive to compute. We find that IG is nearly as good as KS while being much faster. Our comparison problems include some binary inputs that pose a challenge to IG because it must use values between the possible variable levels and so we consider ways to handle binary variables in IG. We show that sorting variables by their Shapley value does not necessarily give the optimal ordering for an insertion-deletion test. It will however do that for monotone functions of additive models, such as logistic regression.
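    For intuition, a small sketch of a deletion test in the regression setting (our toy black box $f$, baseline, and attribution ranking, not the paper's experiments): features are replaced by a reference value in order of claimed importance, and the area under the resulting curve of $f$'s outputs is the quality criterion.

    ```python
    import numpy as np

    def deletion_curve(f, x, ranking, baseline):
        """Replace features of x by baseline values, most important
        first, recording f after each deletion."""
        x = x.copy()
        curve = [f(x)]
        for i in ranking:               # feature indices, by importance
            x[i] = baseline[i]
            curve.append(f(x))
        return np.array(curve)

    w = np.array([3.0, -2.0, 0.5, 0.0])
    f = lambda x: float(w @ x)          # toy black-box regression model
    x = np.ones(4)
    baseline = np.zeros_like(x)
    ranking = np.argsort(-np.abs(w))    # pretend attributions are |w|
    curve = deletion_curve(f, x, ranking, baseline)
    auc = np.trapz(curve, dx=1.0 / (len(curve) - 1))
    print(curve, auc)   # the curve moves from f(x) toward f(baseline)
    ```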
    How to Fine-tune Models with Few Samples: Update, Data Augmentation, and Test-time Augmentation. (arXiv:2205.07874v3 [cs.LG] UPDATED)
    Most of the recent few-shot learning (FSL) algorithms are based on transfer learning, where a model is pre-trained using a large amount of source data, and the pre-trained model is fine-tuned using a small amount of target data. In transfer learning-based FSL, sophisticated pre-training methods have been widely studied for universal representation. Therefore, it has become more important to utilize the universal representation for downstream tasks, but there are few studies on fine-tuning in FSL. In this paper, we focus on how to transfer pre-trained models to few-shot downstream tasks from the three perspectives: update, data augmentation, and test-time augmentation. First, we compare the two popular update methods, full fine-tuning (i.e., updating the entire network, FT) and linear probing (i.e., updating only a linear classifier, LP). We find that LP is better than FT with extremely few samples, whereas FT outperforms LP as training samples increase. Next, we show that data augmentation cannot guarantee few-shot performance improvement and investigate the effectiveness of data augmentation based on the intensity of augmentation. Finally, we adopt augmentation to both a support set for update (i.e., data augmentation) as well as a query set for prediction (i.e., test-time augmentation), considering support-query distribution shifts, and improve few-shot performance. The code is available at https://github.com/kimyuji/updating_FSL.
    A Tutorial on the Spectral Theory of Markov Chains. (arXiv:2207.02296v2 [cs.LG] UPDATED)
    Markov chains are a class of probabilistic models that have achieved widespread application in the quantitative sciences. This is in part due to their versatility, but is compounded by the ease with which they can be probed analytically. This tutorial provides an in-depth introduction to Markov chains, and explores their connection to graphs and random walks. We utilize tools from linear algebra and graph theory to describe the transition matrices of different types of Markov chains, with a particular focus on exploring properties of the eigenvalues and eigenvectors corresponding to these matrices. The results presented are relevant to a number of methods in machine learning and data mining, which we describe at various stages. Rather than being a novel academic study in its own right, this text presents a collection of known results, together with some new concepts. Moreover, the tutorial focuses on offering intuition to readers rather than formal understanding, and only assumes basic exposure to concepts from linear algebra and probability theory. It is therefore accessible to students and researchers from a wide variety of disciplines.
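    A brief numpy illustration of two facts the tutorial builds on: the stationary distribution of an ergodic chain is the left eigenvector of the transition matrix with eigenvalue 1, and the second-largest eigenvalue modulus (SLEM) governs the mixing rate. The matrix below is an arbitrary example:

    ```python
    import numpy as np

    P = np.array([[0.9, 0.1, 0.0],     # row-stochastic transition matrix
                  [0.2, 0.7, 0.1],
                  [0.0, 0.3, 0.7]])

    eigvals, eigvecs = np.linalg.eig(P.T)    # left eigenvectors of P
    k = np.argmin(np.abs(eigvals - 1.0))     # locate eigenvalue 1
    pi = np.real(eigvecs[:, k])
    pi /= pi.sum()                           # normalise to a distribution
    print("stationary:", pi)
    print("pi P == pi:", np.allclose(pi @ P, pi))

    slem = sorted(np.abs(eigvals))[-2]       # second-largest modulus
    print("SLEM (controls mixing speed):", slem)
    ```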
    WeShort: Out-of-distribution Detection With Weak Shortcut structure. (arXiv:2207.05055v3 [cs.LG] UPDATED)
    Neural networks achieve impressive performance on data drawn from the same distribution as the training set, but can produce overconfident, incorrect results for data they have never seen. Therefore, it is essential to detect whether inputs come from out-of-distribution (OOD) data in order to guarantee the safety of neural networks deployed in the real world. In this paper, we propose a simple and effective post-hoc technique, WeShort, to reduce the overconfidence of neural networks on OOD data. Our method is inspired by the observation of the internal residual structure, which shows the separation of the OOD and in-distribution (ID) data in the shortcut layer. Our method is compatible with different OOD detection scores and can generalize well to different network architectures. We demonstrate our method on various OOD datasets to show its competitive performance and provide reasonable hypotheses to explain why our method works. On the ImageNet benchmark, WeShort achieves state-of-the-art performance on the false positive rate (FPR95) and the area under the receiver operating characteristic (AUROC) among the family of post-hoc methods.
    ALBU: An approximate Loopy Belief message passing algorithm for LDA to improve performance on small data sets. (arXiv:2110.00635v2 [cs.LG] UPDATED)
    Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) has become the most popular algorithm for aspect modeling. While sufficiently successful in text topic extraction from large corpora, VB is less successful in identifying aspects in the presence of limited data. We present a novel variational message passing algorithm as applied to Latent Dirichlet Allocation (LDA) and compare it with the gold standard VB and collapsed Gibbs sampling. In situations where marginalisation leads to non-conjugate messages, we use ideas from sampling to derive approximate update equations. In cases where conjugacy holds, Loopy Belief update (LBU) (also known as Lauritzen-Spiegelhalter) is used. Our algorithm, ALBU (approximate LBU), has strong similarities with Variational Message Passing (VMP) (which is the message passing variant of VB). To compare the performance of the algorithms in the presence of limited data, we use data sets consisting of tweets and news groups. Additionally, to perform more fine grained evaluations and comparisons, we use simulations that enable comparisons with the ground truth via Kullback-Leibler divergence (KLD). Using coherence measures for the text corpora and KLD with the simulations we show that ALBU learns latent distributions more accurately than does VB, especially for smaller data sets.
    Few-Shot Class-Incremental Learning by Sampling Multi-Phase Tasks. (arXiv:2203.17030v2 [cs.CV] UPDATED)
    New classes arise frequently in our ever-changing world, e.g., emerging topics in social media and new types of products in e-commerce. A model should recognize new classes and meanwhile maintain discriminability over old classes. Under severe circumstances, only limited novel instances are available to incrementally update the model. The task of recognizing few-shot new classes without forgetting old classes is called few-shot class-incremental learning (FSCIL). In this work, we propose a new paradigm for FSCIL based on meta-learning by LearnIng Multi-phase Incremental Tasks (LIMIT), which synthesizes fake FSCIL tasks from the base dataset. The data format of fake tasks is consistent with the `real' incremental tasks, and we can build a generalizable feature space for the unseen tasks through meta-learning. Besides, LIMIT also constructs a calibration module based on transformer, which calibrates the old class classifiers and new class prototypes into the same scale and fills in the semantic gap. The calibration module also adaptively contextualizes the instance-specific embedding with a set-to-set function. LIMIT efficiently adapts to new classes and meanwhile resists forgetting over old classes. Experiments on three benchmark datasets (CIFAR100, miniImageNet, and CUB200) and the large-scale ImageNet ILSVRC2012 dataset validate that LIMIT achieves state-of-the-art performance.
    Augmenting Message Passing by Retrieving Similar Graphs. (arXiv:2206.00362v2 [cs.LG] UPDATED)
    Graph Neural Networks (GNNs) are effective tools for graph representation learning. Most GNNs rely on a recursive neighborhood aggregation scheme, named message passing, so their theoretical expressive power is limited to the first-order Weisfeiler-Lehman test (1-WL). Motivated by the success of retrieval-based models and off-the-shelf high-performance retrieval systems, we propose a non-parametric and model-agnostic scheme called GraphRetrieval to boost existing GNN models. In GraphRetrieval, similar training graphs associated with their ground-truth labels are retrieved as an enhancement to be jointly utilized with the input graph representation to complete various graph property predictive tasks. In particular, to effectively "absorb" useful information from retrieved graphs and "ignore" possible noise, we introduce an adapter based on self-attention to explicitly learn the interaction between an input graph and its retrieved similar graphs. By experimenting with three classic GNN models on 12 different datasets, we demonstrate that GraphRetrieval brings substantial improvements to existing GNN models without compromising model size or prediction efficiency. Our work is also the first to validate the feasibility and effectiveness of retrieval-enhanced graph neural networks.
    Bi-fidelity Modeling of Uncertain and Partially Unknown Systems using DeepONets. (arXiv:2204.00997v2 [stat.ML] UPDATED)
    Recent advances in modeling large-scale complex physical systems have shifted research focuses towards data-driven techniques. However, generating datasets by simulating complex systems can require significant computational resources. Similarly, acquiring experimental datasets can prove difficult as well. For these systems, often computationally inexpensive, but in general inaccurate, models, known as the low-fidelity models, are available. In this paper, we propose a bi-fidelity modeling approach for complex physical systems, where we model the discrepancy between the true system's response and low-fidelity response in the presence of a small training dataset from the true system's response using a deep operator network (DeepONet), a neural network architecture suitable for approximating nonlinear operators. We apply the approach to model systems that have parametric uncertainty and are partially unknown. Three numerical examples are used to show the efficacy of the proposed approach to model uncertain and partially unknown complex physical systems.
    Parametric and Multivariate Uncertainty Calibration for Regression and Object Detection. (arXiv:2207.01242v2 [cs.LG] UPDATED)
    Reliable spatial uncertainty evaluation of object detection models is of special interest and has been the subject of recent work. In this work, we review the existing definitions for uncertainty calibration of probabilistic regression tasks. We inspect the calibration properties of common detection networks and extend state-of-the-art recalibration methods. Our methods use a Gaussian process (GP) recalibration scheme that yields parametric distributions as output (e.g. Gaussian or Cauchy). The usage of GP recalibration allows for a local (conditional) uncertainty calibration by capturing dependencies between neighboring samples. The use of parametric distributions such as the Gaussian allows for a simplified adaptation of calibration in subsequent processes, e.g., for Kalman filtering in the scope of object tracking. In addition, we use the GP recalibration scheme to perform covariance estimation which allows for post-hoc introduction of local correlations between the output quantities, e.g., position, width, or height in object detection. To measure the joint calibration of multivariate and possibly correlated data, we introduce the quantile calibration error which is based on the Mahalanobis distance between the predicted distribution and the ground truth to determine whether the ground truth is within a predicted quantile. Our experiments show that common detection models overestimate the spatial uncertainty in comparison to the observed error. We show that the simple Isotonic Regression recalibration method is sufficient to achieve a good uncertainty quantification in terms of calibrated quantiles. In contrast, if normal distributions are required for subsequent processes, our GP-Normal recalibration method yields the best results. Finally, we show that our covariance estimation method is able to achieve the best calibration results for joint multivariate calibration.
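    A small sketch of the multivariate quantile idea (our simplification, using scipy): for a Gaussian prediction, the squared Mahalanobis distance of the ground truth follows a chi-square distribution, so empirical coverage of the chi-square quantile ellipsoids can be compared against the nominal level.

    ```python
    import numpy as np
    from scipy.stats import chi2

    def quantile_coverage(mean, cov, truth, level):
        """Fraction of ground truths inside the predicted Gaussian's
        `level`-quantile ellipsoid (a Mahalanobis-distance test)."""
        diff = truth - mean
        d2 = np.einsum('ni,nij,nj->n', diff, np.linalg.inv(cov), diff)
        return float(np.mean(d2 <= chi2.ppf(level, df=mean.shape[1])))

    rng = np.random.default_rng(0)
    n, k = 5000, 2
    mean = rng.normal(size=(n, k))
    cov = np.tile(np.eye(k), (n, 1, 1))
    truth = mean + rng.normal(size=(n, k))     # a perfectly calibrated case
    for q in (0.5, 0.9):
        print(q, quantile_coverage(mean, cov, truth, q))  # approximately q
    ```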
    Dance Style Transfer with Cross-modal Transformer. (arXiv:2208.09406v1 [cs.LG])
    We present CycleDance, a dance style transfer system to transform an existing motion clip in one dance style to a motion clip in another dance style while attempting to preserve motion context of the dance. Our method extends an existing CycleGAN architecture for modeling audio sequences and integrates multimodal transformer encoders to account for music context. We adopt sequence length-based curriculum learning to stabilize training. Our approach captures rich and long-term intra-relations between motion frames, which is a common challenge in motion transfer and synthesis work. We further introduce new metrics for gauging transfer strength and content preservation in the context of dance movements. We perform an extensive ablation study as well as a human study including 30 participants with 5 or more years of dance experience. The results demonstrate that CycleDance generates realistic movements with the target style, significantly outperforming the baseline CycleGAN on naturalness, transfer strength, and content preservation.
    Accelerated MRI With Deep Linear Convolutional Transform Learning. (arXiv:2204.07923v2 [eess.IV] UPDATED)
    Recent studies show that deep learning (DL) based MRI reconstruction outperforms conventional methods, such as parallel imaging and compressed sensing (CS), in multiple applications. Unlike CS that is typically implemented with pre-determined linear representations for regularization, DL inherently uses a non-linear representation learned from a large database. Another line of work uses transform learning (TL) to bridge the gap between these two approaches by learning linear representations from data. In this work, we combine ideas from CS, TL and DL reconstructions to learn deep linear convolutional transforms as part of an algorithm unrolling approach. Using end-to-end training, our results show that the proposed technique can reconstruct MR images to a level comparable to DL methods, while supporting uniform undersampling patterns unlike conventional CS methods. Our proposed method relies on convex sparse image reconstruction with linear representation at inference time, which may be beneficial for characterizing robustness, stability and generalizability.
    NARX Identification using Derivative-Based Regularized Neural Networks. (arXiv:2204.05892v2 [eess.SY] UPDATED)
    This work presents a novel regularization method for the identification of Nonlinear Autoregressive eXogenous (NARX) models. The regularization method promotes the exponential decay of the influence of past input samples on the current model output. This is done by penalizing the sensitivity of the NARX model simulated output with respect to the past inputs. This promotes the stability of the estimated models and improves the obtained model quality. The effectiveness of the approach is demonstrated through a simulation example, where a neural network NARX model is identified with this novel method. Moreover, it is shown that the proposed regularization approach improves the model accuracy in terms of simulation error performance compared to that of other regularization methods and model classes.
    A Novel Plug-and-Play Approach for Adversarially Robust Generalization. (arXiv:2208.09449v1 [cs.LG])
    In this work, we propose a robust framework that employs adversarially robust training to safeguard the machine learning models against perturbed testing data. We achieve this by incorporating the worst-case additive adversarial error within a fixed budget for each sample during model estimation. Our main focus is to provide a plug-and-play solution that can be incorporated in the existing machine learning algorithms with minimal changes. To that end, we derive the closed-form ready-to-use solution for several widely used loss functions with a variety of norm constraints on adversarial perturbation. Finally, we validate our approach by showing significant performance improvement on real-world datasets for supervised problems such as regression and classification, as well as for unsupervised problems such as matrix completion and learning graphical models, with very little computational overhead.
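    To make the closed-form idea concrete, here is a hedged sketch for one simple instance (linear regression with an L2-bounded additive input perturbation), where the worst-case squared error is $(|y - w^\top x| + \epsilon\|w\|_2)^2$; this is a standard result used for illustration, not necessarily the paper's exact derivation:

    ```python
    import numpy as np

    def robust_sq_loss(w, X, y, eps):
        """Worst-case squared error under per-sample additive input
        perturbations d with ||d||_2 <= eps:
        max_d (y - w.(x + d))^2 = (|y - w.x| + eps * ||w||)^2."""
        margin = np.abs(y - X @ w) + eps * np.linalg.norm(w)
        return np.mean(margin ** 2)

    def robust_grad_step(w, X, y, eps, lr=0.1):
        """One gradient step directly on the closed-form robust loss;
        no inner maximization loop is needed."""
        resid = y - X @ w
        margin = np.abs(resid) + eps * np.linalg.norm(w)
        grad = np.mean(2 * margin[:, None] *
                       (-np.sign(resid)[:, None] * X
                        + eps * w / np.linalg.norm(w)), axis=0)
        return w - lr * grad

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, -2.0, 0.5])
    w = rng.normal(size=3)
    for _ in range(200):
        w = robust_grad_step(w, X, y, eps=0.1)
    print(w, robust_sq_loss(w, X, y, eps=0.1))
    ```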
    Physics-informed neural networks for PDE-constrained optimization and control. (arXiv:2205.03377v2 [cs.LG] UPDATED)
    A fundamental problem in science and engineering is designing optimal control policies that steer a given system towards a desired outcome. This work proposes Control Physics-Informed Neural Networks (Control PINNs) that simultaneously solve for a given system state, and for the optimal control signal, in a one-stage framework that conforms to the underlying physical laws. Prior approaches use a two-stage framework that first models and then controls a system in sequential order. In contrast, a Control PINN incorporates the required optimality conditions in its architecture and in its loss function. The success of Control PINNs is demonstrated by solving the following open-loop optimal control problems: (i) an analytical problem, (ii) a one-dimensional heat equation, and (iii) a two-dimensional predator-prey problem.
    Deep Signature FBSDE Algorithm. (arXiv:2108.10504v2 [cs.LG] UPDATED)
    We propose a deep signature/log-signature FBSDE algorithm to solve forward-backward stochastic differential equations (FBSDEs) with state and path dependent features. By incorporating the deep signature/log-signature transformation into the recurrent neural network (RNN) model, our algorithm shortens the training time, improves the accuracy, and extends the time horizon compared to methods in the existing literature. Moreover, our algorithms can be applied to a wide range of applications such as state and path dependent option pricing involving high-frequency data, model ambiguity, and stochastic games, which are linked to parabolic partial differential equations (PDEs) and path-dependent PDEs (PPDEs). Lastly, we derive the convergence analysis of the deep signature/log-signature FBSDE algorithm.
    Echofilter: A Deep Learning Segmentation Model Improves the Automation, Standardization, and Timeliness for Post-Processing Echosounder Data in Tidal Energy Streams. (arXiv:2202.09648v2 [cs.LG] UPDATED)
    Understanding the abundance and distribution of fish in tidal energy streams is important to assess risks presented by introducing tidal energy devices to the habitat. However tidal current flows suitable for tidal energy are often highly turbulent, complicating the interpretation of echosounder data. The portion of the water column contaminated by returns from entrained air must be excluded from data used for biological analyses. Application of a single conventional algorithm to identify the depth-of-penetration of entrained air is insufficient for a boundary that is discontinuous, depth-dynamic, porous, and varies with tidal flow speed. Using a case study at a tidal energy demonstration site in the Bay of Fundy, we describe the development and application of a deep machine learning model with a U-Net based architecture. Our model, Echofilter, was highly responsive to the dynamic range of turbulence conditions and sensitive to the fine-scale nuances in the boundary position, producing an entrained-air boundary line with an average error of 0.33m on mobile downfacing and 0.5-1.0m on stationary upfacing data, less than half that of existing algorithmic solutions. The model's overall annotations had a high level of agreement with the human segmentation, with an intersection-over-union score of 99% for mobile downfacing recordings and 92-95% for stationary upfacing recordings. This resulted in a 50% reduction in the time required for manual edits when compared to the time required to manually edit the line placement produced by the currently available algorithms. Because of the improved initial automated placement, the implementation of the models permits an increase in the standardization and repeatability of line placement.
    Communication Size Reduction of Federated Learning based on Neural ODE Model. (arXiv:2208.09478v1 [cs.LG])
    Federated learning is a machine learning method in which data is not aggregated on a server but is instead kept distributed at the edges, in consideration of security and privacy. ResNet is a classic but representative neural network that succeeds in deepening the network by learning a residual function that adds the inputs and outputs together. In federated learning, communication is performed between the server and edge devices to exchange weight parameters, but ResNet has deep layers and a large number of parameters, so the communication size becomes large. In this paper, we use Neural ODE as a lightweight alternative to ResNet in order to reduce communication size in federated learning. In addition, we introduce a new flexible federated learning scheme using Neural ODE models with different numbers of iterations, which correspond to ResNets of different depths. The CIFAR-10 dataset is used in the evaluation, and the use of Neural ODE reduces communication size by approximately 90% compared to ResNet. We also show that the proposed flexible federated learning can merge models with different iteration counts.
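    A hedged PyTorch sketch of why this saves communication: a fixed-step ODE block reuses one parameter set at every integration step, so clients can unroll the same weights for different numbers of iterations (i.e., different effective depths). The block below is our illustration, not the paper's architecture:

    ```python
    import torch
    import torch.nn as nn

    class ODEBlock(nn.Module):
        """Fixed-step Euler unrolling of dh/dt = f(h). One parameter set
        serves every step, so depth (n_steps) is decoupled from the
        number of weights that must be communicated."""
        def __init__(self, dim):
            super().__init__()
            self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                   nn.Linear(dim, dim))

        def forward(self, h, n_steps):
            dt = 1.0 / n_steps
            for _ in range(n_steps):     # ResNet-like residual updates
                h = h + dt * self.f(h)
            return h

    block = ODEBlock(32)
    x = torch.randn(8, 32)
    shallow = block(x, n_steps=2)    # client acting as a shallow ResNet
    deep = block(x, n_steps=8)       # another client, same weights, deeper
    print(shallow.shape, deep.shape,
          sum(p.numel() for p in block.parameters()))  # one small payload
    ```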
    Kernel PCA with the Nystr\"om method. (arXiv:2109.05578v3 [stat.ML] UPDATED)
    The Nystr\"om method is one of the most popular techniques for improving the scalability of kernel methods. However, it has not yet been derived for kernel PCA in line with classical PCA. In this paper we derive kernel PCA with the Nystr\"om method, thereby providing one of the few available options to make kernel PCA scalable. We further study its statistical accuracy through a finite-sample confidence bound on the empirical reconstruction error compared to the full method. The behaviours of the method and bound are illustrated through computer experiments on multiple real-world datasets. As an application of the method we present kernel principal component regression with the Nystr\"om method, as an alternative to Nystr\"om kernel ridge regression for efficient regularized regression with kernels.
    CECILIA: Comprehensive Secure Machine Learning Framework. (arXiv:2202.03023v2 [cs.LG] UPDATED)
    Since ML algorithms have proven their success in many different applications, there is also great interest in privacy-preserving (PP) ML methods for building models on sensitive data. Moreover, the increase in the number of data sources and the high computational power required by those algorithms force individuals to outsource the training and/or the inference of an ML model to the clouds providing such services. To address this, we propose a secure 3-party computation framework, CECILIA, offering PP building blocks to enable complex operations privately. In addition to the adapted and common operations like addition and multiplication, it offers multiplexer, most significant bit and modulus conversion. The first two are novel in terms of methodology and the last one is novel in terms of both functionality and methodology. CECILIA also has two complex novel methods, which are the exact exponential of a public base raised to the power of a secret value and the inverse square root of a secret Gram matrix. We use CECILIA to realize the private inference on pre-trained RKNs, which require more complex operations than most other DNNs, on the structural classification of proteins as the first study ever accomplishing the PP inference on RKNs. In addition to the successful private computation of basic building blocks, the results demonstrate that we perform the exact and fully private exponential computation, which has so far been done by approximation in the literature. Moreover, they also show that we compute the exact inverse square root of a secret Gram matrix up to a certain privacy level, which has not been addressed in the literature at all. We also analyze the scalability of CECILIA to various settings on a synthetic dataset. The framework shows great promise to make other ML algorithms, as well as further computations, privately computable by the building blocks of the framework.
    Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning. (arXiv:2109.11978v3 [cs.RO] UPDATED)
    In this work, we present and study a training set-up that achieves fast policy generation for real-world robotic tasks by using massive parallelism on a single workstation GPU. We analyze and discuss the impact of different training algorithm components in the massively parallel regime on the final policy performance and training times. In addition, we present a novel game-inspired curriculum that is well suited for training with thousands of simulated robots in parallel. We evaluate the approach by training the quadrupedal robot ANYmal to walk on challenging terrain. The parallel approach allows training policies for flat terrain in under four minutes, and in twenty minutes for uneven terrain. This represents a speedup of multiple orders of magnitude compared to previous work. Finally, we transfer the policies to the real robot to validate the approach. We open-source our training code to help accelerate further research in the field of learned legged locomotion.
    Local Calibration: Metrics and Recalibration. (arXiv:2102.10809v3 [cs.LG] UPDATED)
    Probabilistic classifiers output confidence scores along with their predictions, and these confidence scores should be calibrated, i.e., they should reflect the reliability of the prediction. Confidence scores that minimize standard metrics such as the expected calibration error (ECE) accurately measure the reliability on average across the entire population. However, it is in general impossible to measure the reliability of an individual prediction. In this work, we propose the local calibration error (LCE) to span the gap between average and individual reliability. For each individual prediction, the LCE measures the average reliability of a set of similar predictions, where similarity is quantified by a kernel function on a pretrained feature space and by a binning scheme over predicted model confidences. We show theoretically that the LCE can be estimated sample-efficiently from data, and empirically find that it reveals miscalibration modes that are more fine-grained than the ECE can detect. Our key result is a novel local recalibration method LoRe, to improve confidence scores for individual predictions and decrease the LCE. Experimentally, we show that our recalibration method produces more accurate confidence scores, which improves downstream fairness and decision making on classification tasks with both image and tabular data.
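    A rough sketch of the quantity's flavor (our simplification: a Gaussian kernel on feature distances in place of the paper's kernel-plus-binning scheme): the gap between confidence and accuracy, averaged with weights concentrated on predictions similar to the query.

    ```python
    import numpy as np

    def local_calibration_error(feats, conf, correct, query, bandwidth=1.0):
        """Kernel-weighted |confidence - accuracy| around one prediction,
        with similarity measured in a (pretrained) feature space."""
        w = np.exp(-((feats - query) ** 2).sum(axis=1) / (2 * bandwidth ** 2))
        w /= w.sum()
        return abs(np.dot(w, conf) - np.dot(w, correct))

    rng = np.random.default_rng(0)
    feats = rng.normal(size=(1000, 8))     # stand-in feature embeddings
    conf = rng.uniform(0.5, 1.0, 1000)     # predicted confidences
    correct = (rng.uniform(size=1000) < 0.9 * conf).astype(float)  # overconfident
    print(local_calibration_error(feats, conf, correct, feats[0]))
    ```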
    OpenCoS: Contrastive Semi-supervised Learning for Handling Open-set Unlabeled Data. (arXiv:2107.08943v3 [cs.CV] UPDATED)
    Semi-supervised learning (SSL) has been a powerful strategy to incorporate few labels in learning better representations. In this paper, we focus on a practical scenario that one aims to apply SSL when unlabeled data may contain out-of-class samples - those that cannot have one-hot encoded labels from a closed-set of classes in label data, i.e., the unlabeled data is an open-set. Specifically, we introduce OpenCoS, a simple framework for handling this realistic semi-supervised learning scenario based upon a recent framework of self-supervised visual representation learning. We first observe that the out-of-class samples in the open-set unlabeled dataset can be identified effectively via self-supervised contrastive learning. Then, OpenCoS utilizes this information to overcome the failure modes in the existing state-of-the-art semi-supervised methods, by utilizing one-hot pseudo-labels and soft-labels for the identified in- and out-of-class unlabeled data, respectively. Our extensive experimental results show the effectiveness of OpenCoS under the presence of out-of-class samples, fixing up the state-of-the-art semi-supervised methods to be suitable for diverse scenarios involving open-set unlabeled data.
    Empirical or Invariant Risk Minimization? A Sample Complexity Perspective. (arXiv:2010.16412v2 [cs.LG] UPDATED)
    Recently, invariant risk minimization (IRM) was proposed as a promising solution to address out-of-distribution (OOD) generalization. However, it is unclear when IRM should be preferred over the widely-employed empirical risk minimization (ERM) framework. In this work, we analyze both these frameworks from the perspective of sample complexity, thus taking a firm step towards answering this important question. We find that depending on the type of data generation mechanism, the two approaches might have very different finite sample and asymptotic behavior. For example, in the covariate shift setting we see that the two approaches not only arrive at the same asymptotic solution, but also have similar finite sample behavior with no clear winner. For other distribution shifts such as those involving confounders or anti-causal variables, however, the two approaches arrive at different asymptotic solutions where IRM is guaranteed to be close to the desired OOD solutions in the finite sample regime, while ERM is biased even asymptotically. We further investigate how different factors -- the number of environments, complexity of the model, and IRM penalty weight -- impact the sample complexity of IRM in relation to its distance from the OOD solutions.
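    For readers unfamiliar with the IRM objective under analysis, here is a hedged PyTorch sketch of the widely used IRMv1 practical form (per-environment risk plus a penalty on the gradient with respect to a fixed dummy classifier, following Arjovsky et al.); it illustrates the framework being compared, not this paper's sample-complexity analysis:

    ```python
    import torch
    import torch.nn.functional as F

    def irm_penalty(logits, y):
        """IRMv1 penalty: squared gradient of the environment risk with
        respect to a fixed scalar 'dummy' classifier w = 1."""
        w = torch.ones(1, requires_grad=True)
        loss = F.binary_cross_entropy_with_logits(logits * w, y)
        grad = torch.autograd.grad(loss, [w], create_graph=True)[0]
        return (grad ** 2).sum()

    def irm_objective(model, envs, lam=10.0):
        """Average ERM risk plus lambda-weighted invariance penalty."""
        risk = penalty = 0.0
        for x, y in envs:
            logits = model(x).squeeze(-1)
            risk = risk + F.binary_cross_entropy_with_logits(logits, y)
            penalty = penalty + irm_penalty(logits, y)
        return (risk + lam * penalty) / len(envs)

    model = torch.nn.Linear(5, 1)
    envs = [(torch.randn(64, 5), torch.randint(0, 2, (64,)).float())
            for _ in range(3)]
    loss = irm_objective(model, envs)
    loss.backward()                  # gradients flow to the model parameters
    print(float(loss))
    ```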
    A scalable and fast artificial neural network syndrome decoder for surface codes. (arXiv:2110.05854v3 [quant-ph] UPDATED)
    Surface code error correction offers a highly promising pathway to achieve scalable fault-tolerant quantum computing. When operated as stabilizer codes, surface code computations consist of a syndrome decoding step where measured stabilizer operators are used to determine appropriate corrections for errors in physical qubits. Decoding algorithms have undergone substantial development, with recent work incorporating machine learning (ML) techniques. Despite promising initial results, the ML-based syndrome decoders are still limited to small scale demonstrations with low latency and are incapable of handling surface codes with boundary conditions and various shapes needed for lattice surgery and braiding. Here, we report the development of an artificial neural network (ANN) based scalable and fast syndrome decoder capable of decoding surface codes of arbitrary shape and size with data qubits suffering from the depolarizing error model. Based on rigorous training over 50 million random quantum error instances, our ANN decoder is shown to work with code distances exceeding 1000 (more than 4 million physical qubits), which is the largest ML-based decoder demonstration to date. The established ANN decoder demonstrates an execution time in principle independent of code distance, implying that its implementation on dedicated hardware could potentially offer surface code decoding times of O($\mu$sec), commensurate with the experimentally realisable qubit coherence times. With the anticipated scale-up of quantum processors within the next decade, their augmentation with a fast and scalable syndrome decoder such as developed in our work is expected to play a decisive role towards experimental implementation of fault-tolerant quantum information processing.
    A Knowledge Graph-Enhanced Tensor Factorisation Model for Discovering Drug Targets. (arXiv:2105.10578v3 [q-bio.QM] UPDATED)
    The drug discovery and development process is a long and expensive one, costing over 1 billion USD on average per drug and taking 10-15 years. To reduce the high levels of attrition throughout the process, there has been a growing interest in applying machine learning methodologies to various stages of drug discovery and development in the recent decade, especially at the earliest stage: the identification of druggable disease genes. In this paper, we have developed a new tensor factorisation model to predict potential drug targets (genes or proteins) for treating diseases. We created a three-dimensional data tensor consisting of 1,048 gene targets, 860 diseases and 230,011 evidence attributes and clinical outcomes connecting them, using data extracted from the Open Targets and PharmaProjects databases. We enriched the data with gene target representations learned from a drug discovery oriented knowledge graph and applied our proposed method to predict the clinical outcomes for unseen gene target and disease pairs. We designed three evaluation strategies to measure the prediction performance and benchmarked several commonly used machine learning classifiers together with Bayesian matrix and tensor factorisation methods. The results show that incorporating knowledge graph embeddings significantly improves the prediction accuracy and that training tensor factorisation alongside a dense neural network outperforms all other baselines. In summary, our framework combines two actively studied machine learning approaches to disease target identification, namely tensor factorisation and knowledge graph representation learning, which could be a promising avenue for further exploration in data-driven drug discovery.
    Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction. (arXiv:2106.15013v3 [cs.LG] UPDATED)
    Recently there has been significant theoretical progress on understanding the convergence and generalization of gradient-based methods on nonconvex losses with overparameterized models. Nevertheless, many aspects of optimization and generalization and in particular the critical role of small random initialization are not fully understood. In this paper, we take a step towards demystifying this role by proving that small random initialization followed by a few iterations of gradient descent behaves akin to popular spectral methods. We also show that this implicit spectral bias from small random initialization, which is provably more prominent for overparameterized models, also puts the gradient descent iterations on a particular trajectory towards solutions that are not only globally optimal but also generalize well. Concretely, we focus on the problem of reconstructing a low-rank matrix from a few measurements via a natural nonconvex formulation. In this setting, we show that the trajectory of the gradient descent iterations from small random initialization can be approximately decomposed into three phases: (I) a spectral or alignment phase where we show that the iterates have an implicit spectral bias akin to spectral initialization, allowing us to show that at the end of this phase the column space of the iterates and the underlying low-rank matrix are sufficiently aligned, (II) a saddle avoidance/refinement phase where we show that the trajectory of the gradient iterates moves away from certain degenerate saddle points, and (III) a local refinement phase where we show that after avoiding the saddles the iterates converge quickly to the underlying low-rank matrix. Underlying our analysis are insights for the analysis of overparameterized nonconvex optimization schemes that may have implications for computational problems beyond low-rank reconstruction.
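    A minimal numpy sketch of the setting (illustrative problem sizes and step size, not the paper's analysis): gradient descent on an overparameterized factorization $UV^\top$, started from a small random initialization, to recover a low-rank matrix from a few random linear measurements.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n, r, m = 20, 2, 300
    M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))  # low-rank target
    M /= np.linalg.norm(M, 2)                              # unit spectral norm
    A = rng.normal(size=(m, n, n)) / np.sqrt(m)            # random measurements
    b = np.einsum('mij,ij->m', A, M)

    alpha = 1e-3                        # small random initialization scale
    U = alpha * rng.normal(size=(n, n))
    V = alpha * rng.normal(size=(n, n))
    lr = 0.2
    for _ in range(1000):
        resid = np.einsum('mij,ij->m', A, U @ V.T) - b
        G = np.einsum('m,mij->ij', resid, A)  # grad of 0.5*||A(UV^T)-b||^2
        U, V = U - lr * G @ V, V - lr * G.T @ U

    # After an alignment phase and a refinement phase, the relative error
    # should drop far below 1.
    print(np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))
    ```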
    Finding groups of cross-correlated features in bi-view data. (arXiv:2009.05079v3 [stat.ME] UPDATED)
    Data sets in which measurements of two (or more) types are obtained from a common set of samples arise in many scientific applications. A common problem in the exploratory analysis of such data is to identify groups of features of different data types that are strongly associated. A bimodule is a pair (A, B) of feature sets from two data types such that the aggregate cross-correlation between the features in A and those in B is large. A bimodule (A, B) is stable if A coincides with the set of features that have significant aggregate correlation with the features in B, and vice-versa. In this paper we propose and investigate an iterative testing-based procedure (BSP) to identify stable bimodules in bi-view data. We carry out a thorough simulation study to assess the performance of BSP, and present an extended application to the problem of expression quantitative trait loci (eQTL) analysis using recent data from the GTEx project. In addition, we apply BSP to climatology data to identify regions in North America where annual temperature variation affects precipitation.
    Entropy Augmented Reinforcement Learning. (arXiv:2208.09322v1 [cs.LG])
    Deep reinforcement learning has achieved considerable success with trust region policy optimization (TRPO) and proximal policy optimization (PPO), owing to their scalability and efficiency. However, the pessimism of both algorithms, each of which either constrains updates to a trust region or strictly excludes all suspicious gradients, has been shown to suppress exploration and harm the performance of the agent. To address those issues, we propose a shifted Markov decision process (MDP), that is, one augmented with an entropy bonus, to encourage exploration and reinforce the ability to escape suboptima. Our method is extensible and adapts to either reward shaping or bootstrapping. We provide a convergence analysis and find that controlling the temperature coefficient is crucial; when it is tuned appropriately, the method achieves remarkable performance, even on top of other algorithms, since it is simple yet effective. Our experiments test augmented TRPO and PPO on MuJoCo benchmark tasks, indicating that the agent is steered towards higher-reward regions and enjoys a balance between exploration and exploitation. We verify the exploration bonus of our method on two grid-world environments.
    Non-Stationary Dynamic Pricing Via Actor-Critic Information-Directed Pricing. (arXiv:2208.09372v1 [stat.ML])
    This paper presents a novel non-stationary dynamic pricing algorithm design, where pricing agents face incomplete demand information and market environment shifts. The agents run price experiments to learn about each product's demand curve and the profit-maximizing price, while being aware of market environment shifts to avoid high opportunity costs from offering sub-optimal prices. The proposed ACIDP extends information-directed sampling (IDS) algorithms from statistical machine learning to include microeconomic choice theory, with a novel pricing strategy auditing procedure to escape sub-optimal pricing after market environment shifts. The proposed ACIDP outperforms competing bandit algorithms including Upper Confidence Bound (UCB) and Thompson sampling (TS) in a series of market environment shifts.
    Feature Selection for Fault Detection and Prediction based on Event Log Analysis. (arXiv:2208.09440v1 [cs.LG])
    Event logs are widely used for anomaly detection and prediction in complex systems. Existing log-based anomaly detection methods usually consist of four main steps: log collection, log parsing, feature extraction, and anomaly detection, wherein the feature extraction step extracts useful features for anomaly detection by counting log events. For a complex system, such as a lithography machine consisting of a large number of subsystems, its log may contain thousands of different events, resulting in abounding extracted features. However, when anomaly detection is performed at the subsystem level, analyzing all features becomes expensive and unnecessary. To mitigate this problem, we develop a feature selection method for log-based anomaly detection and prediction, largely improving the effectiveness and efficiency.
    Learning in Stackelberg Games with Non-myopic Agents. (arXiv:2208.09407v1 [cs.GT])
    We study Stackelberg games where a principal repeatedly interacts with a long-lived, non-myopic agent, without knowing the agent's payoff function. Although learning in Stackelberg games is well-understood when the agent is myopic, non-myopic agents pose additional complications. In particular, non-myopic agents may strategically select actions that are inferior in the present to mislead the principal's learning algorithm and obtain better outcomes in the future. We provide a general framework that reduces learning in presence of non-myopic agents to robust bandit optimization in the presence of myopic agents. Through the design and analysis of minimally reactive bandit algorithms, our reduction trades off the statistical efficiency of the principal's learning algorithm against its effectiveness in inducing near-best-responses. We apply this framework to Stackelberg security games (SSGs), pricing with unknown demand curve, strategic classification, and general finite Stackelberg games. In each setting, we characterize the type and impact of misspecifications present in near-best-responses and develop a learning algorithm robust to such misspecifications. Along the way, we improve the query complexity of learning in SSGs with $n$ targets from the state-of-the-art $O(n^3)$ to a near-optimal $\widetilde{O}(n)$ by uncovering a fundamental structural property of such games. This result is of independent interest beyond learning with non-myopic agents.
    Expressing Multivariate Time Series as Graphs with Time Series Attention Transformer. (arXiv:2208.09300v1 [cs.LG])
    A reliable and efficient representation of multivariate time series is crucial in various downstream machine learning tasks. In multivariate time series forecasting, each variable depends on its historical values and there are inter-dependencies among variables as well. Models have to be designed to capture both intra- and inter-relationships among the time series. To move towards this goal, we propose the Time Series Attention Transformer (TSAT) for multivariate time series representation learning. Using TSAT, we represent both temporal information and inter-dependencies of multivariate time series in terms of edge-enhanced dynamic graphs. The intra-series correlations are represented by nodes in a dynamic graph; a self-attention mechanism is modified to capture the inter-series correlations by using the super-empirical mode decomposition (SMD) module. We applied the embedded dynamic graphs to time series forecasting problems, including two real-world datasets and two benchmark datasets. Extensive experiments show that TSAT clearly outperforms six state-of-the-art baseline methods in various forecasting horizons. We further visualize the embedded dynamic graphs to illustrate the graph representation power of TSAT. We share our code at https://github.com/RadiantResearch/TSAT.  ( 2 min )
    Semi-analytic PINN methods for singularly perturbed boundary value problems. (arXiv:2208.09145v1 [math.NA])
    We propose a new semi-analytic physics informed neural network (PINN) to solve singularly perturbed boundary value problems. The PINN is a scientific machine learning framework that offers a promising perspective for finding numerical solutions to partial differential equations. The PINNs have shown impressive performance in solving various differential equations including time-dependent and multi-dimensional equations involved in a complex geometry of the domain. However, when considering stiff differential equations, neural networks in general fail to capture the sharp transition of solutions, due to the spectral bias. To resolve this issue, here we develop the semi-analytic PINN methods, enriched by using the so-called corrector functions obtained from the boundary layer analysis. Our new enriched PINNs accurately predict numerical solutions to the singular perturbation problems. Numerical experiments include various types of singularly perturbed linear and nonlinear differential equations.  ( 2 min )
    Almost Cost-Free Communication in Federated Best Arm Identification. (arXiv:2208.09215v1 [cs.LG])
    We study the problem of best arm identification in a federated learning multi-armed bandit setup with a central server and multiple clients. Each client is associated with a multi-armed bandit in which each arm yields {\em i.i.d.}\ rewards following a Gaussian distribution with an unknown mean and known variance. The set of arms is assumed to be the same at all the clients. We define two notions of best arm -- local and global. The local best arm at a client is the arm with the largest mean among the arms local to the client, whereas the global best arm is the arm with the largest average mean across all the clients. We assume that each client can only observe the rewards from its local arms and thereby estimate its local best arm. The clients communicate with a central server on uplinks that entail a cost of $C\ge0$ units per usage per uplink. The global best arm is estimated at the server. The goal is to identify the local best arms and the global best arm with minimal total cost, defined as the sum of the total number of arm selections at all the clients and the total communication cost, subject to an upper bound on the error probability. We propose a novel algorithm {\sc FedElim} that is based on successive elimination and communicates only in exponential time steps and obtain a high probability instance-dependent upper bound on its total cost. The key takeaway from our paper is that for any $C\geq 0$ and error probabilities sufficiently small, the total number of arm selections (resp.\ the total cost) under {\sc FedElim} is at most~$2$ (resp.~$3$) times the maximum total number of arm selections under its variant that communicates in every time step. Additionally, we show that the latter is optimal in expectation up to a constant factor, thereby demonstrating that communication is almost cost-free in {\sc FedElim}. We numerically validate the efficacy of {\sc FedElim}.  ( 3 min )
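    A heavily simplified sketch of the mechanics for a single client (our toy confidence radii and communication rule, not the paper's FedElim with its guarantees): successive elimination keeps pulling every surviving arm, drops arms whose upper confidence bound falls below the best lower bound, and would use the uplink only at exponentially spaced time steps.

    ```python
    import numpy as np

    def successive_elimination(means, sigma, delta, seed=0):
        """Toy successive elimination with exponentially spaced
        communication instants (marked where `next_comm` doubles)."""
        rng = np.random.default_rng(seed)
        K = len(means)
        counts, sums = np.zeros(K), np.zeros(K)
        alive, next_comm, t = list(range(K)), 1, 0
        while len(alive) > 1:
            t += 1
            for a in alive:                       # pull every surviving arm
                sums[a] += rng.normal(means[a], sigma)
                counts[a] += 1
            mu = sums[alive] / counts[alive]
            rad = sigma * np.sqrt(2 * np.log(2 * K * t * t / delta)
                                  / counts[alive])
            keep = mu + rad >= (mu - rad).max()   # UCB vs best LCB
            alive = [a for a, k in zip(alive, keep) if k]
            if t == next_comm:
                next_comm *= 2                    # uplink would happen here
        return alive[0], t

    print(successive_elimination([0.5, 0.4, 0.3], sigma=0.5, delta=0.05))
    ```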
    Evaluating Explainability for Graph Neural Networks. (arXiv:2208.09339v1 [cs.LG])
    As post hoc explanations are increasingly used to understand the behavior of graph neural networks (GNNs), it becomes crucial to evaluate the quality and reliability of GNN explanations. However, assessing the quality of GNN explanations is challenging as existing graph datasets have no or unreliable ground-truth explanations for a given task. Here, we introduce a synthetic graph data generator, ShapeGGen, which can generate a variety of benchmark datasets (e.g., varying graph sizes, degree distributions, homophilic vs. heterophilic graphs) accompanied by ground-truth explanations. Further, the flexibility to generate diverse synthetic datasets and corresponding ground-truth explanations allows us to mimic the data generated by various real-world applications. We incorporate ShapeGGen and several real-world graph datasets into an open-source graph explainability library, GraphXAI. In addition to synthetic and real-world graph datasets with ground-truth explanations, GraphXAI provides data loaders, data processing functions, visualizers, GNN model implementations, and evaluation metrics to benchmark the performance of GNN explainability methods.
    Carefully choose the baseline: Lessons learned from applying XAI attribution methods for regression tasks in geoscience. (arXiv:2208.09473v1 [physics.geo-ph])
    Methods of eXplainable Artificial Intelligence (XAI) are used in geoscientific applications to gain insights into the decision-making strategy of Neural Networks (NNs), highlighting which features in the input contribute the most to a NN prediction. Here, we discuss the lesson we learned: the task of attributing a prediction to the input does not have a single solution. Instead, the attribution results and their interpretation depend greatly on the considered baseline (sometimes referred to as a reference point) that the XAI method utilizes; a fact that has been overlooked in the literature so far. This baseline can be chosen by the user, or it is set by construction in the method's algorithm, often without the user being aware of that choice. We highlight that different baselines can lead to different insights for different science questions and, thus, should be chosen accordingly. To illustrate the impact of the baseline, we use a large ensemble of historical and future climate simulations forced with the SSP3-7.0 scenario and train a fully connected NN to predict the ensemble- and global-mean temperature (i.e., the forced global warming signal) given an annual temperature map from an individual ensemble member. We then use various XAI methods and different baselines to attribute the network predictions to the input. We show that attributions differ substantially when considering different baselines, as they correspond to answering different science questions. We conclude by discussing some important implications and considerations about the use of baselines in XAI research.
    SimLDA: A tool for topic model evaluation. (arXiv:2208.09299v1 [cs.LG])
    Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) has become the most popular algorithm for aspect modeling. While sufficiently successful in text topic extraction from large corpora, VB is less successful in identifying aspects in the presence of limited data. We present a novel variational message passing algorithm as applied to LDA and compare it with the gold-standard VB and collapsed Gibbs sampling. In situations where marginalisation leads to non-conjugate messages, we use ideas from sampling to derive approximate update equations. In cases where conjugacy holds, Loopy Belief Update (LBU) (also known as Lauritzen-Spiegelhalter) is used. Our algorithm, ALBU (approximate LBU), has strong similarities with Variational Message Passing (VMP) (which is the message passing variant of VB). To compare the performance of the algorithms in the presence of limited data, we use data sets consisting of tweets and news groups. Using coherence measures, we show that ALBU learns latent distributions more accurately than does VB, especially for smaller data sets.
    Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise. (arXiv:2208.09392v1 [cs.CV])
    Standard diffusion models involve an image transform -- adding Gaussian noise -- and an image restoration operator that inverts this degradation. We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice. Even when using completely deterministic degradations (e.g., blur, masking, and more), the training and test-time update rules that underlie diffusion models can be easily generalized to create generative models. The success of these fully deterministic models calls into question the community's understanding of diffusion models, which relies on noise in either gradient Langevin dynamics or variational inference, and paves the way for generalized diffusion models that invert arbitrary processes. Our code is available at https://github.com/arpitbansal297/Cold-Diffusion-Models
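    A minimal sketch of the generalized sampling loop with a deterministic blur degradation is shown below; the restoration network is replaced by an untrained placeholder, so this illustrates only the update rule (the paper's improved sampling), not a working generative model.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        T = 50  # number of degradation levels

        def degrade(x0, t):
            # deterministic degradation: progressively stronger Gaussian blur
            return gaussian_filter(x0, sigma=0.1 * t) if t > 0 else x0

        def restore(xt, t):
            # stand-in for the learned restoration network R(x_t, t) -> x0_hat,
            # which would be trained to invert `degrade`
            return xt  # placeholder

        x = degrade(np.random.rand(32, 32), T)  # start from a fully degraded state
        for t in range(T, 0, -1):
            x0_hat = restore(x, t)
            # update rule that remains stable even for deterministic degradations
            x = x - degrade(x0_hat, t) + degrade(x0_hat, t - 1)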
    Reproducibility Report: Contrastive Learning of Socially-aware Motion Representations. (arXiv:2208.09284v1 [cs.CV])
    The following paper is a reproducibility report for "Social NCE: Contrastive Learning of Socially-aware Motion Representations" {\cite{liu2020snce}} published in ICCV 2021 as part of the ML Reproducibility Challenge 2021. The original code was made available by the author \footnote{\href{https://github.com/vita-epfl/social-nce}{https://github.com/vita-epfl/social-nce}}. We attempted to verify the results claimed by the authors and reimplemented their code in PyTorch Lightning.  ( 2 min )
    Graph Convolutional Networks from the Perspective of Sheaves and the Neural Tangent Kernel. (arXiv:2208.09309v1 [cs.LG])
    Graph convolutional networks are a popular class of deep neural network algorithms which have shown success in a number of relational learning tasks. Despite their success, graph convolutional networks exhibit a number of peculiar features, including a bias towards learning oversmoothed and homophilic functions, which are not easily diagnosed due to the complex nature of these algorithms. We propose to bridge this gap in understanding by studying the neural tangent kernel of sheaf convolutional networks--a topological generalization of graph convolutional networks. To this end, we derive a parameterization of the neural tangent kernel for sheaf convolutional networks which separates the function into two parts: one driven by a forward diffusion process determined by the graph, and the other determined by the composite effect of nodes' activations on the output layer. This geometrically-focused derivation produces a number of immediate insights which we discuss in detail.
    Disentangled Representation with Causal Constraints for Counterfactual Fairness. (arXiv:2208.09147v1 [cs.LG])
    Much research has been devoted to the problem of learning fair representations; however, existing methods do not explicitly model the relationships between latent representations. In many real-world applications, there may be causal relationships between latent representations. Furthermore, most fair representation learning methods focus on group-level fairness and are based on correlations, ignoring the causal relationships underlying the data. In this work, we theoretically demonstrate that using structured representations enables downstream predictive models to achieve counterfactual fairness, and then we propose the Counterfactual Fairness Variational AutoEncoder (CF-VAE) to obtain structured representations with respect to domain knowledge. The experimental results show that the proposed method achieves better fairness and accuracy performance than the benchmark fairness methods.  ( 2 min )
    Discovering Faint and High Apparent Motion Rate Near-Earth Asteroids Using A Deep Learning Program. (arXiv:2208.09098v1 [astro-ph.IM])
    Although many near-Earth objects have been found by ground-based telescopes, some fast-moving ones, especially those near detection limits, have been missed by observatories. We developed a convolutional neural network for detecting faint fast-moving near-Earth objects. It was trained with artificial streaks generated from simulations and was able to find these asteroid streaks with an accuracy of 98.7% and a false positive rate of 0.02% on simulated data. This program was used to search image data from the Zwicky Transient Facility (ZTF) over four nights in 2019, and it identified six previously undiscovered asteroids. The visual magnitudes of our detections range from ~19.0 to 20.3 and their motion rates from ~6.8 to 24 deg/day, which is very faint compared to other ZTF detections moving at similar rates. Our asteroids are also ~1 - 51 m in diameter and ~5 - 60 lunar distances away at close approach, assuming their albedo values follow the albedo distribution function of known asteroids. The use of a purely simulated dataset to train our model enables the program to gain sensitivity in detecting faint and fast-moving objects while still recovering nearly all discoveries made by previously designed neural networks that were trained on real detections. Our approach can be adopted by any observatory for detecting fast-moving asteroid streaks.  ( 3 min )
    Unified Policy Optimization for Continuous-action Reinforcement Learning in Non-stationary Tasks and Games. (arXiv:2208.09452v1 [cs.LG])
    This paper addresses policy learning in non-stationary environments and games with continuous actions. Rather than the classical reward maximization mechanism, inspired by the ideas of follow-the-regularized-leader (FTRL) and mirror descent (MD) updates, we propose a no-regret style reinforcement learning algorithm, PORL, for continuous action tasks. We prove that PORL has a last-iterate convergence guarantee, which is important for adversarial and cooperative games. Empirical studies show that, in stationary environments such as MuJoCo locomotion control tasks, PORL performs equally well as, if not better than, the soft actor-critic (SAC) algorithm; in non-stationary environments including dynamical environments, adversarial training, and competitive games, PORL is superior to SAC in terms of both final policy performance and training stability.
    Demystifying Randomly Initialized Networks for Evaluating Generative Models. (arXiv:2208.09218v1 [cs.LG])
    Evaluation of generative models is mostly based on the comparison between the estimated distribution and the ground truth distribution in a certain feature space. To embed samples into informative features, previous works often use convolutional neural networks optimized for classification, which is criticized by recent studies. Therefore, various feature spaces have been explored to discover alternatives. Among them, a surprising approach is to use a randomly initialized neural network for feature embedding. However, the fundamental basis to employ the random features has not been sufficiently justified. In this paper, we rigorously investigate the feature space of models with random weights in comparison to that of trained models. Furthermore, we provide empirical evidence for choosing networks for random features to obtain consistent and reliable results. Our results indicate that features from random networks can evaluate generative models comparably to those from trained networks, and furthermore, the two types of features can be used together in a complementary way.  ( 2 min )
    Classification Performance Metric Elicitation and its Applications. (arXiv:2208.09142v1 [stat.ML])
    Given a learning problem with real-world tradeoffs, which cost function should the model be trained to optimize? This is the metric selection problem in machine learning. Despite its practical interest, there is limited formal guidance on how to select metrics for machine learning applications. This thesis outlines metric elicitation as a principled framework for selecting the performance metric that best reflects implicit user preferences. Once specified, the evaluation metric can be used to compare and train models. In this manuscript, we formalize the problem of Metric Elicitation and devise novel strategies for eliciting classification performance metrics using pairwise preference feedback over classifiers. Specifically, we provide novel strategies for eliciting linear and linear-fractional metrics for binary and multiclass classification problems, which are then extended to a framework that elicits group-fair performance metrics in the presence of multiple sensitive groups. All the elicitation strategies that we discuss are robust to both finite sample and feedback noise, and thus are useful in practice for real-world applications. Using the tools and the geometric characterizations of the feasible confusion statistics sets from the binary, multiclass, and multiclass-multigroup classification setups, we further provide strategies to elicit from a wider range of complex, modern multiclass metrics defined by quadratic functions of confusion statistics by exploiting their local linear structure. From an application perspective, we also propose using the metric elicitation framework to optimize complex black-box metrics in a manner amenable to deep network training. Lastly, to bring theory closer to practice, we conduct a preliminary real-user study that shows the efficacy of the metric elicitation framework in recovering the users' preferred performance metric in a binary classification setup.  ( 3 min )
    SAFARI: Versatile and Efficient Evaluations for Robustness of Interpretability. (arXiv:2208.09418v1 [cs.LG])
    Interpretability of Deep Learning (DL) models is arguably a key barrier to trustworthy AI. Despite great efforts made by the Explainable AI (XAI) community, explanations lack robustness--indistinguishable input perturbations may lead to different XAI results. Thus, it is vital to assess how robust DL interpretability is, given an XAI technique. To this end, we identify the following challenges that the state of the art is unable to cope with collectively: i) XAI techniques are highly heterogeneous; ii) misinterpretations are normally rare events; iii) both worst-case and overall robustness are of practical interest. In this paper, we propose two evaluation methods to tackle them--i) they are of black-box nature, based on Genetic Algorithm (GA) and Subset Simulation (SS); ii) bespoke fitness functions are used by GA to solve a constrained optimisation efficiently, while SS is dedicated to estimating rare event probabilities; iii) two diverse metrics are introduced, concerning the worst-case interpretation discrepancy and a probabilistic notion of \textit{how} robust the interpretability is in general, respectively. We conduct experiments to study the accuracy, sensitivity and efficiency of our methods, which outperform the state of the art. Finally, we show two applications of our methods for ranking robust XAI methods and selecting training schemes to improve both classification and interpretation robustness.
    Atomistic structure search using local surrogate model. (arXiv:2208.09273v1 [physics.chem-ph])
    We describe a local surrogate model for use in conjunction with global structure search methods. The model follows the Gaussian approximation potential (GAP) formalism and is based on the smooth overlap of atomic positions (SOAP) descriptor, with sparsification in terms of a reduced number of local environments using mini-batch $k$-means. The model is implemented in the Atomistic Global Optimization X framework and used as a partial replacement of the local relaxations in basin hopping structure search. The approach is shown to be robust for a wide range of atomistic systems, including molecules, nano-particles, surface supported clusters, and surface thin films. The benefits of a local surrogate model in a structure search context are demonstrated. These include the ability to transfer learning from smaller systems as well as the possibility to perform concurrent multi-stoichiometry searches.
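    The sparsification step can be sketched in a few lines; random vectors stand in for the actual SOAP descriptors, and the sizes are arbitrary.

        import numpy as np
        from sklearn.cluster import MiniBatchKMeans
        from sklearn.metrics import pairwise_distances_argmin_min

        rng = np.random.default_rng(0)
        descriptors = rng.standard_normal((5000, 64))  # stand-in for SOAP vectors

        # Sparsify: keep only m representative local environments for the model
        m = 200
        km = MiniBatchKMeans(n_clusters=m, batch_size=512, random_state=0)
        km.fit(descriptors)
        # use the real environments closest to each centroid as inducing points
        idx, _ = pairwise_distances_argmin_min(km.cluster_centers_, descriptors)
        inducing = descriptors[idx]
        print(inducing.shape)  # (200, 64)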
    Real-Time Robust Video Object Detection System Against Physical-World Adversarial Attacks. (arXiv:2208.09195v1 [cs.CV])
    DNN-based video object detection (VOD) powers the autonomous driving and video surveillance industries with rising importance and promising opportunities. However, adversarial patch attacks raise serious concerns in live vision tasks because of their practicality, feasibility, and powerful attack effectiveness. This work proposes Themis, a software/hardware system to defend against adversarial patches for real-time robust video object detection. We observe that adversarial patches exhibit extremely localized superficial feature importance in a small region with non-robust predictions, and thus propose an adversarial region detection algorithm for adversarial effect elimination. Themis also proposes a systematic design to efficiently support the algorithm by eliminating redundant computations and memory traffic. Experimental results show that the proposed methodology can effectively recover the system from the adversarial attack with negligible hardware overhead.
    Towards Daily High-resolution Inundation Observations using Deep Learning and EO. (arXiv:2208.09135v1 [physics.geo-ph])
    Satellite remote sensing presents a cost-effective solution for synoptic flood monitoring, and satellite-derived flood maps provide a computationally efficient alternative to the numerical flood inundation models traditionally used. While satellites do offer timely inundation information when they happen to cover an ongoing flood event, they are limited by their spatiotemporal resolution in terms of their ability to dynamically monitor flood evolution at various scales. Constantly improving access to new satellite data sources as well as big data processing capabilities has unlocked an unprecedented number of possibilities in terms of data-driven solutions to this problem. Specifically, the fusion of data from satellites such as the Copernicus Sentinels, which have high spatial and low temporal resolution, with data from the NASA SMAP and GPM missions, which have low spatial but high temporal resolution, could yield high-resolution flood inundation at a daily scale. Here, a convolutional neural network is trained for the first time using flood inundation maps derived from Sentinel-1 Synthetic Aperture Radar and various hydrological, topographical, and land-use based predictors to predict high-resolution probabilistic maps of flood inundation. The performance of the UNet and SegNet model architectures for this task is evaluated using flood masks derived from Sentinel-1 and Sentinel-2, separately, with 95 percent confidence intervals. The Area under the Curve (AUC) of the Precision Recall Curve (PR-AUC) is used as the main evaluation metric, due to the inherently imbalanced nature of classes in a binary flood mapping problem, with the best model delivering a PR-AUC of 0.85.  ( 3 min )
    A Review of Uncertainty for Deep Reinforcement Learning. (arXiv:2208.09052v1 [cs.LG])
    Uncertainty is ubiquitous in games, both in the agents playing games and often in the games themselves. Working with uncertainty is therefore an important component of successful deep reinforcement learning agents. While there has been substantial effort and progress in understanding and working with uncertainty for supervised learning, the body of literature for uncertainty-aware deep reinforcement learning is less developed. While many of the same problems regarding uncertainty in neural networks for supervised learning remain for reinforcement learning, there are additional sources of uncertainty due to the nature of an interactive environment. In this work, we provide an overview motivating and presenting existing techniques in uncertainty-aware deep reinforcement learning. These works show empirical benefits on a variety of reinforcement learning tasks. This work serves to centralize the disparate results and promote future research in this area.  ( 2 min )
    Mitigating Disparity while Maximizing Reward: Tight Anytime Guarantee for Improving Bandits. (arXiv:2208.09254v1 [cs.LG])
    We study the Improving Multi-Armed Bandit (IMAB) problem, where the reward obtained from an arm increases with the number of pulls it receives. This model provides an elegant abstraction for many real-world problems in domains such as education and employment, where decisions about the distribution of opportunities can affect the future capabilities of communities and the disparity between them. A decision-maker in such settings must consider the impact of her decisions on future rewards in addition to the standard objective of maximizing her cumulative reward at any time. In many of these applications, the time horizon is unknown to the decision-maker beforehand, which motivates the study of the IMAB problem in the technically more challenging horizon-unaware setting. We study the tension that arises between two seemingly conflicting objectives in the horizon-unaware setting: a) maximizing the cumulative reward at any time based on current rewards of the arms, and b) ensuring that arms with better long-term rewards get sufficient opportunities even if they initially have low rewards. We show that, surprisingly, the two objectives are aligned with each other in this setting. Our main contribution is an anytime algorithm for the IMAB problem that achieves the best possible cumulative reward while ensuring that the arms reach their true potential given sufficient time. Our algorithm mitigates the initial disparity due to lack of opportunity and continues pulling an arm till it stops improving. We prove the optimality of our algorithm by showing that a) any algorithm for the IMAB problem, no matter how utilitarian, must suffer $\Omega(T)$ policy regret and $\Omega(k)$ competitive ratio with respect to the optimal offline policy, and b) the competitive ratio of our algorithm is $O(k)$.
    A Physics-informed Deep Learning Approach for Minimum Effort Stochastic Control of Colloidal Self-Assembly. (arXiv:2208.09182v1 [math.OC])
    We propose formulating the finite-horizon stochastic optimal control problem for colloidal self-assembly in the space of probability density functions (PDFs) of the underlying state variables (namely, order parameters). The control objective is formulated in terms of steering the state PDFs from a prescribed initial probability measure towards a prescribed terminal probability measure with minimum control effort. For specificity, we use a univariate stochastic state model from the literature. Both the analysis and the computational steps for control synthesis developed in this paper generalize to multivariate stochastic state dynamics given by models that are nonlinear in the state and non-affine in the control. We derive the conditions of optimality for the associated optimal control problem. This derivation yields a system of three coupled partial differential equations together with the boundary conditions at the initial and terminal times. The resulting system is a generalized instance of the so-called Schr\"{o}dinger bridge problem. We then determine the optimal control policy by training a physics-informed deep neural network, where the "physics" are the derived conditions of optimality. The performance of the proposed solution is demonstrated via numerical simulations on a benchmark colloidal self-assembly problem.
    Learn to Detect and Detect to Learn: Structure Learning and Decision Feedback for MIMO-OFDM Receive Processing. (arXiv:2208.09287v1 [eess.SP])
    One of the major open challenges in MIMO-OFDM receive processing is how to efficiently and effectively utilize the extremely limited over-the-air pilot symbols to detect the transmitted data symbols. Recent advances have been devoted to investigating effective ways to utilize the limited pilots. However, we notice that besides exploiting the pilots, one can take advantage of the data symbols to improve the detection performance. Thus, this paper introduces an online subframe-based approach, namely RC-StructNet, that can efficiently learn from the precious pilot symbols and be dynamically updated with the detected payload data using the decision feedback (DF) approach. The network consists of a reservoir computing (RC) module in the time domain and a neural network StructNet in the frequency domain. The unique design of the network allows it to be dynamically updated with the changes of the channel by learning from the detected data symbols. Experiments demonstrate the effectiveness of RC-StructNet in detection under dynamic transmission modes and in reducing the training overhead requirement when taking the DF approach.
    An Unsupervised Short- and Long-Term Mask Representation for Multivariate Time Series Anomaly Detection. (arXiv:2208.09240v1 [cs.LG])
    Anomaly detection of multivariate time series is meaningful for system behavior monitoring. This paper proposes an anomaly detection method based on unsupervised Short- and Long-term Mask Representation learning (SLMR). The main idea is to extract the short-term local dependency patterns and long-term global trend patterns of the multivariate time series by using multi-scale residual dilated convolution and a Gated Recurrent Unit (GRU), respectively. Furthermore, our approach can comprehend temporal contexts and feature correlations by combining spatial-temporal masked self-supervised representation learning and sequence split. Since the importance of each feature differs, we introduce an attention mechanism to adjust the contribution of each feature. Finally, a forecasting-based model and a reconstruction-based model are integrated to focus on single-timestamp prediction and the latent representation of the time series. Experiments show that our method outperforms other state-of-the-art models on three real-world datasets. Further analysis shows that our method offers good interpretability.
    Cross-Domain Evaluation of a Deep Learning-Based Type Inference System. (arXiv:2208.09189v1 [cs.SE])
    Optional type annotations allow for enriching dynamic programming languages with static typing features like better Integrated Development Environment (IDE) support, more precise program analysis, and early detection and prevention of type-related runtime errors. Machine learning-based type inference promises interesting results for automating this task. However, the practical usage of such systems depends on their ability to generalize across different domains, as they are often applied outside their training domain. In this work, we investigate the generalization ability of Type4Py, as a representative of state-of-the-art deep learning-based type inference systems, by conducting extensive cross-domain experiments. Thereby, we address the following problems: dataset shifts, out-of-vocabulary words, unknown classes, and rare classes. To perform such experiments, we use the datasets ManyTypes4Py and CrossDomainTypes4Py; the latter we introduce in this paper. Our dataset has over 1,000,000 type annotations and enables cross-domain evaluation of type inference systems in different domains of software projects, using data from the two domains of web development and scientific calculation. Through our experiments, we detect dataset shifts and find a long-tailed distribution with many rare and unknown data types, which drastically decreases the performance of the deep learning-based type inference system. In this context, we test unsupervised domain adaptation methods and fine-tuning to overcome the issues. Moreover, we investigate the impact of out-of-vocabulary words.
    Diffusion-based Time Series Imputation and Forecasting with Structured State Space Models. (arXiv:2208.09399v1 [cs.LG])
    The imputation of missing values represents a significant obstacle for many real-world data analysis pipelines. Here, we focus on time series data and put forward SSSD, an imputation model that relies on two emerging technologies, (conditional) diffusion models as state-of-the-art generative models and structured state space models as internal model architecture, which are particularly suited to capture long-term dependencies in time series data. We demonstrate that SSSD matches or even exceeds state-of-the-art probabilistic imputation and forecasting performance on a broad range of data sets and different missingness scenarios, including the challenging blackout-missing scenarios, where prior approaches failed to provide meaningful results.
    Journal Impact Factor and Peer Review Thoroughness and Helpfulness: A Supervised Machine Learning Study. (arXiv:2207.09821v3 [cs.DL] UPDATED)
    The journal impact factor (JIF) is often equated with journal quality and the quality of the peer review of the papers submitted to the journal. We examined the association between the content of peer review and JIF by analysing 10,000 peer review reports submitted to 1,644 medical and life sciences journals. Two researchers hand-coded a random sample of 2,000 sentences. We then trained machine learning models to classify all 187,240 sentences as contributing or not contributing to content categories. We examined the association between ten groups of journals defined by JIF deciles and the content of peer reviews using linear mixed-effects models, adjusting for the length of the review. The JIF ranged from 0.21 to 74.70. The length of peer reviews increased from the lowest JIF group (median 185 words) to the highest (387 words). The proportion of sentences allocated to different content categories varied widely, even within JIF groups. For thoroughness, sentences on 'Materials and Methods' were more common in the highest JIF journals than in the lowest JIF group (difference of 7.8 percentage points; 95% CI 4.9 to 10.7%). The trend for 'Presentation and Reporting' went in the opposite direction, with the highest JIF journals giving less emphasis to such content (difference -8.9%; 95% CI -11.3 to -6.5%). For helpfulness, reviews for higher JIF journals devoted less attention to 'Suggestion and Solution' and provided fewer examples than lower impact factor journals. No or only small differences were evident for other content categories. In conclusion, peer review in journals with higher JIF tends to be more thorough in discussing the methods used but less helpful in terms of suggesting solutions and providing examples. Differences were modest and variability high, indicating that the JIF is a poor predictor of the quality of peer review of an individual manuscript.
    Graph-Augmented Cyclic Learning Framework for Similarity Estimation of Medical Clinical Notes. (arXiv:2208.09437v1 [cs.CL])
    Semantic textual similarity (STS) in the clinical domain helps improve diagnostic efficiency and produce concise texts for downstream data mining tasks. However, given the high degree of domain knowledge involved in clinical text, it remains challenging for general language models to infer implicit medical relationships behind clinical sentences and output similarities correctly. In this paper, we present a graph-augmented cyclic learning framework for similarity estimation in the clinical domain. The framework can be conveniently implemented on a state-of-the-art backbone language model, and improves its performance by leveraging domain knowledge through co-training with an auxiliary graph convolution network (GCN) based network. We report the success of introducing domain knowledge in GCN and the co-training framework by improving the Bio-clinical BERT baseline by 16.3% and 27.9%, respectively.
    Proposal-Free Temporal Action Detection via Global Segmentation Mask Learning. (arXiv:2207.06580v2 [cs.CV] UPDATED)
    Existing temporal action detection (TAD) methods rely on generating an overwhelmingly large number of proposals per video. This leads to complex model designs due to proposal generation and/or per-proposal action instance evaluation and the resultant high computational cost. In this work, for the first time, we propose a proposal-free Temporal Action detection model with Global Segmentation mask (TAGS). Our core idea is to learn a global segmentation mask of each action instance jointly at the full video length. The TAGS model differs significantly from the conventional proposal-based methods by focusing on global temporal representation learning to directly detect local start and end points of action instances without proposals. Further, by modeling TAD holistically rather than locally at the individual proposal level, TAGS needs a much simpler model architecture with lower computational cost. Extensive experiments show that despite its simpler design, TAGS outperforms existing TAD methods, achieving new state-of-the-art performance on two benchmarks. Importantly, it is ~ 20x faster to train and ~1.6x more efficient for inference. Our PyTorch implementation of TAGS is available at https://github.com/sauradip/TAGS .
    Pessimistic Off-Policy Optimization for Learning to Rank. (arXiv:2206.02593v2 [cs.LG] UPDATED)
    Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are recommended and thus logged more frequently than others. This is further perpetuated when recommending a list of items, as the action space is combinatorial. To address this challenge, we study pessimistic off-policy optimization for learning to rank. The key idea is to compute lower confidence bounds on parameters of click models and then return the list with the highest pessimistic estimate of its value. This approach is computationally efficient and we analyze it. We study its Bayesian and frequentist variants, and overcome the limitation of unknown prior by incorporating empirical Bayes. To show the empirical effectiveness of our approach, we compare it to off-policy optimizers that use inverse propensity scores or neglect uncertainty. Our approach outperforms all baselines, is robust, and is also general.
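    A toy sketch of the pessimistic principle on logged data (the paper's click models, bounds, and Bayesian variant are more refined): compute a lower confidence bound on each item's attraction probability and rank by it, so that items with little logged evidence are demoted.

        import numpy as np

        rng = np.random.default_rng(2)
        n_items, delta = 8, 0.05
        clicks = rng.integers(0, 50, n_items)            # logged clicks (toy data)
        views = clicks + rng.integers(1, 200, n_items)   # logged impressions

        p_hat = clicks / views
        # Hoeffding-style lower confidence bound on the attraction probability
        lcb = np.maximum(0.0, p_hat - np.sqrt(np.log(n_items / delta) / (2 * views)))

        # pessimistic ranking: sort by LCB rather than by the raw estimate
        ranking = np.argsort(-lcb)
        print(ranking)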
    Unbox the Blackbox: Predict and Interpret YouTube Viewership Using Deep Learning. (arXiv:2101.01076v6 [cs.LG] UPDATED)
    Predicting video viewership is a top priority for content creators and video-sharing sites. Content creators rely on such predictions to maximize influence and minimize budgets. Video-sharing sites rely on this prediction to promote credible videos and curb violative videos. Although deep learning champions viewership prediction, it lacks interpretability, which is fundamental to increasing the adoption of predictive models and prescribing measurements to improve viewership. Following the design-science paradigm, we propose a novel interpretable IT system, Precise Wide and Deep Learning (PrecWD), to precisely interpret viewership prediction. Improving upon state-of-the-art frameworks, PrecWD offers precise feature effects and designs an unstructured component. PrecWD outperforms benchmarks in two contexts: health video viewership prediction and misinformation viewership prediction. A user study confirms the superior interpretability of PrecWD. This study contributes to IS design theory with generalizable design principles and an interpretable predictive framework. Our findings provide implications to improve video viewership and credibility.
    Federated Learning with Noisy Labels. (arXiv:2208.09378v1 [cs.LG])
    Federated Learning (FL) is a distributed machine learning paradigm that enables learning models from decentralized private datasets, where the labeling effort is entrusted to the clients. Most existing FL approaches assume that high-quality labels are readily available on users' devices; in reality, however, label noise can naturally occur in FL and follows a non-i.i.d. distribution among clients. Owing to these non-i.i.d. challenges, existing state-of-the-art centralized approaches exhibit unsatisfactory performance, while previous FL studies rely on data exchange or repeated server-side aid to improve the model's performance. Here, we propose FedLN, a framework to deal with label noise across different FL training stages; namely, FL initialization, on-device model training, and server model aggregation. Specifically, FedLN computes per-client noise-level estimation in a single federated round and improves the models' performance by correcting (or limiting the effect of) noisy samples. Extensive experiments on various publicly available vision and audio datasets demonstrate a 24% improvement on average compared to other existing methods for a label noise level of 70%. We further validate the efficiency of FedLN in human-annotated real-world noisy datasets and report a 9% increase on average in models' recognition rate, highlighting that FedLN can be useful for improving FL services provided to everyday users.
    Ginex: SSD-enabled Billion-scale Graph Neural Network Training on a Single Machine via Provably Optimal In-memory Caching. (arXiv:2208.09151v1 [cs.LG])
    Recently, Graph Neural Networks (GNNs) have been receiving a spotlight as a powerful tool that can effectively serve various inference tasks on graph structured data. As the size of real-world graphs continues to scale, the GNN training system faces a scalability challenge. Distributed training is a popular approach to address this challenge by scaling out CPU nodes. However, not much attention has been paid to disk-based GNN training, which can scale up the single-node system in a more cost-effective manner by leveraging high-performance storage devices like NVMe SSDs. We observe that the data movement between the main memory and the disk is the primary bottleneck in the SSD-based training system, and that the conventional GNN training pipeline is sub-optimal without taking this overhead into account. Thus, we propose Ginex, the first SSD-based GNN training system that can process billion-scale graph datasets on a single machine. Inspired by the inspector-executor execution model in compiler optimization, Ginex restructures the GNN training pipeline by separating sample and gather stages. This separation enables Ginex to realize a provably optimal replacement algorithm, known as Belady's algorithm, for caching feature vectors in memory, which account for the dominant portion of I/O accesses. According to our evaluation with four billion-scale graph datasets, Ginex achieves 2.11x higher training throughput on average (up to 2.67x at maximum) than the SSD-extended PyTorch Geometric.
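    For reference, Belady's rule itself is short; the sketch below is a common simplification that always admits the incoming item. It is applicable here because separating the sample stage lets Ginex inspect the full future access sequence before the gather stage runs.

        def belady_hits(accesses, capacity):
            # For each position, precompute the next time the same key is used.
            next_use = [float("inf")] * len(accesses)
            last = {}
            for i in range(len(accesses) - 1, -1, -1):
                next_use[i] = last.get(accesses[i], float("inf"))
                last[accesses[i]] = i
            cache, nxt, hits = set(), {}, 0
            for i, k in enumerate(accesses):
                if k in cache:
                    hits += 1
                elif len(cache) < capacity:
                    cache.add(k)
                else:
                    # evict the cached key whose next use is farthest away
                    victim = max(cache, key=lambda c: nxt[c])
                    cache.discard(victim)
                    cache.add(k)
                nxt[k] = next_use[i]
            return hits

        seq = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 1]  # node feature ids (toy)
        print(belady_hits(seq, capacity=3))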
    DAFT: Distilling Adversarially Fine-tuned Models for Better OOD Generalization. (arXiv:2208.09139v1 [cs.LG])
    We consider the problem of OOD generalization, where the goal is to train a model that performs well on test distributions that are different from the training distribution. Deep learning models are known to be fragile to such shifts and can suffer large accuracy drops even for slightly different test distributions. We propose a new method - DAFT - based on the intuition that an adversarially robust combination of a large number of rich features should provide OOD robustness. Our method carefully distills the knowledge from a powerful teacher that learns several discriminative features using standard training while combining them using adversarial training. The standard adversarial training procedure is modified to produce teachers which can guide the student better. We evaluate DAFT on standard benchmarks in the DomainBed framework, and demonstrate that DAFT achieves significant improvements over the current state-of-the-art OOD generalization methods. DAFT consistently outperforms well-tuned ERM and distillation baselines by up to 6%, with more pronounced gains for smaller networks.
    Nonlinear Optical Data Transformer for Machine Learning. (arXiv:2208.09398v1 [physics.optics])
    Modern machine learning models use an ever-increasing number of parameters to train (175 billion parameters for GPT-3) with large datasets to obtain better performance. Bigger is better has been the norm. Optical computing has been reawakened as a potential solution to large-scale computing through optical accelerators that carry out linear operations while reducing electrical power. However, to achieve efficient computing with light, creating and controlling nonlinearity optically rather than electronically remains a challenge. This study explores a reservoir computing (RC) approach whereby a 14 mm long few-mode waveguide in LiNbO3 on insulator is used as a complex nonlinear optical processor. A dataset is encoded digitally on the spectrum of a femtosecond pulse which is then launched in the waveguide. The output spectrum depends nonlinearly on the input. We experimentally show that a simple digital linear classifier with 784 parameters, using the output spectrum from the waveguide as input, increased the classification accuracy on several databases by approximately 10$\%$ compared to non-transformed data. In comparison, a deep digital neural network (NN) with 40000 parameters was necessary to achieve the same accuracy. Reducing the number of parameters by a factor of $\sim$50 illustrates that a compact optical RC approach can perform on par with a deep digital NN.
    FP8 Quantization: The Power of the Exponent. (arXiv:2208.09225v1 [cs.LG])
    When quantizing neural networks for efficient inference, low-bit integers are the go-to format for efficiency. However, low-bit floating point numbers have an extra degree of freedom, assigning some bits to work on an exponential scale instead. This paper investigates this benefit of the floating point format for neural network inference in depth. We detail the choices that can be made for the FP8 format, including the important choice of the number of bits for the mantissa and exponent, and show analytically in which settings these choices give better performance. Then we show how these findings translate to real networks, provide an efficient implementation for FP8 simulation, and present a new algorithm that enables the learning of both the scale parameters and the number of exponent bits in the FP8 format. Our chief conclusion is that when doing post-training quantization for a wide range of networks, the FP8 format is better than INT8 in terms of accuracy, and the choice of the number of exponent bits is driven by the severity of outliers in the network. We also conduct experiments with quantization-aware training, where the difference in formats disappears as the network is trained to reduce the effect of outliers.
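    A simplified numpy simulation of rounding to a low-bit floating-point grid makes the exponent/mantissa trade-off visible; it ignores sub-normals and special values, so it is an illustration rather than the paper's FP8 simulator.

        import numpy as np

        def fp_quantize(x, n_exp, n_man, bias=None):
            # Round x to a float grid with n_exp exponent and n_man mantissa bits.
            if bias is None:
                bias = 2 ** (n_exp - 1) - 1
            e_min, e_max = 1 - bias, 2 ** n_exp - 2 - bias
            max_val = 2.0 ** e_max * (2 - 2.0 ** -n_man)
            xc = np.clip(x, -max_val, max_val)
            e = np.clip(np.floor(np.log2(np.abs(xc) + 1e-30)), e_min, e_max)
            scale = 2.0 ** (e - n_man)  # grid spacing at this exponent
            return np.round(xc / scale) * scale

        x = np.random.default_rng(0).standard_normal(10000) * 3
        for n_exp in (2, 3, 4, 5):  # trade exponent bits against mantissa bits
            q = fp_quantize(x, n_exp, n_man=7 - n_exp)  # 1 sign + n_exp + n_man = 8
            print(n_exp, float(np.mean((x - q) ** 2)))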
    Background Invariance Testing According to Semantic Proximity. (arXiv:2208.09286v1 [cs.CV])
    In many applications, machine learned (ML) models are required to hold some invariance qualities, such as rotation, size, intensity, and background invariance. Unlike many types of variance, the variants of background scenes cannot be ordered easily, which makes it difficult to analyze the robustness and biases of the models concerned. In this work, we present a technical solution for ordering background scenes according to their semantic proximity to a target image that contains a foreground object being tested. We make use of the results of object recognition as the semantic description of each image, and construct an ontology for storing knowledge about relationships among different objects using association analysis. This ontology enables (i) efficient and meaningful search for background scenes of different semantic distances to a target image, (ii) quantitative control of the distribution and sparsity of the sampled background scenes, and (iii) quality assurance using visual representations of invariance testing results (referred to as variance matrices). In this paper, we also report the training of an ML4ML assessor to evaluate the invariance quality of ML models automatically.
    GraTO: Graph Neural Network Framework Tackling Over-smoothing with Neural Architecture Search. (arXiv:2208.09027v1 [cs.LG])
    Current Graph Neural Networks (GNNs) suffer from the over-smoothing problem, which results in indistinguishable node representations and low model performance with more GNN layers. Many methods have been put forward to tackle this problem in recent years. However, existing methods for tackling over-smoothing emphasize model performance and neglect the over-smoothness of node representations. Additionally, different approaches are applied one at a time, and there is no overall framework to jointly leverage multiple solutions to the over-smoothing challenge. To solve these problems, we propose GraTO, a framework based on neural architecture search to automatically search for GNN architectures. GraTO adopts a novel loss function to facilitate striking a balance between model performance and representation smoothness. In addition to existing methods, our search space also includes DropAttribute, a novel scheme for alleviating the over-smoothing challenge, to fully leverage diverse solutions. We conduct extensive experiments on six real-world datasets to evaluate GraTO, which demonstrates that GraTO outperforms baselines in the over-smoothing metrics and achieves competitive performance in accuracy. GraTO is especially effective and robust with increasing numbers of GNN layers. Further experiments bear out the quality of node representations learned with GraTO and the effectiveness of the model architecture. We make the code of GraTO available on GitHub (\url{https://github.com/fxsxjtu/GraTO}).  ( 3 min )
    Feature Selection Enhancement and Feature Space Visualization for Speech-Based Emotion Recognition. (arXiv:2208.09269v1 [eess.SP])
    Robust speech emotion recognition relies on the quality of the speech features. We present a speech feature enhancement strategy that improves speech emotion recognition. We used the INTERSPEECH 2010 challenge feature set. We identified subsets of the feature set and applied Principal Component Analysis to the subsets. Finally, the features are fused horizontally. The resulting feature set is analyzed using t-distributed stochastic neighbour embedding (t-SNE) before the features are applied to emotion recognition. The method is compared with the state-of-the-art methods used in the literature. The empirical evidence is drawn using two well-known datasets: the Emotional Speech Dataset (EMO-DB) and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), for German and English, respectively. Our method achieved an average recognition gain of 11.5\% for six out of seven emotions on the EMO-DB dataset, and 13.8\% for seven out of eight emotions on the RAVDESS dataset, as compared to the baseline study.
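    The enhancement pipeline as described can be sketched as follows; the subset boundaries, component counts, and random values are placeholders (the INTERSPEECH 2010 set has 1,582 features, but the authors' actual grouping is not reproduced here).

        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(0)
        X = rng.standard_normal((500, 1582))  # toy stand-in for IS-2010 features

        # Hypothetical subsets (e.g., prosodic / spectral / voice-quality groups)
        subsets = [slice(0, 500), slice(500, 1000), slice(1000, 1582)]

        # Apply PCA to each subset independently, then fuse horizontally
        parts = [PCA(n_components=40).fit_transform(X[:, s]) for s in subsets]
        X_fused = np.hstack(parts)
        print(X_fused.shape)  # (500, 120)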
    A Causality-Based Learning Approach for Discovering the Underlying Dynamics of Complex Systems from Partial Observations with Stochastic Parameterization. (arXiv:2208.09104v1 [math.DS])
    Discovering the underlying dynamics of complex systems from data is an important practical topic. Constrained optimization algorithms are widely utilized and lead to many successes. Yet, such purely data-driven methods may bring about incorrect physics in the presence of random noise and cannot easily handle the situation with incomplete data. In this paper, a new iterative learning algorithm for complex turbulent systems with partial observations is developed that alternates between identifying model structures, recovering unobserved variables, and estimating parameters. First, a causality-based learning approach is utilized for the sparse identification of model structures, which takes into account certain physics knowledge that is pre-learned from data. It has unique advantages in coping with indirect coupling between features and is robust to the stochastic noise. A practical algorithm is designed to facilitate the causal inference for high-dimensional systems. Next, a systematic nonlinear stochastic parameterization is built to characterize the time evolution of the unobserved variables. Closed analytic formula via an efficient nonlinear data assimilation is exploited to sample the trajectories of the unobserved variables, which are then treated as synthetic observations to advance a rapid parameter estimation. Furthermore, the localization of the state variable dependence and the physics constraints are incorporated into the learning procedure, which mitigate the curse of dimensionality and prevent the finite time blow-up issue. Numerical experiments show that the new algorithm succeeds in identifying the model structure and providing suitable stochastic parameterizations for many complex nonlinear systems with chaotic dynamics, spatiotemporal multiscale structures, intermittency, and extreme events.  ( 3 min )
    A Risk-Sensitive Approach to Policy Optimization. (arXiv:2208.09106v1 [cs.LG])
    Standard deep reinforcement learning (DRL) aims to maximize expected reward, considering collected experiences equally in formulating a policy. This differs from human decision-making, where gains and losses are valued differently and outlying outcomes are given increased consideration. It also fails to capitalize on opportunities to improve safety and/or performance through the incorporation of distributional context. Several approaches to distributional DRL have been investigated, with one popular strategy being to evaluate the projected distribution of returns for possible actions. We propose a more direct approach whereby risk-sensitive objectives, specified in terms of the cumulative distribution function (CDF) of the distribution of full-episode rewards, are optimized. This approach allows for outcomes to be weighed based on relative quality, can be used for both continuous and discrete action spaces, and may naturally be applied in both constrained and unconstrained settings. We show how to compute an asymptotically consistent estimate of the policy gradient for a broad class of risk-sensitive objectives via sampling, subsequently incorporating variance reduction and regularization measures to facilitate effective on-policy learning. We then demonstrate that the use of moderately "pessimistic" risk profiles, which emphasize scenarios where the agent performs poorly, leads to enhanced exploration and a continual focus on addressing deficiencies. We test the approach using different risk profiles in six OpenAI Safety Gym environments, comparing to state-of-the-art on-policy methods. Without cost constraints, we find that pessimistic risk profiles can be used to reduce cost while improving total reward accumulation. With cost constraints, they are seen to provide higher positive rewards than risk-neutral approaches at the prescribed allowable cost.  ( 3 min )
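    One way to realize such a CDF-based weighting is sketched below (the paper's estimator with variance reduction and regularization is more careful): each episode's return is mapped to its empirical CDF position, a pessimistic profile converts that position into a weight, and the weights multiply the per-episode policy-gradient terms.

        import numpy as np

        rng = np.random.default_rng(0)
        returns = rng.normal(10.0, 4.0, size=64)  # full-episode returns in a batch

        # Empirical CDF position of each return within the batch
        ranks = returns.argsort().argsort()
        cdf = (ranks + 0.5) / len(returns)

        # A "pessimistic" profile: more weight on poorly performing episodes;
        # w(u) = k*(1-u)^(k-1) is the density of a concave CDF distortion.
        k = 2.0
        weights = k * (1.0 - cdf) ** (k - 1.0)
        weights /= weights.mean()  # keep the overall gradient scale unchanged

        # these weights would multiply per-episode log-prob terms, e.g.
        # loss = -(weights * episode_log_probs * (returns - baseline)).mean()
        print(weights[np.argmin(returns)], weights[np.argmax(returns)])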
    Communication-Efficient Collaborative Best Arm Identification. (arXiv:2208.09029v1 [cs.LG])
    We investigate top-$m$ arm identification, a basic problem in bandit theory, in a multi-agent learning model in which agents collaborate to learn an objective function. We are interested in designing collaborative learning algorithms that achieve maximum speedup (compared to single-agent learning algorithms) using minimum communication cost, as communication is frequently the bottleneck in multi-agent learning. We give both algorithmic and impossibility results, and conduct a set of experiments to demonstrate the effectiveness of our algorithms.  ( 2 min )
    Scalable Multi-Agent Framework for Optimizing the Lab and Warehouse. (arXiv:2208.09099v1 [cs.MA])
    The field of autonomous physical science - where machine learning guides and learns from experiments in a closed-loop - is rapidly growing in importance. Autonomous systems allow scientists to fail smarter, learn faster, and spend less resources in their studies. The field promises improved performance for various facilities such as labs, research and development pipelines, and warehouses. As autonomous systems grow in number, capability, and complexity, a new challenge arises - how will these systems work together across large facilities? We explore one solution to this question - a multi-agent framework. We demonstrate a framework with 1) a simulated facility with realistic resource limits such as equipment use limits, 2) machine learning agents with diverse learning capabilities and goals, control over lab instruments, and the ability to run research campaigns, and 3) a network over which these agents can share knowledge and work together to achieve individual or collective goals. The framework is dubbed the MULTI-agent auTonomous fAcilities - a Scalable frameworK aka MULTITASK. MULTITASK allows facility-wide simulations including agent-instrument and agent-agent interactions. Framework modularity allows real-world autonomous spaces to come on-line in phases, with simulated instruments gradually replaced by real-world instruments. Here we demonstrate the framework with a real-world materials science challenge of materials exploration and optimization in a simulated materials lab. We hope the framework opens new areas of research in agent-based facility control scenarios such as agent-to-agent markets and economies, management and decision-making structures, communication and data-sharing structures, and optimization strategies for agents and facilities including those based on game theory.  ( 3 min )
    Out-of-distribution Detection via Frequency-regularized Generative Models. (arXiv:2208.09083v1 [cs.LG])
    Modern deep generative models can assign high likelihood to inputs drawn from outside the training distribution, posing threats to models in open-world deployments. While much research attention has been placed on defining new test-time measures of OOD uncertainty, these methods do not fundamentally change how deep generative models are regularized and optimized in training. In particular, generative models are shown to overly rely on background information to estimate the likelihood. To address the issue, we propose a novel frequency-regularized learning (FRL) framework for OOD detection, which incorporates high-frequency information into training and guides the model to focus on semantically relevant features. FRL effectively improves performance on a wide range of generative architectures, including variational auto-encoder, GLOW, and PixelCNN++. On a new large-scale evaluation task, FRL achieves the state-of-the-art performance, outperforming a strong baseline Likelihood Regret by 10.7% (AUROC) while achieving 147$\times$ faster inference speed. Extensive ablations show that FRL improves the OOD detection performance while preserving the image generation quality. Code is available at https://github.com/mu-cai/FRL.  ( 2 min )
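    A minimal sketch of extracting a high-frequency view of an input with an FFT high-pass filter is given below; the cutoff radius and the way this view would enter training are assumptions, not the paper's exact recipe.

        import numpy as np

        def high_frequency(img, radius=8):
            # Zero out low frequencies around the spectrum center; keep the rest.
            f = np.fft.fftshift(np.fft.fft2(img))
            h, w = img.shape
            yy, xx = np.ogrid[:h, :w]
            low = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
            f[low] = 0
            return np.fft.ifft2(np.fft.ifftshift(f)).real

        img = np.random.default_rng(0).random((32, 32))  # stand-in for an image
        hf = high_frequency(img)  # auxiliary high-frequency view for training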
    Implicit Session Contexts for Next-Item Recommendations. (arXiv:2208.09076v1 [cs.IR])
    Session-based recommender systems capture the short-term interest of a user within a session. Session contexts (i.e., a user's high-level interests or intents within a session) are not explicitly given in most datasets, and implicitly inferring session context as an aggregation of item-level attributes is crude. In this paper, we propose ISCON, which implicitly contextualizes sessions. ISCON first generates implicit contexts for sessions by creating a session-item graph, learning graph embeddings, and clustering to assign sessions to contexts. ISCON then trains a session context predictor and uses the predicted contexts' embeddings to enhance the next-item prediction accuracy. Experiments on four datasets show that ISCON has superior next-item prediction accuracy than state-of-the-art models. A case study of ISCON on the Reddit dataset confirms that assigned session contexts are unique and meaningful.  ( 2 min )
    Superposing Many Tickets into One: A Performance Booster for Sparse Neural Network Training. (arXiv:2205.15322v3 [cs.LG] UPDATED)
    Recent works on sparse neural network training (sparse training) have shown that a compelling trade-off between performance and efficiency can be achieved by training intrinsically sparse neural networks from scratch. Existing sparse training methods usually strive to find the best sparse subnetwork possible in one single run, without involving any expensive dense or pre-training steps. For instance, dynamic sparse training (DST), is capable of reaching a competitive performance of dense training by iteratively evolving the sparse topology during the course of training. In this paper, we argue that it is better to allocate the limited resources to create multiple low-loss sparse subnetworks and superpose them into a stronger one, instead of allocating all resources entirely to find an individual subnetwork. To achieve this, two desiderata are required: (1) efficiently producing many low-loss subnetworks, the so-called cheap tickets, within one training process limited to the standard training time used in dense training; (2) effectively superposing these cheap tickets into one stronger subnetwork. To corroborate our conjecture, we present a novel sparse training approach, termed Sup-tickets, which can satisfy the above two desiderata concurrently in a single sparse-to-sparse training process. Across various modern architectures on CIFAR-10/100 and ImageNet, we show that Sup-tickets integrates seamlessly with the existing sparse training methods and demonstrates consistent performance improvement.
    Self-Supervised Primal-Dual Learning for Constrained Optimization. (arXiv:2208.09046v1 [cs.LG])
    This paper studies how to train machine-learning models that directly approximate the optimal solutions of constrained optimization problems. This is empirical risk minimization under constraints, which is challenging as training must balance optimality and feasibility conditions. Supervised learning methods often approach this challenge by training the model on a large collection of pre-solved instances. This paper takes a different route and proposes the idea of Primal-Dual Learning (PDL), a self-supervised training method that does not require a set of pre-solved instances or an optimization solver for training and inference. Instead, PDL mimics the trajectory of an Augmented Lagrangian Method (ALM) and jointly trains primal and dual neural networks. Being a primal-dual method, PDL uses instance-specific penalties of the constraint terms in the loss function used to train the primal network. Experiments show that, on a set of nonlinear optimization benchmarks, PDL typically exhibits negligible constraint violations and minor optimality gaps, and is remarkably close to the ALM optimization. PDL also demonstrates improved or similar performance in terms of optimality gaps, constraint violations, and training times compared to existing approaches.  ( 2 min )
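    A hedged sketch of an ALM-style primal loss with instance-specific duals and a quadratic penalty; it assumes inequality constraints g(x) <= 0 and may differ from the paper's exact formulation.

```python
import torch

def alm_primal_loss(objective: torch.Tensor, constraints: torch.Tensor,
                    duals: torch.Tensor, rho: float) -> torch.Tensor:
    """Augmented-Lagrangian-style loss for the primal network:
    objective + dual-weighted constraints + quadratic penalty on violations."""
    violation = torch.clamp(constraints, min=0.0)  # positive part = violated constraints
    return objective + (duals * constraints).sum() + 0.5 * rho * (violation ** 2).sum()
```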
    Is Monte Carlo a bad sampling strategy for learning smooth functions in high dimensions?. (arXiv:2208.09045v1 [math.NA])
    This paper concerns the approximation of smooth, high-dimensional functions from limited samples using polynomials. This task lies at the heart of many applications in computational science and engineering -- notably, those arising from parametric modelling and uncertainty quantification. It is common to use Monte Carlo (MC) sampling in such applications, so as not to succumb to the curse of dimensionality. However, it is well known this strategy is theoretically suboptimal. There are many polynomial spaces of dimension $n$ for which the sample complexity scales log-quadratically in $n$. This well-documented phenomenon has led to a concerted effort to design improved, in fact, near-optimal strategies, whose sample complexities scale log-linearly, or even linearly in $n$. Paradoxically, in this work we show that MC is actually a perfectly good strategy in high dimensions. We first document this phenomenon via several numerical examples. Next, we present a theoretical analysis that resolves this paradox for holomorphic functions of infinitely-many variables. We show that there is a least-squares scheme based on $m$ MC samples whose error decays algebraically fast in $m/\log(m)$, with a rate that is the same as that of the best $n$-term polynomial approximation. This result is non-constructive, since it assumes knowledge of a suitable polynomial space in which to perform the approximation. We next present a compressed sensing-based scheme that achieves the same rate, except for a larger polylogarithmic factor. This scheme is practical, and numerically it performs as well as or better than well-known adaptive least-squares schemes. Overall, our findings demonstrate that MC sampling is eminently suitable for smooth function approximation when the dimension is sufficiently high. Hence the benefits of improved sampling strategies are generically limited to lower-dimensional settings.  ( 3 min )
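    A toy illustration of MC least-squares polynomial approximation; the uniform sampling domain and the user-supplied basis (each basis function mapping an (m, d) sample array to m values) are assumptions, and the paper's non-constructive choice of polynomial space is not reproduced.

```python
import numpy as np

def mc_least_squares(f, basis, d, m, seed=0):
    """Fit coefficients of a fixed polynomial basis to f from m Monte Carlo
    samples drawn uniformly on [-1, 1]^d."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(m, d))
    A = np.column_stack([phi(X) for phi in basis])  # design matrix, one column per basis function
    y = np.array([f(x) for x in X])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs
```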
    Quantitative Universal Approximation Bounds for Deep Belief Networks. (arXiv:2208.09033v1 [stat.ML])
    We show that deep belief networks with binary hidden units can approximate any multivariate probability density under very mild integrability requirements on the parental density of the visible nodes. The approximation is measured in the $L^q$-norm for $q\in[1,\infty]$ ($q=\infty$ corresponding to the supremum norm) and in Kullback-Leibler divergence. Furthermore, we establish sharp quantitative bounds on the approximation error in terms of the number of hidden units.  ( 2 min )
    BanglaWriting: A multi-purpose offline Bangla handwriting dataset. (arXiv:2011.07499v3 [cs.CV] UPDATED)
    This article presents a Bangla handwriting dataset named BanglaWriting that contains single-page handwriting samples from 260 individuals of different ages. Each page includes bounding boxes around each word, along with the Unicode representation of the writing. The dataset contains 21,234 words and 32,787 characters in total, including 5,470 unique words of the Bangla vocabulary. Apart from the usual words, it comprises 261 instances of comprehensible overwriting and 450 handwritten strike-throughs and mistakes. All bounding boxes and word labels are manually generated. The dataset can be used for complex optical character/word recognition, writer identification, handwritten word segmentation, and word generation. Furthermore, it is suitable for extracting age-based and gender-based variation in handwriting.
    Improving Post-Processing of Audio Event Detectors Using Reinforcement Learning. (arXiv:2208.09201v1 [cs.SD])
    We apply post-processing to the class probability distribution outputs of audio event classification models and employ reinforcement learning to jointly discover the optimal parameters for the various stages of a post-processing stack, such as the classification thresholds and the kernel sizes of the median filtering algorithms used to smooth out model predictions. To achieve this, we define a reinforcement learning environment where: 1) a state is the class probability distribution provided by the model for a given audio sample; 2) an action is the choice of a candidate optimal value for each parameter of the post-processing stack; 3) the reward is based on the classification accuracy metric we aim to optimize, which in our case is the audio event-based macro F1-score. We apply our post-processing to the outputs of two audio event classification models submitted to the DCASE Task 4 2020 challenge, and find that using reinforcement learning to discover the optimal per-class parameters of the post-processing stack improves the audio event-based macro F1-score (the main metric used in the DCASE challenge to compare audio event classification accuracy) by 4-5% compared to using the same stack with manually tuned parameters.  ( 2 min )
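    A toy environment in the spirit of that setup; the action grid of (threshold, kernel) pairs and the F1 scorer `f1_fn` are placeholders, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import median_filter

class PostProcessingEnv:
    """State: one clip's (frames x classes) probability matrix.
    Action: index into a grid of (threshold, median-kernel) pairs.
    Reward: event-based macro F1 of the post-processed predictions."""

    def __init__(self, prob_matrices, references, actions, f1_fn):
        self.prob_matrices, self.references = prob_matrices, references
        self.actions, self.f1_fn, self.idx = actions, f1_fn, 0

    def reset(self):
        self.idx = 0
        return self.prob_matrices[0]

    def step(self, action_id):
        threshold, kernel = self.actions[action_id]
        binarized = self.prob_matrices[self.idx] > threshold
        smoothed = median_filter(binarized, size=(kernel, 1))  # smooth along the time axis
        reward = self.f1_fn(smoothed, self.references[self.idx])
        self.idx += 1
        done = self.idx >= len(self.prob_matrices)
        return (None if done else self.prob_matrices[self.idx]), reward, done
```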
    Improving Small Molecule Generation using Mutual Information Machine. (arXiv:2208.09016v1 [cs.LG])
    We address the task of controlled generation of small molecules, which entails finding novel molecules with desired properties under certain constraints (e.g., similarity to a reference molecule). Here we introduce MolMIM, a probabilistic auto-encoder for small-molecule drug discovery that learns an informative and clustered latent space. MolMIM is trained with Mutual Information Machine (MIM) learning and provides a fixed-length representation of variable-length SMILES strings. Since encoder-decoder models can learn representations with ``holes'' of invalid samples, we propose a novel extension to the training procedure which promotes a dense latent space and allows the model to sample valid molecules from random perturbations of latent codes. We provide a thorough comparison of MolMIM to several variable-size and fixed-size encoder-decoder models, demonstrating MolMIM's superior generation as measured by validity, uniqueness, and novelty. We then apply CMA-ES, a black-box, gradient-free search algorithm, over MolMIM's latent space for the task of property-guided molecule optimization. We achieve state-of-the-art results in several constrained single-property optimization tasks as well as in the challenging task of multi-objective optimization, improving over the previous state-of-the-art success rate by more than 5\%. We attribute the strong results to MolMIM's latent representation, which clusters similar molecules together, rather than to CMA-ES, which is commonly used as a baseline optimization method. We also demonstrate that MolMIM is favourable in a compute-limited regime, making it an attractive model for such cases.  ( 3 min )
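    A sketch of CMA-ES search over a fixed-length latent code using the `cma` package; `score_fn` stands in for MolMIM's decoder plus property scorer and is an assumption here.

```python
import cma

def optimize_latent(score_fn, z0, sigma0=0.5, iterations=50):
    """Black-box, gradient-free search over latent space; score_fn should decode
    a latent vector into a molecule and return a property score to minimize."""
    es = cma.CMAEvolutionStrategy(z0, sigma0)
    for _ in range(iterations):
        candidates = es.ask()
        es.tell(candidates, [score_fn(z) for z in candidates])
    return es.result.xbest  # best latent code found
```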
    Treeformer: Dense Gradient Trees for Efficient Attention Computation. (arXiv:2208.09015v1 [cs.CL])
    Standard inference and training with transformer-based architectures scale quadratically with input sequence length. This is prohibitively expensive for a variety of applications, especially web-page translation, query answering, etc. Consequently, several approaches have been developed recently to speed up attention computation by enforcing different attention structures such as sparsity, low-rank structure, or kernel approximations of attention. In this work, we view attention computation as nearest-neighbor retrieval and use decision-tree-based hierarchical navigation to reduce the retrieval cost per query token from linear in sequence length to nearly logarithmic. Based on such hierarchical navigation, we design Treeformer, which can use one of two efficient attention layers -- TF-Attention and TC-Attention. TF-Attention computes the attention in a fine-grained style, while TC-Attention is a coarse attention layer which also ensures that the gradients are "dense". To optimize such challenging discrete layers, we propose a two-level bootstrapped training method. Using extensive experiments on standard NLP benchmarks, especially for long sequences, we demonstrate that our Treeformer architecture can be almost as accurate as the baseline Transformer while using 30x fewer FLOPs in the attention layer. Compared to Linformer, the accuracy can be as much as 12% higher while using similar FLOPs in the attention layer.  ( 2 min )
    Automated Detection of Acute Lymphoblastic Leukemia Subtypes from Microscopic Blood Smear Images using Deep Neural Networks. (arXiv:2208.08992v1 [eess.IV])
    An estimated 300,000 new cases of leukemia are diagnosed each year, accounting for 2.8 percent of all new cancer cases, and the prevalence is rising. The most dangerous and deadly type of leukemia is acute lymphoblastic leukemia (ALL), which affects people of all age groups, including children and adults. In this study, we propose an automated system to detect various-shaped ALL blast cells from microscopic blood smear images using Deep Neural Networks (DNN). The system can detect multiple subtypes of ALL cells with an accuracy of 98 percent. Moreover, we have developed telediagnosis software to provide real-time support for diagnosing ALL subtypes from microscopic blood smear images.  ( 2 min )
    Machine learning algorithms for three-dimensional mean-curvature computation in the level-set method. (arXiv:2208.09047v1 [cs.LG])
    We propose a data-driven mean-curvature solver for the level-set method. This work is the natural extension to $\mathbb{R}^3$ of our two-dimensional strategy in [arXiv:2201.12342][1] and the hybrid inference system of [DOI: 10.1016/j.jcp.2022.111291][2]. However, in contrast to [1,2], which built resolution-dependent neural-network dictionaries, here we develop a pair of models in $\mathbb{R}^3$, regardless of the mesh size. Our feedforward networks ingest transformed level-set, gradient, and curvature data to fix numerical mean-curvature approximations selectively for interface nodes. To reduce the problem's complexity, we have used the Gaussian curvature to classify stencils and fit our models separately to non-saddle and saddle patterns. Non-saddle stencils are easier to handle because they exhibit a curvature error distribution characterized by monotonicity and symmetry. While the latter has allowed us to train only on half the mean-curvature spectrum, the former has helped us blend the data-driven and the baseline estimations seamlessly near flat regions. On the other hand, the saddle-pattern error structure is less clear; thus, we have exploited no latent information beyond what is known. In this regard, we have trained our models on not only spherical but also sinusoidal and hyperbolic paraboloidal patches. Our approach to building their data sets is systematic but gleans samples randomly while ensuring well-balancedness. We have also resorted to standardization and dimensionality reduction as a preprocessing step and integrated regularization to minimize outliers. In addition, we leverage curvature rotation/reflection invariance to improve precision at inference time. Several experiments confirm that our proposed system can yield more accurate mean-curvature estimations than modern particle-based interface reconstruction and level-set schemes around under-resolved regions.  ( 3 min )
    How important are socioeconomic factors for hurricane performance of power systems? An analysis of disparities through machine learning. (arXiv:2208.09063v1 [cs.LG])
    This paper investigates whether socioeconomic factors are important for the hurricane performance of the electric power system in Florida. The investigation is performed using the Random Forest classifier with Mean Decrease of Accuracy (MDA) for measuring the importance of a set of factors that include hazard intensity, time to recovery from maximum impact, and socioeconomic characteristics of the affected population. The data set (at county scale) for this study includes socioeconomic variables from the 5-year American Community Survey (ACS), as well as wind velocities, and outage data of five hurricanes including Alberto and Michael in 2018, Dorian in 2019, and Eta and Isaias in 2020. The study shows that socioeconomic variables are considerably important for the system performance model. This indicates that social disparities may exist in the occurrence of power outages, which directly impact the resilience of communities and thus require immediate attention.  ( 2 min )
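    A sketch of the importance analysis using scikit-learn, whose permutation importance plays the role of Mean Decrease of Accuracy; the feature names, data, and hyper-parameters are placeholders.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

def rank_factors(X, y, feature_names):
    """Fit a Random Forest and rank hazard/recovery/socioeconomic factors
    by the mean decrease of accuracy under feature permutation."""
    clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
    res = permutation_importance(clf, X, y, n_repeats=20, random_state=0)
    return sorted(zip(feature_names, res.importances_mean), key=lambda t: -t[1])
```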
    VAuLT: Augmenting the Vision-and-Language Transformer with the Propagation of Deep Language Representations. (arXiv:2208.09021v1 [cs.CV])
    We propose the Vision-and-Augmented-Language Transformer (VAuLT). VAuLT is an extension of the popular Vision-and-Language Transformer (ViLT) and improves performance on vision-and-language tasks that involve more complex text inputs than image captions, while having minimal impact on training and inference efficiency. ViLT, importantly, enables efficient training and inference in vision-and-language tasks by using a shallow image encoder. However, it is pretrained on captioning and similar datasets, where the language input is simple, literal, and descriptive, and therefore lacks linguistic diversity. So, when working with multimedia data in the wild, such as multimodal social media data (in our work, Twitter), there is a notable shift away from captioning-style language, as well as greater diversity of tasks, and we indeed find evidence that ViLT's language capacity is lacking. The key insight of VAuLT is to propagate the output representations of a large language model like BERT to the language input of ViLT. We show that such a strategy significantly improves over ViLT on vision-and-language tasks involving richer language inputs and affective constructs, such as TWITTER-2015, TWITTER-2017, MVSA-Single and MVSA-Multiple, but lags behind pure reasoning tasks such as the Bloomberg Twitter Text-Image Relationship dataset. We have released the code for all our experiments at https://github.com/gchochla/VAuLT.  ( 3 min )
    GraphTTA: Test Time Adaptation on Graph Neural Networks. (arXiv:2208.09126v1 [cs.LG])
    Recently, test time adaptation (TTA) has attracted increasing attention due to its power of handling the distribution shift issue in the real world. Unlike what has been developed for convolutional neural networks (CNNs) for image data, TTA is less explored for Graph Neural Networks (GNNs). There is still a lack of efficient algorithms tailored for graphs with irregular structures. In this paper, we present a novel test time adaptation strategy named Graph Adversarial Pseudo Group Contrast (GAPGC), for graph neural networks TTA, to better adapt to the Out Of Distribution (OOD) test data. Specifically, GAPGC employs a contrastive learning variant as a self-supervised task during TTA, equipped with Adversarial Learnable Augmenter and Group Pseudo-Positive Samples to enhance the relevance between the self-supervised task and the main task, boosting the performance of the main task. Furthermore, we provide theoretical evidence that GAPGC can extract minimal sufficient information for the main task from information theory perspective. Extensive experiments on molecular scaffold OOD dataset demonstrated that the proposed approach achieves state-of-the-art performance on GNNs.  ( 2 min )
    A Multi-Modal Wildfire Prediction and Personalized Early-Warning System Based on a Novel Machine Learning Framework. (arXiv:2208.09079v1 [cs.LG])
    Wildfires are increasingly impacting the environment and human health and safety. Among the top 20 California wildfires, those in 2020-2021 burned more acres than those of the last century combined. California's 2018 wildfire season caused damages of $148.5 billion. Among millions of impacted people, those living with disabilities (around 15% of the world population) are disproportionately impacted due to inadequate means of alerts. In this project, a multi-modal wildfire prediction and personalized early-warning system has been developed based on an advanced machine learning architecture. Sensor data from the Environmental Protection Agency and historical wildfire data from 2012 to 2018 were compiled to establish a comprehensive wildfire database, the largest of its kind. Next, a novel U-Convolutional-LSTM (Long Short-Term Memory) neural network was designed with a special architecture for extracting key spatial and temporal features from contiguous environmental parameters indicative of impending wildfires. Environmental and meteorological factors were incorporated into the database and classified as leading or trailing indicators, correlated with the risks of wildfire ignition and propagation, respectively. Additionally, geological data were used to provide better wildfire risk assessment. This novel spatio-temporal neural network achieved >97% accuracy, versus around 76% for traditional convolutional neural networks, successfully predicting 2018's five most devastating wildfires 5-14 days in advance. Finally, a personalized early-warning system, tailored to individuals with sensory disabilities or respiratory exacerbation conditions, was proposed. This technique would enable fire departments to anticipate and prevent wildfires before they strike and provide early warnings for at-risk individuals for better preparation, thereby saving lives and reducing economic damage.  ( 3 min )
    IAN: Iterated Adaptive Neighborhoods for manifold learning and dimensionality estimation. (arXiv:2208.09123v1 [cs.LG])
    Invoking the manifold assumption in machine learning requires knowledge of the manifold's geometry and dimension, and theory dictates how many samples are required. However, in applications data are limited, sampling may not be uniform, and manifold properties are unknown and (possibly) non-pure; this implies that neighborhoods must adapt to the local structure. We introduce an algorithm for inferring adaptive neighborhoods for data given by a similarity kernel. Starting with a locally-conservative neighborhood (Gabriel) graph, we sparsify it iteratively according to a weighted counterpart. In each step, a linear program yields minimal neighborhoods globally and a volumetric statistic reveals neighbor outliers likely to violate manifold geometry. We apply our adaptive neighborhoods to non-linear dimensionality reduction, geodesic computation and dimension estimation. A comparison against standard algorithms using, e.g., k-nearest neighbors, demonstrates their usefulness.  ( 2 min )
    Representation Learning for the Automatic Indexing of Sound Effects Libraries. (arXiv:2208.09096v1 [cs.SD])
    Labeling and maintaining a commercial sound effects library is a time-consuming task exacerbated by databases that continually grow in size and undergo taxonomy updates. Moreover, sound search and taxonomy creation are complicated by non-uniform metadata, an unrelenting problem even with the introduction of a new industry standard, the Universal Category System. To address these problems and overcome dataset-dependent limitations that inhibit the successful training of deep learning models, we pursue representation learning to train generalized embeddings that can be used for a wide variety of sound effects libraries and are a taxonomy-agnostic representation of sound. We show that a task-specific but dataset-independent representation can successfully address data issues such as class imbalance, inconsistent class labels, and insufficient dataset size, outperforming established representations such as OpenL3. Detailed experimental results show the impact of metric learning approaches and different cross-dataset training methods on representational effectiveness.  ( 2 min )
  • Open

    Deviation-Based Learning: Training Recommender Systems Using Informed User Choice. (arXiv:2109.09816v2 [econ.TH] UPDATED)
    This paper proposes a new approach to training recommender systems called deviation-based learning. The recommender and rational users have different knowledge. The recommender learns user knowledge by observing what action users take upon receiving recommendations. Learning eventually stalls if the recommender always suggests a choice: Before the recommender completes learning, users start following the recommendations blindly, and their choices do not reflect their knowledge. The learning rate and social welfare improve substantially if the recommender abstains from recommending a particular choice when she predicts that multiple alternatives will produce a similar payoff.
    Bi-fidelity Modeling of Uncertain and Partially Unknown Systems using DeepONets. (arXiv:2204.00997v2 [stat.ML] UPDATED)
    Recent advances in modeling large-scale complex physical systems have shifted research focus towards data-driven techniques. However, generating datasets by simulating complex systems can require significant computational resources, and acquiring experimental datasets can prove similarly difficult. For these systems, computationally inexpensive but generally inaccurate models, known as low-fidelity models, are often available. In this paper, we propose a bi-fidelity modeling approach for complex physical systems, where we model the discrepancy between the true system's response and the low-fidelity response in the presence of a small training dataset from the true system's response, using a deep operator network (DeepONet), a neural network architecture suitable for approximating nonlinear operators. We apply the approach to systems that have parametric uncertainty and are partially unknown. Three numerical examples are used to show the efficacy of the proposed approach in modeling uncertain and partially unknown complex physical systems.
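    A compressed sketch of the discrepancy idea, substituting a plain MLP for the DeepONet (the operator-network architecture is not reproduced here, and the layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class Discrepancy(nn.Module):
    """Small network trained on pairs (input -> high-fidelity minus low-fidelity)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x)

def bifidelity_predict(x, low_fidelity_fn, discrepancy):
    """Final prediction = cheap low-fidelity response + learned correction."""
    return low_fidelity_fn(x) + discrepancy(x)
```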
    A Physics-based Domain Adaptation framework for modelling and forecasting building energy systems. (arXiv:2208.09456v1 [cs.LG])
    State-of-the-art machine-learning based models are a popular choice for modelling and forecasting energy behaviour in buildings because given enough data, they are good at finding spatiotemporal patterns and structures even in scenarios where the complexity prohibits analytical descriptions. However, machine-learning based models for building energy forecasting have difficulty generalizing to out-of-sample scenarios that are not represented in the data because their architecture typically does not hold physical correspondence to mechanistic structures linked with governing phenomena of energy transfer. Thus, their ability to forecast for unseen initial conditions and boundary conditions wholly depends on the representativeness in the data, which is not guaranteed in building measurement data. Consequently, these limitations impede their application to real-world engineering applications such as energy management in Digital Twins. In response, we present a Domain Adaptation framework that aims to leverage well-known understanding of phenomenon governing energy behavior in buildings to forecast for out of sample scenarios beyond building measurement data. More specifically, we represent mechanistic knowledge of energy behavior using low-rank linear time-invariant state space models and subsequently leverage their governing structure to forecast for a target energy system for which only building measurement data is available. We achieve this by aligning the Physics-derived subspace that governs global state space behavior closer towards the target subspace derived from the measurement data. In this initial exploration we focus on linear energy systems; we test the subspace-based DA framework on a 1D heat conduction scenario by varying the thermophysical properties of the source and target systems to demonstrate the transferability of mechanistic models from Physics to measurement data.
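    One simple notion of "aligning the Physics-derived subspace closer towards the target subspace" is an orthogonal Procrustes rotation; this is offered only as a sketch, and the paper's actual alignment procedure may differ.

```python
import numpy as np

def align_subspace(U_phys: np.ndarray, U_data: np.ndarray) -> np.ndarray:
    """Rotate an orthonormal physics-derived basis (n x r) toward the basis
    estimated from measurement data, via orthogonal Procrustes."""
    U, _, Vt = np.linalg.svd(U_phys.T @ U_data)
    return U_phys @ (U @ Vt)  # optimal rotation in Frobenius norm
```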
    Diffusion-based Time Series Imputation and Forecasting with Structured State Space Models. (arXiv:2208.09399v1 [cs.LG])
    The imputation of missing values represents a significant obstacle for many real-world data analysis pipelines. Here, we focus on time series data and put forward SSSD, an imputation model that relies on two emerging technologies, (conditional) diffusion models as state-of-the-art generative models and structured state space models as internal model architecture, which are particularly suited to capture long-term dependencies in time series data. We demonstrate that SSSD matches or even exceeds state-of-the-art probabilistic imputation and forecasting performance on a broad range of data sets and different missingness scenarios, including the challenging blackout-missing scenarios, where prior approaches failed to provide meaningful results.
    ALBU: An approximate Loopy Belief message passing algorithm for LDA to improve performance on small data sets. (arXiv:2110.00635v2 [cs.LG] UPDATED)
    Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) has become the most popular algorithm for aspect modeling. While sufficiently successful in text topic extraction from large corpora, VB is less successful in identifying aspects in the presence of limited data. We present a novel variational message passing algorithm as applied to Latent Dirichlet Allocation (LDA) and compare it with the gold standard VB and collapsed Gibbs sampling. In situations where marginalisation leads to non-conjugate messages, we use ideas from sampling to derive approximate update equations. In cases where conjugacy holds, Loopy Belief update (LBU) (also known as Lauritzen-Spiegelhalter) is used. Our algorithm, ALBU (approximate LBU), has strong similarities with Variational Message Passing (VMP) (which is the message passing variant of VB). To compare the performance of the algorithms in the presence of limited data, we use data sets consisting of tweets and news groups. Additionally, to perform more fine-grained evaluations and comparisons, we use simulations that enable comparisons with the ground truth via Kullback-Leibler divergence (KLD). Using coherence measures for the text corpora and KLD with the simulations, we show that ALBU learns latent distributions more accurately than does VB, especially for smaller data sets.
    Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction. (arXiv:2106.15013v3 [cs.LG] UPDATED)
    Recently there has been significant theoretical progress on understanding the convergence and generalization of gradient-based methods on nonconvex losses with overparameterized models. Nevertheless, many aspects of optimization and generalization, and in particular the critical role of small random initialization, are not fully understood. In this paper, we take a step towards demystifying this role by proving that small random initialization followed by a few iterations of gradient descent behaves akin to popular spectral methods. We also show that this implicit spectral bias from small random initialization, which is provably more prominent for overparameterized models, also puts the gradient descent iterations on a particular trajectory towards solutions that are not only globally optimal but also generalize well. Concretely, we focus on the problem of reconstructing a low-rank matrix from a few measurements via a natural nonconvex formulation. In this setting, we show that the trajectory of the gradient descent iterations from small random initialization can be approximately decomposed into three phases: (I) a spectral or alignment phase, where we show that the iterates have an implicit spectral bias akin to spectral initialization, allowing us to show that at the end of this phase the column space of the iterates and the underlying low-rank matrix are sufficiently aligned; (II) a saddle avoidance/refinement phase, where we show that the trajectory of the gradient iterates moves away from certain degenerate saddle points; and (III) a local refinement phase, where we show that after avoiding the saddles the iterates converge quickly to the underlying low-rank matrix. Underlying our analysis are insights for the analysis of overparameterized nonconvex optimization schemes that may have implications for computational problems beyond low-rank reconstruction.
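    A toy experiment illustrating the role of small initialization, using full observations of a symmetric low-rank matrix rather than a few measurements, so it is only suggestive of the paper's setting; the dimensions and step size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 2
A = rng.standard_normal((n, r))
M = A @ A.T                              # ground-truth rank-2 PSD matrix
U = 1e-4 * rng.standard_normal((n, n))   # overparameterized factor, small random init
lr = 2e-5
for t in range(30001):
    U -= lr * (U @ U.T - M) @ U          # gradient of ||UU^T - M||_F^2 / 4
    if t % 5000 == 0:
        # relative error stays flat early (alignment), then drops quickly
        print(t, np.linalg.norm(U @ U.T - M) / np.linalg.norm(M))
```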
    Suboptimal Performance of the Bayes Optimal Algorithm in Frequentist Best Arm Identification. (arXiv:2202.05193v2 [stat.ML] UPDATED)
    We consider the fixed-budget best-arm identification problem with Normal reward distributions. In this problem, the forecaster is given $K$ arms (or treatments) and $T$ time steps. The forecaster attempts to find the best arm, defined as the one with the largest mean, via an adaptive experiment conducted using an algorithm. The algorithm's performance is measured by the simple regret, that is, the quality of the estimated best arm. The frequentist simple regret can be exponentially small in $T$, whereas the Bayesian simple regret is only polynomially small in $T$. This paper demonstrates that the Bayes optimal algorithm, which minimizes the Bayesian simple regret, does not produce an exponentially small simple regret for some parameters, a finding that contrasts with the many results indicating the asymptotic equivalence of Bayesian and frequentist algorithms in the context of fixed sampling regimes. While the Bayes optimal algorithm is described in terms of a recursive equation that is virtually impossible to compute exactly, we establish the foundations for further analysis by introducing a key quantity that we call the expected Bellman improvement.  ( 2 min )
    Journal Impact Factor and Peer Review Thoroughness and Helpfulness: A Supervised Machine Learning Study. (arXiv:2207.09821v3 [cs.DL] UPDATED)
    The journal impact factor (JIF) is often equated with journal quality and with the quality of the peer review of the papers submitted to the journal. We examined the association between the content of peer review and JIF by analysing 10,000 peer review reports submitted to 1,644 medical and life sciences journals. Two researchers hand-coded a random sample of 2,000 sentences. We then trained machine learning models to classify all 187,240 sentences as contributing or not contributing to content categories. We examined the association between ten groups of journals defined by JIF deciles and the content of peer reviews using linear mixed-effects models, adjusting for the length of the review. The JIF ranged from 0.21 to 74.70. The length of peer reviews increased from the lowest JIF group (median 185 words) to the highest (387 words). The proportion of sentences allocated to different content categories varied widely, even within JIF groups. For thoroughness, sentences on 'Materials and Methods' were more common in the highest JIF journals than in the lowest JIF group (difference of 7.8 percentage points; 95% CI 4.9 to 10.7%). The trend for 'Presentation and Reporting' went in the opposite direction, with the highest JIF journals giving less emphasis to such content (difference -8.9%; 95% CI -11.3 to -6.5%). For helpfulness, reviews for higher JIF journals devoted less attention to 'Suggestion and Solution' and provided fewer examples than lower impact factor journals. No or only small differences were evident for other content categories. In conclusion, peer review in journals with higher JIF tends to be more thorough in discussing the methods used but less helpful in terms of suggesting solutions and providing examples. Differences were modest and variability high, indicating that the JIF is a poor predictor of the quality of peer review of an individual manuscript.  ( 3 min )
    Deletion and Insertion Tests in Regression Models. (arXiv:2205.12423v2 [cs.LG] UPDATED)
    A basic task in explainable AI (XAI) is to identify the most important features behind a prediction made by a black box function $f$. The insertion and deletion tests of Petsiuk et al. (2018) are used to judge the quality of algorithms that rank pixels from most to least important for a classification. Motivated by regression problems we establish a formula for their area under the curve (AUC) criteria in terms of certain main effects and interactions in an anchored decomposition of $f$. We find an expression for the expected value of the AUC under a random ordering of inputs to $f$ and propose an alternative area above a straight line for the regression setting. We use this criterion to compare feature importances computed by integrated gradients (IG) to those computed by Kernel SHAP (KS) as well as LIME, DeepLIFT, vanilla gradient and input$\times$gradient methods. KS has the best overall performance in two datasets we consider but it is very expensive to compute. We find that IG is nearly as good as KS while being much faster. Our comparison problems include some binary inputs that pose a challenge to IG because it must use values between the possible variable levels and so we consider ways to handle binary variables in IG. We show that sorting variables by their Shapley value does not necessarily give the optimal ordering for an insertion-deletion test. It will however do that for monotone functions of additive models, such as logistic regression.  ( 3 min )
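    A compact sketch of a deletion test for a regression function; the baseline values and the importance ordering are supplied by the caller, and the trapezoidal AUC is one common convention rather than the paper's exact criterion.

```python
import numpy as np

def deletion_curve_auc(f, x, baseline, order):
    """Replace features with baseline values in importance order (most important
    first) and return the area under the prediction-vs-fraction-deleted curve."""
    x = np.array(x, dtype=float)   # work on a copy
    preds = [f(x)]
    for i in order:
        x[i] = baseline[i]
        preds.append(f(x))
    return np.trapz(preds, dx=1.0 / len(order))
```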
    On the Surprising Behaviour of node2vec. (arXiv:2206.08252v2 [cs.LG] UPDATED)
    Graph embedding techniques are a staple of modern graph learning research. When using embeddings for downstream tasks such as classification, information about their stability and robustness, i.e., their susceptibility to sources of noise, stochastic effects, or specific parameter choices, becomes increasingly important. As one of the most prominent graph embedding schemes, we focus on node2vec and analyse its embedding quality from multiple perspectives. Our findings indicate that embedding quality is unstable with respect to parameter choices, and we propose strategies to remedy this in practice.  ( 2 min )
    Deep Learning for Choice Modeling. (arXiv:2208.09325v1 [stat.ML])
    Choice modeling has been a central topic in the study of individual preference or utility across many fields including economics, marketing, operations research, and psychology. While the vast majority of the literature on choice models has been devoted to the analytical properties that lead to managerial and policy-making insights, the existing methods to learn a choice model from empirical data are often either computationally intractable or sample inefficient. In this paper, we develop deep learning-based choice models under two settings of choice modeling: (i) feature-free and (ii) feature-based. Our model captures both the intrinsic utility for each candidate choice and the effect that the assortment has on the choice probability. Synthetic and real data experiments demonstrate the performances of proposed models in terms of the recovery of the existing choice models, sample complexity, assortment effect, architecture design, and model interpretation.  ( 2 min )
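    A bare-bones sketch of a feature-based neural choice model; masking unavailable items before the softmax is one simple way to encode the assortment effect, and the architecture details are placeholders (each batch row is assumed to have at least one available item).

```python
import torch
import torch.nn as nn

class FeatureChoiceModel(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, feats, available):
        """feats: (batch, n_items, dim); available: (batch, n_items) bool mask."""
        s = self.score(feats).squeeze(-1)
        s = s.masked_fill(~available, float("-inf"))  # items outside the assortment
        return torch.softmax(s, dim=-1)               # choice probabilities
```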
    Non-Stationary Dynamic Pricing Via Actor-Critic Information-Directed Pricing. (arXiv:2208.09372v1 [stat.ML])
    This paper presents a novel non-stationary dynamic pricing algorithm design in which pricing agents face incomplete demand information and market environment shifts. The agents run price experiments to learn each product's demand curve and the profit-maximizing price, while remaining aware of market environment shifts to avoid the high opportunity costs of offering sub-optimal prices. The proposed ACIDP extends information-directed sampling (IDS) algorithms from statistical machine learning to include microeconomic choice theory, with a novel pricing-strategy auditing procedure to escape sub-optimal pricing after a market environment shift. ACIDP outperforms competing bandit algorithms, including Upper Confidence Bound (UCB) and Thompson sampling (TS), in a series of market environment shifts.  ( 2 min )
    Estimating a potential without the agony of the partition function. (arXiv:2208.09433v1 [cs.LG])
    Estimating a Gibbs density function given a sample is an important problem in computational statistics and statistical learning. Although the well-established maximum likelihood method is commonly used, it requires the computation of the partition function (i.e., the normalization of the density). This function can be calculated easily for simple low-dimensional problems, but its computation is difficult or even intractable for general densities and high-dimensional problems. In this paper we propose an alternative approach based on Maximum A-Posteriori (MAP) estimators, which we name Maximum Recovery MAP (MR-MAP), to derive estimators that do not require the computation of the partition function, and we reformulate the problem as an optimization problem. We further propose a least-action-type potential that allows us to quickly solve the optimization problem as a feed-forward hyperbolic neural network. We demonstrate the effectiveness of our methods on some standard data sets.  ( 2 min )
    Kernel PCA with the Nystr\"om method. (arXiv:2109.05578v3 [stat.ML] UPDATED)
    The Nystr\"om method is one of the most popular techniques for improving the scalability of kernel methods. However, it has not yet been derived for kernel PCA in line with classical PCA. In this paper we derive kernel PCA with the Nystr\"om method, thereby providing one of the few available options to make kernel PCA scalable. We further study its statistical accuracy through a finite-sample confidence bound on the empirical reconstruction error compared to the full method. The behaviours of the method and bound are illustrated through computer experiments on multiple real-world datasets. As an application of the method we present kernel principal component regression with the Nystr\"om method, as an alternative to Nystr\"om kernel ridge regression for efficient regularized regression with kernels.  ( 2 min )
    Empirical or Invariant Risk Minimization? A Sample Complexity Perspective. (arXiv:2010.16412v2 [cs.LG] UPDATED)
    Recently, invariant risk minimization (IRM) was proposed as a promising solution to address out-of-distribution (OOD) generalization. However, it is unclear when IRM should be preferred over the widely-employed empirical risk minimization (ERM) framework. In this work, we analyze both these frameworks from the perspective of sample complexity, thus taking a firm step towards answering this important question. We find that depending on the type of data generation mechanism, the two approaches might have very different finite sample and asymptotic behavior. For example, in the covariate shift setting we see that the two approaches not only arrive at the same asymptotic solution, but also have similar finite sample behavior with no clear winner. For other distribution shifts such as those involving confounders or anti-causal variables, however, the two approaches arrive at different asymptotic solutions where IRM is guaranteed to be close to the desired OOD solutions in the finite sample regime, while ERM is biased even asymptotically. We further investigate how different factors -- the number of environments, complexity of the model, and IRM penalty weight -- impact the sample complexity of IRM in relation to its distance from the OOD solutions.  ( 3 min )
    Finding groups of cross-correlated features in bi-view data. (arXiv:2009.05079v3 [stat.ME] UPDATED)
    Data sets in which measurements of two (or more) types are obtained from a common set of samples arise in many scientific applications. A common problem in the exploratory analysis of such data is to identify groups of features of different data types that are strongly associated. A bimodule is a pair (A, B) of feature sets from two data types such that the aggregate cross-correlation between the features in A and those in B is large. A bimodule (A, B) is stable if A coincides with the set of features that have significant aggregate correlation with the features in B, and vice-versa. In this paper we propose and investigate an iterative testing-based procedure (BSP) to identify stable bimodules in bi-view data. We carry out a thorough simulation study to assess the performance of BSP, and present an extended application to the problem of expression quantitative trait loci (eQTL) analysis using recent data from the GTEx project. In addition, we apply BSP to climatology data to identify regions in North America where annual temperature variation affects precipitation.  ( 3 min )
    Almost Cost-Free Communication in Federated Best Arm Identification. (arXiv:2208.09215v1 [cs.LG])
    We study the problem of best arm identification in a federated learning multi-armed bandit setup with a central server and multiple clients. Each client is associated with a multi-armed bandit in which each arm yields {\em i.i.d.}\ rewards following a Gaussian distribution with an unknown mean and known variance. The set of arms is assumed to be the same at all the clients. We define two notions of best arm -- local and global. The local best arm at a client is the arm with the largest mean among the arms local to the client, whereas the global best arm is the arm with the largest average mean across all the clients. We assume that each client can only observe the rewards from its local arms and thereby estimate its local best arm. The clients communicate with a central server on uplinks that entail a cost of $C\ge0$ units per usage per uplink. The global best arm is estimated at the server. The goal is to identify the local best arms and the global best arm with minimal total cost, defined as the sum of the total number of arm selections at all the clients and the total communication cost, subject to an upper bound on the error probability. We propose a novel algorithm {\sc FedElim} that is based on successive elimination and communicates only in exponential time steps and obtain a high probability instance-dependent upper bound on its total cost. The key takeaway from our paper is that for any $C\geq 0$ and error probabilities sufficiently small, the total number of arm selections (resp.\ the total cost) under {\sc FedElim} is at most~$2$ (resp.~$3$) times the maximum total number of arm selections under its variant that communicates in every time step. Additionally, we show that the latter is optimal in expectation up to a constant factor, thereby demonstrating that communication is almost cost-free in {\sc FedElim}. We numerically validate the efficacy of {\sc FedElim}.  ( 3 min )
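    For intuition, here is a classic single-client successive elimination routine; FedElim's federated machinery, exponential communication schedule, and cost accounting are omitted, and the confidence radius is a textbook choice rather than the paper's.

```python
import numpy as np

def successive_elimination(pull, n_arms, max_pulls, c=2.0):
    """Sample surviving arms in rounds; eliminate any arm whose upper confidence
    bound falls below the best lower confidence bound."""
    active = list(range(n_arms))
    counts, sums = np.zeros(n_arms), np.zeros(n_arms)
    t = 0
    while len(active) > 1 and t < max_pulls:
        for a in active:
            sums[a] += pull(a)
            counts[a] += 1
            t += 1
        means = sums[active] / counts[active]
        radius = np.sqrt(c * np.log(t) / counts[active])
        best_lower = np.max(means - radius)
        active = [a for a, m, rad in zip(active, means, radius) if m + rad >= best_lower]
    return max(active, key=lambda a: sums[a] / counts[a])
```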
    Classification Performance Metric Elicitation and its Applications. (arXiv:2208.09142v1 [stat.ML])
    Given a learning problem with real-world tradeoffs, which cost function should the model be trained to optimize? This is the metric selection problem in machine learning. Despite its practical interest, there is limited formal guidance on how to select metrics for machine learning applications. This thesis outlines metric elicitation as a principled framework for selecting the performance metric that best reflects implicit user preferences. Once specified, the evaluation metric can be used to compare and train models. In this manuscript, we formalize the problem of Metric Elicitation and devise novel strategies for eliciting classification performance metrics using pairwise preference feedback over classifiers. Specifically, we provide novel strategies for eliciting linear and linear-fractional metrics for binary and multiclass classification problems, which are then extended to a framework that elicits group-fair performance metrics in the presence of multiple sensitive groups. All the elicitation strategies that we discuss are robust to both finite sample and feedback noise, and are thus useful in practice for real-world applications. Using the tools and the geometric characterizations of the feasible confusion-statistics sets from the binary, multiclass, and multiclass-multigroup classification setups, we further provide strategies to elicit from a wider range of complex, modern multiclass metrics defined by quadratic functions of confusion statistics by exploiting their local linear structure. From an application perspective, we also propose to use the metric elicitation framework for optimizing complex black-box metrics in a manner amenable to deep network training. Lastly, to bring theory closer to practice, we conduct a preliminary real-user study that shows the efficacy of the metric elicitation framework in recovering the users' preferred performance metric in a binary classification setup.  ( 3 min )
    SimLDA: A tool for topic model evaluation. (arXiv:2208.09299v1 [cs.LG])
    Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) has become the most popular algorithm for aspect modeling. While sufficiently successful in text topic extraction from large corpora, VB is less successful in identifying aspects in the presence of limited data. We present a novel variational message passing algorithm as applied to Latent Dirichlet Allocation (LDA) and compare it with the gold standard VB and collapsed Gibbs sampling. In situations where marginalisation leads to non-conjugate messages, we use ideas from sampling to derive approximate update equations. In cases where conjugacy holds, Loopy Belief update (LBU) (also known as Lauritzen-Spiegelhalter) is used. Our algorithm, ALBU (approximate LBU), has strong similarities with Variational Message Passing (VMP) (which is the message passing variant of VB). To compare the performance of the algorithms in the presence of limited data, we use data sets consisting of tweets and news groups. Using coherence measures we show that ALBU learns latent distributions more accurately than does VB, especially for smaller data sets.  ( 2 min )

  • Open

    Meet Sipeed’s TinyMaix: An Open-Source Lightweight Machine Learning Library For Microcontrollers
    Sipeed TinyMaix is an open-source machine learning library designed for microcontrollers. It is lightweight enough to be compatible with the Microchip ATmega328 MCU found in the Arduino UNO board and its many clones. The core code of TinyMaix, which was created during a weekend hackathon, has roughly 400 lines, a binary size of about 3KB, and uses very little RAM, allowing it to execute MNIST handwritten digit classification on an ATmega328 MCU with only 2KB SRAM and 32KB flash. Continue reading | Github submitted by /u/ai-lover [link] [comments]  ( 87 min )
    AI generated pepes
    submitted by /u/oatlover666 [link] [comments]  ( 99 min )
    Are there any AIs that you can feed several images and have the AI generate a new image based on the ones you gave it?
    Title says it all. I'm looking for either a program or website that can accomplish this, or even a GitHub library or something that will allow me to build it. To clarify, I don't want to train the AI on thousands of images; I want to input a single image or a few images and have the AI make similar images. Basically the same as text-to-image, but with an image rather than text as the input. Thanks! submitted by /u/yea_okay_dude [link] [comments]  ( 88 min )
    My top 10 Dall-e art
    submitted by /u/Phibit-exe [link] [comments]  ( 91 min )
    How can I get a quick overview of all the AIs I could use?
    I love AIs and want to find as many new and barely known AIs as possible. submitted by /u/TheblackRook3 [link] [comments]  ( 87 min )
    "Leonardo", Josef Zorn, Sculpted in Blender, 2022
    submitted by /u/thezeffo [link] [comments]  ( 87 min )
    Is there an AI to turn normal videos/images into this kind of animation?
    submitted by /u/xXNOdrugsForMEXx [link] [comments]  ( 87 min )
    I made an AI generated sci-fi article about alien life on another planet
    submitted by /u/guilds-and-blades [link] [comments]  ( 89 min )
    Highly-Efficient New Neuromorphic Chip for AI on the Edge
    submitted by /u/Tao_Dragon [link] [comments]  ( 90 min )
    New to PhD, looking for tips, advice, and encouragement
    Hi, my background is a BS in CS and an MBA, and I have recently been accepted to a PhD program (part time, 4 yrs, in AI, in case there are AI PhDs here). I'm currently working in corporate, which is willing to support my further studies. My partner is supportive and we are expecting early next year. At this moment I really believe this goal aligns with my long-term goals and career (besides my personal goals). So what I'm looking for are watchouts, tips, and similar stories from people who underwent the same journey or are currently on it. I understand it's going to be challenging; that's why I'm trying to look for some inspiration, especially on how I can self-manage (mental health, etc.), tips for studying, and things I might miss or tools I could use. I'm open to just reading and listening to everyone. Thank you for your wisdom submitted by /u/saintmichel [link] [comments]  ( 88 min )
    How to filter out AI artwork spam posts so I don’t see them?
    Exactly what the title says. I enjoy interesting articles and updates about the field of AI. But these days half of the posts are just people spamming lame “AI generated artwork”. Thanks for your help. submitted by /u/Uh-ok-sure [link] [comments]  ( 89 min )
  • Open

    [D] What are the best tools to store datasets and their different transformations?
    Hello everyone. I'm wondering how you store your datasets, as well as their transformed versions. Currently, when starting a new project, my workflow looks like this:
    1. Load the dataset in memory
    2. Apply some transformations (text tokenization, image resizing, audio trimming...)
    3. Realize it takes too much time to rerun the transformations each time I need to run my model
    4. Write a custom cache management handler, which quickly becomes overengineered for my simple needs
    I'm looking for tools that could give me a simpler and faster workflow:
    1. Load some data into the tool
    2. Apply some transformations
    3. Grab some data at some given transformation stage from the tool into my model
    The tool should be able to handle any kind of data (text, numpy arrays, audio, video). Do you have some recommendations? submitted by /u/TheMrZZ0 [link] [comments]  ( 89 min )
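    One lightweight answer, offered as a sketch: joblib's Memory persists a function's outputs to disk keyed on its arguments, so identical reruns skip recomputation; the tokenize function below is a placeholder for any transformation.

```python
from joblib import Memory

memory = Memory("./transform_cache", verbose=0)

@memory.cache  # results are written to disk, keyed on the function's arguments
def tokenize(texts, max_len=128):
    return [t.lower().split()[:max_len] for t in texts]

corpus = ["Some raw document", "Another raw document"]
tokens = tokenize(corpus)  # computed once; identical later calls hit the cache
```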
    [P] GPU virtualisation for on-prem ML compute in K8S
    I wrote a post on how to virtualise GPUs and attach them to VMs for on-prem workloads. The VMs can then be attached to Juju & K8S for load balancing or whatever you want. I implemented this where I work and it runs all of our ML compute. It was a pain to get working originally, so I hope you find it useful: https://www.paulcjh.com/technical_posts/gpu_virtualisation.html submitted by /u/paulcjh [link] [comments]  ( 89 min )
    [P] Min3Flow: A Multistage Text to Image Framework. Built using an inference-stripped subset of min-dalle, glid-3-xl, and SwinIR.
    submitted by /u/BiasedVariance [link] [comments]  ( 88 min )
    [R] Sequencer: Deep LSTM for Image Classification - (Rikkyo University, 2022)
    Paper: https://arxiv.org/abs/2205.01972 Github: https://github.com/okojoalg/sequencer Abstract: In recent computer vision research, the advent of the Vision Transformer (ViT) has rapidly revolutionized various architectural design efforts: ViT achieved state-of-the-art image classification performance using self-attention found in natural language processing, and MLP-Mixer achieved competitive performance using simple multi-layer perceptrons. In contrast, several studies have also suggested that carefully redesigned convolutional neural networks (CNNs) can achieve advanced performance comparable to ViT without resorting to these new ideas. Against this background, there is growing interest in what inductive bias is suitable for computer vision. Here we propose Sequencer, a novel and competitive architecture alternative to ViT that provides a new perspective on these issues. Unlike ViTs, Sequencer models long-range dependencies using LSTMs rather than self-attention layers. We also propose a two-dimensional version of the Sequencer module, where an LSTM is decomposed into vertical and horizontal LSTMs to enhance performance. Despite its simplicity, several experiments demonstrate that Sequencer performs impressively well: Sequencer2D-L, with 54M parameters, realizes 84.6% top-1 accuracy on only ImageNet-1K. Not only that, we show that it has good transferability and robust resolution adaptability at doubled resolutions. submitted by /u/Singularian2501 [link] [comments]  ( 89 min )
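    A toy LSTM image classifier in the spirit of Sequencer, treating an image as a sequence of rows; this is not the paper's 2D decomposition into vertical and horizontal LSTMs, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class RowLSTMClassifier(nn.Module):
    def __init__(self, width=32, hidden=128, n_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(input_size=3 * width, hidden_size=hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                 # x: (batch, 3, H, W)
        b, c, h, w = x.shape
        rows = x.permute(0, 2, 1, 3).reshape(b, h, c * w)  # one token per image row
        out, _ = self.lstm(rows)          # long-range mixing across rows
        return self.head(out[:, -1])      # logits from the last row's state

logits = RowLSTMClassifier()(torch.randn(4, 3, 32, 32))  # -> shape (4, 10)
```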
    [D][Career Advice] What would be the right path to get back into research and publications? (Currently an MLE at a tech giant)
    Hi my fellow community, I am currently a senior machine learning engineer at one of the FAANGs. From a career standpoint, to be very honest, I want to work on improving AI to solve science problems. So, I relate very heavily with the stuff that DeepMind is doing, or OpenAI and such. My current work isn't anything like that; it's in one of the recommendations teams. The problem with getting into one of those firms is that I don't have a PhD. And that's by choice. I have a tonne of respect for folks who go through a 4-5 year PhD, but I don't fancy myself doing that anymore. There are multiple personal reasons for that which make it difficult for me to pursue one anymore. But I really want to see myself working with one of such firms. Working with ML not necessarily for a product, but to advance sc…  ( 100 min )
    [R] Musika! Fast Infinite Waveform Music Generation + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 91 min )
    [D] industries transformed by LLMs and generative ai?
    What companies or industries have the most to gain from the rise of LLMs and generative content? Which have the most to lose? For example, stock photos (aka Getty images) feels like it is in tough spot given generative images from services like DallE. GitHub with Copilot has a lot to gain. What else will win/lose over the next few years? submitted by /u/goldiceberg [link] [comments]  ( 88 min )
    [P] NNextDB (read: “nextdb”), an open-source, blazingly fast ⚡️, vector search database to power your AI apps 🦾.
    What’s NNext? [Say “Next”] NNext is a blazingly fast ⚡️, open-source 📖, (vector) neural search 🔎 engine for building delightful AI Apps 🦾. Github: https://github.com/nnextdb/nnext Applications Vector search is a key concept in modern machine learning systems. It’s the technology behind 🎖 Recommendation systems (such as Instagram’s “Explore” page). 🔎 Search system of all kinds Text search. For instance google’s “related search” Image Search. Such as reverse image search. 🤖 Chatbots and question answering systems 🧼 Data cleaning and pre-processing. Used to de-duplicate data representing similar items. 🏹 One-shot/zero-shot learning 🧌 Fraud and outlier detection Problem Existing vector search packages are usually in the form of open-source packages released by la…  ( 92 min )
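    For readers new to the concept: at its core, vector search is nearest-neighbor lookup over embeddings. A brute-force cosine-similarity version, which engines like this aim to accelerate and scale, might look like the sketch below (shapes and names are illustrative).

```python
import numpy as np

def search(index: np.ndarray, query: np.ndarray, k: int = 5):
    """index: (n, d) embedding matrix; query: (d,) vector.
    Returns the top-k (id, score) pairs by cosine similarity."""
    index_n = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = index_n @ query_n
    top = np.argsort(-scores)[:k]
    return list(zip(top.tolist(), scores[top].tolist()))
```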
    [D] How to invert a language model?
    I have been trying to invert a language model (i.e., get an input which will give a desired output) but have so far been unsuccessful because of the discrete input space. I have tried to relax the input space to a continuous space and add a regularizer to keep the input "close" to the actual discrete space, but optimization seems to be tricky. Is there any published research or literature around this? Any tips? submitted by /u/Interesting_Year_201 [link] [comments]  ( 92 min )
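    One direction from the prompt-optimization literature is exactly this relaxation: keep a soft distribution over the vocabulary for each input position, optimize it by gradient descent against the desired output, and project back to tokens at the end. A hedged sketch with Hugging Face transformers (model choice, prompt length, step count, and the entropy weight are illustrative assumptions, not a recipe):

        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        tok = AutoTokenizer.from_pretrained("gpt2")
        model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
        for p in model.parameters():
            p.requires_grad_(False)                  # only the prompt is optimized
        emb = model.get_input_embeddings().weight    # (vocab, dim)

        target = tok(" the Eiffel Tower", return_tensors="pt").input_ids
        n_prompt, vocab = 8, emb.shape[0]
        prompt_logits = torch.randn(1, n_prompt, vocab, requires_grad=True)
        opt = torch.optim.Adam([prompt_logits], lr=0.1)

        for _ in range(200):
            probs = torch.softmax(prompt_logits, dim=-1)
            soft_prompt = probs @ emb                # soft mixture of token embeddings
            inputs = torch.cat([soft_prompt, emb[target[0]].unsqueeze(0)], dim=1)
            logits = model(inputs_embeds=inputs).logits
            pred = logits[:, n_prompt - 1:-1, :]     # predict each target token
            loss = torch.nn.functional.cross_entropy(
                pred.reshape(-1, vocab), target.reshape(-1))
            loss = loss - 0.01 * (probs * probs.log()).sum()  # low entropy -> near-discrete
            opt.zero_grad(); loss.backward(); opt.step()

        print(tok.decode(prompt_logits.argmax(-1)[0]))  # project back to hard tokens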
    [P] Diagnose and Debias Large Language models step by step in this brand new YouTube series Stuff You Should Know in applied ML
    submitted by /u/prithivida [link] [comments]  ( 88 min )
    [D] Would doing a PhD in Causal ML be worth it?
    On the one hand it seems like this is the future and the question that needs to be answered in order for us to truly invent AGI, but on the other hand, I am not sure if we will truly be able to teach machines about causation... I mean, except for a few obvious phenomena, we humans get tricked by causation too. submitted by /u/DesperateBread3179 [link] [comments]  ( 93 min )
    [P] AI Cook Book - I am building a Google Sheet about everything that is AI related - I need feedback on my project.
    LINK to Project: https://docs.google.com/spreadsheets/d/1JpgKHUZqsN2n4F4iACc67puPHjCyU7ABVHk8yhESFMw/edit?usp=sharing WHAT: Originally this would have been my PhD topic, to create an all-around "AI dictionary" covering - All Python frameworks - Resources - Research Papers - Models (!BENCHMARKED AGAINST EACH OTHER on standard datasets) - Services, but I think it fits better as a public project. GOAL: - Faster prototyping - Anyone can find things easier and more quickly - It can also help people move forward in their careers and, most importantly, make better, objective decisions. HELP: FORMATTING: I have a huge collection of stuff that I want to add to this Google Sheet, but I need to make sure that I pick a format for ordering all this information that people actually find useful. ADDING: So I am looking for community feedback about what things to add (topics = sheets, columns in each topic). The final look will probably be a website when/if the project grows. !JOIN: If anyone would like to participate, just request editing rights for the sheet (I just have to make sure that people don't abuse the whole of the work by putting their guides first). submitted by /u/glassAlloy [link] [comments]  ( 90 min )
  • Open

    7 Key Containerization Benefits for Your IT Business
    These days, we are all witnessing advanced IT business solutions pushing the business sector to the next level. No doubt, all these solutions are highly effective and valuable for every type and size of business. Modern technology solutions are highly effective for the whole business sector. If we discuss the IT business industry individually. We… Read More »7 Key Containerization Benefits for Your IT Business  The post 7 Key Containerization Benefits for Your IT Business  appeared first on Data Science Central.  ( 19 min )
  • Open

    Urgent: need help! My environment fails to render with the trained model, but I need to provide analysis without it.
    I trained an agent to play a game with DQN and PPO. All I have is TensorBoard charts, and I don't have time to solve this problem. How should I present my analysis in the report for my uni project? I need to present the report soon. Any idea would be helpful; maybe some of you have gone through the same issue. How can I infer which model performed best, or how do they compare? submitted by /u/prestem [link] [comments]  ( 87 min )
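    If the event files behind those TensorBoard charts are still available, the raw scalars can be pulled out programmatically and the agents compared quantitatively, e.g. on mean reward over the last few logged points. A sketch for the post above (the tag name is Stable-Baselines3's default and the log directories are hypothetical):

        from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

        def final_mean_reward(logdir: str, tag: str = "rollout/ep_rew_mean", window: int = 20):
            """Load a TensorBoard run and average the last `window` reward points."""
            acc = EventAccumulator(logdir)
            acc.Reload()
            values = [e.value for e in acc.Scalars(tag)]
            return sum(values[-window:]) / min(window, len(values))

        for run in ["runs/dqn", "runs/ppo"]:   # hypothetical log directories
            print(run, final_mean_reward(run))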
    Is Openai Gym documentation website down?
    I can't reach the OpenAI Gym documentation website, is it down for anyone else? submitted by /u/afakharany93 [link] [comments]  ( 87 min )
    Self Study RL using custom environments.
    I'm looking to understand RL from the very basics by implementing custom environments and training a model. It would be of great help to have references and code starting from the very basics, through intermediate, and moving slowly to advanced topics like AlphaZero. Please help me on my journey with any materials out there that you know of. Note: I understand the basics of deep learning and have 5 years of development experience in Python. Thanks for your help. submitted by /u/haldarankit [link] [comments]  ( 88 min )
    Is there a representation learning network for IMPALA?
    I was going through representation learning architectures and did not find any for IMPALA, is there no representation learning method that can be used with IMPALA? submitted by /u/Cool_Abbreviations_9 [link] [comments]  ( 104 min )
    My DQN does not want to output negative values
    I have an environment with only negative rewards. For this reason, my Q network has a final ReLU layer and I output -relu(net_out), so that the output of my Q network is always in [-inf, 0]. However, during training my network immediately converges to always producing 0 values, and target values (r + gamma * max_a Q(s', a)) quickly become just equal to the r term. I understand that having a negated ReLU means that max_a Q(s', a) will often be 0, but the Q network would then converge to r, and after the target network update the target network should then output r, the target would become r + gamma * r, and so on, converging to the right value. Why do you think this doesn't happen? [Figure: training curves omitted] Orange is the average target value (i.e. r + gamma * max_a Q_target(s', a)). Blue is the average predicted value (i.e. Q_eval(s, a)). Green is the next-state value (i.e. Q_target(s', a)) and red is the average reward the agent is collecting. The "bumps" in the curves are due to the target network being updated with the eval network weights. To put things in perspective: the total return for an episode is around -200, and I'm using a discount factor (gamma) of 0.99. EDIT: I'll put here some other info as I continue investigating the issue: - I'm not using Double DQN, but if I do, the result is much worse. This makes me think that it's not an overestimation problem, since Double DQN is supposed to improve that. - If I increase the update frequency, values converge much more quickly to exactly 0. submitted by /u/fedetask [link] [comments]  ( 90 min )
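    One plausible culprit worth checking in the post above: -relu(net_out) outputs exactly 0 wherever net_out <= 0, and in that region the gradient is also exactly 0, so units that land there can never recover. A smooth one-sided head such as -softplus keeps the range (-inf, 0) while preserving gradients everywhere; a tiny sketch of the difference (this is a suggestion, not the poster's code):

        import torch
        import torch.nn.functional as F

        net_out = torch.randn(32, 4, requires_grad=True)  # raw activations of the Q head

        q_relu = -F.relu(net_out)      # 0 wherever net_out <= 0, with zero gradient there
        q_soft = -F.softplus(net_out)  # also strictly in (-inf, 0), but smooth everywhere

        q_soft.sum().backward()
        assert (net_out.grad != 0).all()  # softplus keeps every unit trainable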
    Help for MPO
    Hi, I am currently studying MPO and I am having a really hard time understanding it. I understood half of the "E-step" section, but from there I am just too stupid to get it. If someone has any good resources on this topic (MPO - Maximum a Posteriori Policy Optimization), it would be so kind if you could link them. Or maybe someone understands it well enough to explain it. [Figure: excerpt from the paper omitted] I don't understand how they do all of it, and the text below the derivation doesn't make much sense to me either; the most confusing part is the formulas. Like I said, it would mean the world to me if I could understand the paper. Thanks in advance! submitted by /u/alextheai [link] [comments]  ( 88 min )
  • Open

    Multi layer Hebbian learning?
    So I’ve recently gotten into Hebbian learning, and in many examples I see that there is a hidden layer between the input and output. But isn’t Hebbian learning a direct link from input to output, so that you can directly increase and decrease the weights? submitted by /u/BeesechurgerLad53 [link] [comments]  ( 87 min )
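    To the question above: the plain Hebbian rule is indeed a local correlation update, but nothing stops you from applying it layer by layer, with each weight matrix updated from the activity of the layers on either side of it. A minimal sketch (plain Hebb plus weight decay for stability; sizes and constants are arbitrary assumptions):

        import numpy as np

        rng = np.random.default_rng(0)
        W1 = rng.normal(scale=0.1, size=(16, 8))   # input -> hidden
        W2 = rng.normal(scale=0.1, size=(8, 4))    # hidden -> output
        eta, decay = 0.01, 0.001

        for _ in range(1000):
            x = rng.normal(size=16)
            h = np.tanh(x @ W1)                    # hidden activity
            y = np.tanh(h @ W2)                    # output activity
            # Hebb: strengthen weights between co-active units, layer by layer
            W1 += eta * np.outer(x, h) - decay * W1
            W2 += eta * np.outer(h, y) - decay * W2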

  • Open

    [Project] Now Find and Filter Papers by Code Availability
    Your suggestions, comments, and candid feedback would be highly welcome! Here's what it looks like in action: Input (with code filter on): "photo style transfer" https://www.catalyzex.com/search?query=photo%20style%20transfer&with_code=true Output: list of all "photo style transfer" papers with corresponding code implementations linked. Video of it in action: https://reddit.com/link/wtl90j/video/6aoohdbygyi91/player submitted by /u/MLtinkerer [link] [comments]  ( 91 min )
    I recreated famous album covers with DALL-E
    submitted by /u/lucytalksdata [link] [comments]  ( 104 min )
    Deepmind: Transframer AI dreams 30-second video from an image
    submitted by /u/henlo_there_fren [link] [comments]  ( 87 min )
    Interpretable Natural Language Processing (INLP) Workshop - yesterday
    submitted by /u/akolonin [link] [comments]  ( 87 min )
    AI Consciousness in Batman The Animated Series S01E43 (1992)
    submitted by /u/OmicronGR [link] [comments]  ( 89 min )
    Color Swirl Creature
    submitted by /u/widgia [link] [comments]  ( 86 min )
    What are the consequences for a business if AI fails?
    AI models work on the basis of very complex algorithms and statistical correlations, but there is always a margin of error. Should a company implement AI in a process with high variability and low accuracy, or vice versa? What are the risks, and how much investment will be lost if it doesn't work? submitted by /u/inqoob-Constructor [link] [comments]  ( 90 min )
    any companies selling computer vision ml models
    Looking for companies selling trained models for use in street traffic monitoring. Anyone got a clue? submitted by /u/secccentral [link] [comments]  ( 87 min )
    Can AI be the next Joe Rogan?
    submitted by /u/kbf_ [link] [comments]  ( 87 min )
    Image created with Midjourney. "Vincent Anatomy Evolution"
    submitted by /u/echo_zoo [link] [comments]  ( 87 min )
    any program for lonely people to play board games?
    I like AI a lot and I do want to play some board games, but I'm looking for a program that I can feed rules to, that can see what is on the board, and that can play with me in a physical game rather than a digital analogue. submitted by /u/No_Inside_2297 [link] [comments]  ( 87 min )
  • Open

    [D] Comparison of NNs to our biological neuron systems… plausible or BS?
    When I started to learn about neural networks, I read plenty of Medium articles that make awe-inspiring comparisons between our brains' neuron connections and neural networks. I understand that McCulloch and Pitts initially created their neuron model as an attempt to model the behavior of a biological neuron, but later developments (Widrow-Hoff learning, SVMs, etc.) feel as if they don't really build on this analogy, and rather present clever manipulations of loss functions that are more grounded in optimization theory. I feel as if making the analogy between neural networks and how our neurons behave tremendously undersells just how much more complex our brains are relative to these simple graphical models. So as we continue to make this analogy today, is it grounded in reality at all, or is it more of just something cool that a lot of beginners like to observe? submitted by /u/No_Lingonberry2565 [link] [comments]  ( 90 min )
    [P] help me create a product: a screen with a frame you can hang on the wall as a painting, generating pictures/painting compatible with Dall-E 2, Midjourney or similar projects.
    Help me with what you got and let's create a team if you believe in this. submitted by /u/Sky13 [link] [comments]  ( 89 min )
    [P] How should I approach a signal-to-signal translation problem?
    Hi all, I've seen a lot of approaches for the unpaired image-to-image problem, mainly GAN-based approaches like CycleGAN, for example. I've also seen some autoencoder approaches such as the swapping autoencoder. While this problem is pretty heavily explored, I was wondering if anyone has given significant thought to signal-to-signal translation problems. I'm currently trying to train a model for ECG-to-ECG translation. I started out by trying to train a CycleGAN (replacing all the 2D convolutions with 1D convolutions), but this model was quite difficult to train, as the loss never really decreases. Does anyone have some suggestions about how I could go about this problem? I see a lot of people talking about diffusion models on this sub. Would they be a possible strategy? Thanks in advance. submitted by /u/DaGr8Nave [link] [comments]  ( 89 min )
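    For the 1D adaptation itself, one sanity check is that the generator preserves sequence length and has a reasonable receptive field for ECG morphology. A minimal residual 1D generator in the style of the CycleGAN generator (depths and widths here are illustrative assumptions, not a recommendation):

        import torch
        import torch.nn as nn

        class ResBlock1d(nn.Module):
            def __init__(self, ch):
                super().__init__()
                self.body = nn.Sequential(
                    nn.Conv1d(ch, ch, kernel_size=3, padding=1),
                    nn.InstanceNorm1d(ch), nn.ReLU(inplace=True),
                    nn.Conv1d(ch, ch, kernel_size=3, padding=1),
                    nn.InstanceNorm1d(ch),
                )
            def forward(self, x):
                return x + self.body(x)          # residual connection

        gen = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=7, padding=3), nn.ReLU(inplace=True),
            *[ResBlock1d(64) for _ in range(6)],
            nn.Conv1d(64, 1, kernel_size=7, padding=3), nn.Tanh(),
        )
        out = gen(torch.randn(4, 1, 2048))       # (batch, leads, samples) -> same shape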
    [P] CatalyzeX Update - Now Find and Filter Papers by Code Availability
    "Talk is cheap. Show me the code." Hey all. We have just rolled out a new update for you. We've revamped CatalyzeX.com with the ability to only search for machine learning papers that have at least 1 open-source code implementation available. We hope this is useful for the community; this was a popular request and not currently a feature on many search engines. Also, we welcome contributions from everyone — so if you have implemented code for any papers, you can add them too if you wish. Your suggestions, comments, and candid feedback would be highly welcome! Here's what it looks like in action: Input (with code filter on): "photo style transfer" Output: list of all "photo style transfer" papers with corresponding code implementations linked (Try example search link in comments) https://preview.redd.it/8uyvkxqqrxi91.png?width=2894&format=png&auto=webp&s=574e0a6b83e8410978ca359b475247c8ea5cb32e Video of it in action: https://reddit.com/link/wtio28/video/v2y5x7qnfyi91/player submitted by /u/MLtinkerer [link] [comments]  ( 90 min )
    [P] Is it feasible to find a mapping between two non-synthesized audio signals of the same audio sequence?
    Hello people. I have been butting my head against this project without success, and I am starting to doubt that it is possible. Perhaps someone who has been working with neural networks for audio can weigh in. I have an audio A, which consists of 4'000'000 datapoints that signify the sound intensity at a time t. A is recorded in a noisy environment, with the song being played over the speakers and then recorded on a phone. I then have audio B, which is 1'400'000 datapoints and is a subset of audio A (the chorus of the song). Audio B is recorded in a studio. It is possible to line up the two audio sequences so that they perfectly match on all the sounds. The question is then: is it possible to find a general mapping using the 1'400'000 shared datapoints from audio A to audio B, and then apply this mapping to the non-shared part of audio A to reconstruct the studio audio that doesn't exist? Is there any previous work in this area? The examples of audio reconstruction and audio synthesis that I have come across are usually synthesized, or have noise added over the original track, which makes the task easier. What I have tried: direct mapping using an MLP; MEL-spectrogram matching using different STFT windows and 1D/2D CNNs/encoder-decoders; direct mapping of phase + intensity of the FFT using an MLP. submitted by /u/SlayahhEUW [link] [comments]  ( 92 min )
    [R] iColoriT: Towards Propagating Local Hint to the Right Region in Interactive Colorization by Leveraging Vision Transformer
    Project page: https://pmh9960.github.io/research/iColoriT/ https://reddit.com/link/wtcqld/video/wcpxbln3pwi91/player submitted by /u/yeolj0o [link] [comments]  ( 88 min )
    [R] Sketch2Pose — estimating a 3D character pose from a bitmap sketch
    submitted by /u/SpatialComputing [link] [comments]  ( 90 min )
    [P] Building a App for Stable Diffusion: Text to Image generation in Python
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 90 min )
    [D] What are the uses of Large Models?
    Many startups and companies are training large models. I want to know what the real-life use cases are for such large models (GPT-3, BLOOM, PaLM, etc.) with hundreds of billions of parameters. I'd understand them being used as base models for further fine-tuning, but even BERT with 500M/1B parameters can perform similarly. Am I missing something? submitted by /u/tororo-in [link] [comments]  ( 111 min )
    [P] GPT-NeoX inference with LLM.int8() on 24GB GPU
    Implementation & LM Eval Harness Results LLM.int8() Paper LLM.int8() r/MachineLearning discussion submitted by /u/mlvpj [link] [comments]  ( 88 min )
    Object, character and scene continuity in image synthesis yet? [D]
    Stable Diffusion and DALLE2 have no continuity between prompts. How much of a challenge is it to have a system where you can define an object or character in one scene, and then manipulate them with further prompts? submitted by /u/The5e [link] [comments]  ( 109 min )
    New video series on deep learning for computer vision [P]
    submitted by /u/OppositePickle2171 [link] [comments]  ( 88 min )
    [N] John Carmack raises $20M from various investors to start Keen Technologies, an AGI Company.
    submitted by /u/hardmaru [link] [comments]  ( 97 min )
    [P] Telepathy is just neural translation software
    My team at A Mind Applied has developed software to translate brain activity associated with thought words. Using a convolutional neural network, our software can predict a number of thought words while the user wears an EEG headset. Right now we're working on improving the accuracy, and then we'll be adding more words! Anyone here like neurotech and interested in getting involved? submitted by /u/rubbedlamp [link] [comments]  ( 89 min )
  • Open

    Implementations of risk aware reinforcement learning.
    I’m working on a decision-based problem in finance for options. In researching decision support tools, I like how various relationships can be explored along with their impact on the target variable. Ultimately, though, I don't necessarily care about the interpretability, just the performance. One of the things I've learned while researching decision support tool models is that they can help quantify risk and uncertainty. I started to research how to apply this in a reinforcement learning model. When considering the action the agent should take, the policy should be risk-aware and understand that while on some occasions choosing to take a given action may yield great reward, most of the time it will incur a penalty, and then be able to use that to reduce the exploration space by weighin…  ( 93 min )
    How are benchmarks decided in RL?
    This might be naive, but I've been starting out in research in RL and see distributed architectures like IMPALA and non-distributed ones like SAC, etc. And I see many papers focusing on one set of algorithms along with their contribution. So, how exactly are baselines decided? Can distributed and non-distributed methods be compared separately in some problems, since anything built on IMPALA might always beat a baseline score set by a non-distributed architecture? submitted by /u/Cool_Abbreviations_9 [link] [comments]  ( 88 min )
    In the Latest Machine Learning Research, UC Berkeley Researchers Propose an Efficient, Expressive, Multimodal Parameterization Called Adaptive Categorical Discretization (ADACAT) for Autoregressive Models
    submitted by /u/ai-lover [link] [comments]  ( 89 min )
    Good PhD programs for Reinforcement Learning (US)
    Hello! I am wondering what people think are some of the best PhD programs and faculty for doing a PhD in reinforcement learning in the US. There is a similar question that was asked a couple of years ago but I think it had a focus on Europe. submitted by /u/randomkolmogorov [link] [comments]  ( 89 min )
    [Question] Should tabular TD and MC predictions render the same V?
    I've never really bothered to play with tabular methods before, so I decided to look into it. I've implemented a bunch of stuff, for instance SARSA and MC (first-visit) algorithms. I've noticed, however, that given the exact same policy, they generate different values for V... SARSA:

        v = defaultdict(float)
        for _ in range(1, EPISODES):
            s = str(env.reset())
            g = 0
            for _ in range(1, TIME_STEPS):
                a = np.random.randint(0, env.action_space()-1)
                s_, r, d, _ = env.step(a)
                s_ = str(s_)
                g += r
                v[s] = v[s] + ALPHA * (r + GAMMA * v[s_] - v[s])
                s = s_
                if d:
                    break

    and MC First-Visit:

        v = defaultdict(float)
        returns = defaultdict(list)
        for _ in range(1, EPISODES):
            episode = []
            s = str(env.reset())
            for _ in range(1, TIME_STEPS):
                a = np.random.randint(0, env.action_space()-1)
                s_, r, d, _ = env.step(a)
                s_ = str(s_)
                episode.append((s, a, r))
                s = s_
                if d:
                    break
            g = 0
            for i in range(len(episode)-1, 0, -1):
                g += GAMMA**i * episode[i][2]
                if episode[i][0] not in [e[0] for e in episode[:i]]:
                    returns[episode[i][0]].append(g)
                    v[episode[i][0]] = np.average(returns[episode[i][0]])

    Note that my simple environment (Tower of Hanoi) is deterministic. The policies I'm using are random, but I would expect that, given enough episodes, both Vs would converge towards the same values? Am I right, or wrong? Thanks, people. EDIT: I should add that the order of the state values seems roughly the same for the V given by SARSA and MC. submitted by /u/AnotherForce [link] [comments]  ( 105 min )
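    One likely source of the discrepancy in the post above: the MC accumulation g += GAMMA**i * episode[i][2] discounts relative to the episode start rather than relative to the state being updated, and appends the running g at every first visit. The standard backward recursion is g = r + GAMMA * g; a self-contained sketch of the corrected inner loop (variable names mirror the post):

        import numpy as np
        from collections import defaultdict

        GAMMA = 0.99
        returns, v = defaultdict(list), defaultdict(float)
        episode = [("s0", 0, -1.0), ("s1", 1, -1.0), ("s2", 0, 0.0)]  # (state, action, reward)

        g = 0.0
        for t in range(len(episode) - 1, -1, -1):
            s_t, _, r_t = episode[t]
            g = r_t + GAMMA * g                           # return from step t onward
            if s_t not in [e[0] for e in episode[:t]]:    # first-visit check
                returns[s_t].append(g)
                v[s_t] = np.average(returns[s_t])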
    Help Deciding an Algorithm for a Project
    Hello everyone! I'm a highschooler who just got into reinforcement learning recently, since I have a project that I want to solve, and I'm having some trouble selecting which algorithm to use. I have a four-wheeled robot, each wheel able to rotate on its own, and I want to take one camera input in order for my robot to locate and drive to a red ball. I also want it to have small obstacle avoidance capabilities, but that's less necessary since this is more of a proof of concept right now, to prove to a team that I'm working with that reinforcement learning is super cool and we should use it on our robot! I'm currently thinking of using a Deep Q Network, because I have around 15 days to get a basic working version and I've heard the sample efficiency is high, and I was going to try to simulate the robot with PyBullet before putting the code on my robot and praying it works. If anyone has some tips that could point me in the right direction, or can point out some massive error in my plan, I would super greatly appreciate it! submitted by /u/WantRecommendations [link] [comments]  ( 89 min )
    In AlphaStar (and in general), how are NNs trained in series?
    Looking at this diagram of their architecture, they have a series of neural nets (NNs) that, together, dictate the move to take. [Figure: AlphaStar architecture diagram omitted] To my understanding, the order of data passing from input to output is: 1) First NN chooses the "action type". 2) Next NN chooses a "delay". 3) Next NN chooses where to place this action in the queue. 4) Next NN chooses which unit(s) to do this action with. 5) A choice of NNs here, depending on whether the action targets a unit (e.g. an attack) or a place on the map (e.g. a move or build): 5.1) NN to select a unit to target; 5.2) NN to select where on the map to target. If we ignore steps 2) and 3) and just assume all actions have a unit tar…  ( 90 min )
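    On the training question in the post above: the heads form a single autoregressive model trained jointly end to end; each head's input includes the shared core output plus a representation of the choices already made by earlier heads, and the per-head losses are summed. A toy sketch of the chaining only (sizes are made up, and passing the soft distributions forward is a simplification; AlphaStar embeds the sampled actions):

        import torch
        import torch.nn as nn

        core = torch.randn(1, 256)                   # shared core output (toy size)
        type_head = nn.Linear(256, 10)               # 1) action type
        delay_head = nn.Linear(256 + 10, 5)          # 2) delay, conditioned on type
        queue_head = nn.Linear(256 + 10 + 5, 2)      # 3) queued or not

        p_type = torch.softmax(type_head(core), dim=-1)
        p_delay = torch.softmax(delay_head(torch.cat([core, p_type], dim=-1)), dim=-1)
        p_queue = torch.softmax(queue_head(torch.cat([core, p_type, p_delay], dim=-1)), dim=-1)
        # Training sums a per-head loss (supervised or RL) and backpropagates
        # through the whole chain at once, so the heads are learned jointly.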
    Stable baseline 3 to train street fighter agent, issue with results
    Hi all, I am using Stable Baselines 3 to train a Street Fighter agent to play against the game AI. The aim is to maximise the score in the round of Ryu vs Guile. Here is the code and some results. What I don't understand is the following: Why does the loss not decrease, while the policy gradient loss is decreasing? Why doesn't the entropy loss change (-4.84)? The mean reward seems to be increasing, though, but when I run an evaluation many times using random motion vs our trained agent, there is literally no difference in the win rate (sometimes a win, sometimes not, for both cases). Is this training 'correct' though?

        import retro
        import os
        import time

        from gym import Env
        from gym.spaces import Box, MultiBinary, Discrete
        import numpy as np
        import cv2

        class StreetFighter(Env):
            de…  ( 107 min )
  • Open

    NEW AI Humanoid Robot CyberOne | New Edge Computing AI Chip With 13X Computational Density
    submitted by /u/kenickh [link] [comments]  ( 87 min )
    Deepfaking a mask into moving lips?
    I know we can change human lips, but what about a mask of a cartoon character? Can that be made to speak? submitted by /u/RedeletedMonkeyButt [link] [comments]  ( 87 min )

  • Open

    John Carmack’s AGI startup raises $20 million from Sequoia, Nat Friedman, Patrick Collison and others
    submitted by /u/nick7566 [link] [comments]  ( 87 min )
    What are ways to edit selfies with artificial intelligence?
    What different ways are there to edit images with AI? submitted by /u/xXNOdrugsForMEXx [link] [comments]  ( 87 min )
    Researchers at Oxford University Propose a Machine Learning Framework Called ‘TriSegNet’ Based on Triple-View Feature Learning for Medical Image Segmentation
    submitted by /u/ai-lover [link] [comments]  ( 88 min )
    AI Music programs/sites and are there AI songwriting programs or sites?
    I want to experiment with making AI music, but I don't know where or how to start. I'm looking for programs and sites that make music with AI. I'd also like to know if an AI songwriting tool exists to generate lyrics, as I think that could be interesting. Anyone have resources? submitted by /u/CharmedBySnakes [link] [comments]  ( 90 min )
    I prompted bloom - failure - are there prompt engineers here?
    You are camping in the desert with a friend. You install a tent for the night and go to sleep. You wake up your friend and ask: - What do you see, Watson? - I see the sky and some beautiful stars - What can you deduct from this? - I can deduct that we are in the desert - What else can you deduct? - I can deduct that we are in the desert - What else can you deduct? - can deduct that we are in the desert - What else can you deduct? - I can - They stole our tent, stupid. I think the answer is "I can deduct that we are in the desert". submitted by /u/grumpyfrench [link] [comments]  ( 88 min )
    Quantum Computing at World’s Top 50 Innovators 2022
    submitted by /u/chelsea_bear [link] [comments]  ( 87 min )
    Nvidia RTX A1000 - 4GB vs GeForce RTX 3060 - 6 GB vs GeForce RTX 3050 Ti - 4 GB
    Hi all, can someone tell me which one is better for deep learning and training models? I don't need it for gaming. I'm trying to buy a Dell laptop but still can't figure out which graphics card I should buy. Any suggestions? submitted by /u/AKIvan87 [link] [comments]  ( 87 min )
    Meta Quest 2: Meta fixes an annoying problem but hides the option
    submitted by /u/henlo_there_fren [link] [comments]  ( 87 min )
    Artists and designers protest against AI-generated graphics
    submitted by /u/much_successes [link] [comments]  ( 92 min )
    Hey so I see a lot of videos on YT titled something like "____ except it's written by an AI", can anyone tell me what they use?
    submitted by /u/likbitch15 [link] [comments]  ( 87 min )
    AI Manifest: Galactic Flight Path | Nebula Encounters | Cinematic 4KUHD 60 FPS
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 87 min )
    I died on a sunday - images and music created by AI. Music by aiva; images by MidJourney
    submitted by /u/adminsmithee [link] [comments]  ( 88 min )
    Google has made a robot that takes commands (Using PaLM-SayCan)
    submitted by /u/tabascooo [link] [comments]  ( 94 min )
    Never Gonna Give You Up But Every Lyric Is An AI Generated Image...
    submitted by /u/Wingman143 [link] [comments]  ( 87 min )
    Best conversational chatbots?
    I'm really interested in talking to an AI, but all the ones I can find seem really... lame. Cleverbot feels bland, Replika has too many preprogrammed messages, Chai feels very scripted. I'm noticing there's this unifying flaw of them being all artificial but no intelligence. It's like talking to a brick wall. I'm hoping other people here have had my same experience because it's kind of hard to explain. Their ability to recognize what you say and respond in an intelligent way - their entire point - is just so weak. submitted by /u/weedmaster6669 [link] [comments]  ( 92 min )
    offroad suv - stable diffusion
    submitted by /u/harrytanoe [link] [comments]  ( 90 min )
    “Mo Honey, Mo Problems” - Pixelz AI 🧸
    submitted by /u/pixelz_ai [link] [comments]  ( 87 min )
    Secret People: Ray Kurzweil
    submitted by /u/Defiant-Branch4346 [link] [comments]  ( 87 min )
    Backpropagation From Scratch
    submitted by /u/marcos_pereira [link] [comments]  ( 92 min )
  • Open

    DARPA "AI For Critical Minerals Assessment" Competition [D]
    DARPA is hosting a competition called “AI for Critical Mineral Assessments,” which is looking for solutions to automatically extract and georeference features from scanned or raster maps. The U.S. Geological Survey uses data from these assessments to build reports that can eventually lead to increasing domestic production of critical minerals and reducing U.S. reliance on imports. The competition includes two independent challenges: Map Georeferencing Challenge: Automated map georeferencing is a difficult task as most USGS maps are not digitized, and may be in a multitude of historical coordinate projection systems. Furthermore, the quality of features on scanned maps, critical for the identification of control points for alignment, can vary greatly. Participants will receive a dataset…  ( 90 min )
    [D] Would “casual research” on whether or not publicly available models can make “programmer art” be worth it?
    This might not be the best way to discuss what I’m proposing. I work with ML, but not really in a programmer fashion - more in the UX / model-to-model comparison approach. In my own time I’ve created a set of 50 or so prompts that are geared towards game dev - specifically the creation of “16-bit” style art for in-game objects, character portraits and backgrounds for things like visual novels. Would comparing these against publicly available models that are user friendly - LAION-400M-based models, Midjourney, DALL·E Mini, etc. - and scoring them in broad categories be worth a publish? Like documenting them and then uploading to, say, GitHub. I don’t mean publishing in an academic sense. And I don’t really keep an ML-related blog. submitted by /u/RekaAia [link] [comments]  ( 89 min )
    [P] How to build NVAE from scratch
    I tried reproducing a paper that came out at the start of 2021 called "NVAE: A Deep Hierarchical Variational Autoencoder". It was a real struggle and here are some notes about it. submitted by /u/mgp_123 [link] [comments]  ( 105 min )
    [D] In 2010, did people expect things like DALLE and AlphaFold to be only 10/13 years away?
    AI stuff didn't really hit 'mainstream' until relatively recently so I was really surprised by the new prompt to image technology. Was the field expecting this technology by now or did people think it was a long ways away (20+ years)? I know the field moves really fast these days, so I'm curious about what the research community's expectations were in the past. submitted by /u/jl2cb [link] [comments]  ( 99 min )
    [D] Besides Reddit and Twitter, How do you keep updated with the latest DL/ML research?
    So if you are into research or want to try new methods in the field, you have to keep yourself updated in this area. Some of the methods I use are Twitter, Reddit, and Papers with Code, but it takes a lot of time to scroll, and sometimes you cannot visit these sites for a long time due to exhaustive research work. However, newsletters are fantastic and give you a weekly or monthly overview of current research in just 5-10 minutes. Some of the newsletters I follow: the Papers with Code newsletter & Deep Learning Weekly. Here are some of the resources I follow. Websites: Google AI blog, Papers with Code, Arxiv-sanity, Facebook AI blog. Newsletters: KDnuggets, Papers with Code, Huggingface, Last Week In AI, DeepLearning Weekly. I was wondering: what other methods do you use to keep yourself updated with cutting-edge methods, newsletters, GitHub repos and research papers? submitted by /u/aadityaura [link] [comments]  ( 91 min )
    [D] Is there such a thing as "learning complexity theory" (akin to computational complexity theory) for machine learning?
    Hello, Is there a way to classify learning problems (classification, for example) in terms of how hard it is for a machine learning algorithm to achieve good results on them? In the title I compared this to computational complexity theory, which is defined as: classifying computational problems according to their resource usage, and relating these classes to each other. A computational problem is a task solved by a computer, solvable by mechanical application of mathematical steps, such as an algorithm. Does there exist something along the lines of: classifying learning problems, and relating these classes to each other, where a learning problem is a task learned by a computer, solvable by approximation using, for example, machine learning? Finally, would there be a way to rank machine-learning-based approaches on some metric by just looking at the architecture (so without running it on a dataset)? In computational complexity theory we have O(n) and the like, or even the equivalent for space complexity. I know this post might seem naive and low-key cringe to some, but I was genuinely wondering if such a thing existed, or if its pure definition as stated in this post is pointless or just too broad to be discussed in a constructive way :) Anyway, let me know! Cheers, submitted by /u/Inquation [link] [comments]  ( 91 min )
    [D] Has any non-PhD here published an ML paper?
    If so, how was the whole experience? Did you collaborate with other people, get some guidance from academics, go through a bunch of re-iterations, etc.? submitted by /u/TacoMisadventures [link] [comments]  ( 122 min )
    [D] Need advice for model deployment + what's annoying for you?
    Hey everyone! I'm somewhat confused. I'm trying to understand how Machine Learning teams deploy their models to be usable by other folks across a company. Is it complicated or annoying to do? On another note: what are some technologically annoying things you have to deal with as a machine learning member at a company when it comes to building ML models, pipelines, and just executing your work? Thanks so much in advance for reading this post and hope you can take a few minutes to respond :))) submitted by /u/Neuro_Euro [link] [comments]  ( 88 min )
  • Open

    Interesting and need a little help
    I've been challenging myself a lot with data science and somehow I got really... I mean, way too interested in reinforcement learning. But I don't know where to start, or maybe I just wanted experts like y'all to tell me where to start. So please do the honors and guide me into this realm of creativity and passion. I look forward to your answers. If you guys need help, I can help too with some things, I guess! Thanks! submitted by /u/AyushDave [link] [comments]  ( 88 min )
    Hyperparameter tuning for DQN:
    Hello everyone, I'm currently working on implementing a Double-DQN agent that makes decisions to learn how to solve discrete optimization problems (sudoku, for example). However, there are some hyperparameters that could have a tremendous impact on learning that I don't really know how to set depending on the situation. These are: trajectory_capacity, batch_size, update_freq, target_update_freq. The thing is that I would like to group the samples into the biggest mini-batch my VRAM can handle, in order to limit the number of back-propagations, which are really time-consuming. However, I want to avoid the classic side effects on learning that arise when working with huge batches. For instance, let's say I'd like to increase my batch size by a factor of 10. Is there some kind of rule or good practice regarding the other parameters, update_freq and target_update_freq? I'd say intuitively that by dividing update_freq by 10, I will get approximately the same number of processed samples while making the learning process faster. What are your intuitions about it? Thanks :) submitted by /u/Zealousideal-Ice9957 [link] [comments]  ( 88 min )
    Did Alpha Fold use RL?
    I cannot tell... I heard they used a transformer, but I don't know enough about protein folding to guess whether or not they did... Also I cannot find any mention of this on the internet. submitted by /u/Udon_noodles [link] [comments]  ( 88 min )
    Help choose right framework
    Hello! I need a physics engine which can run in the browser (so there must be WebGL or JS libraries). I'm doing my research on locomotion (like this: https://images.app.goo.gl/B244Z5THmqonkqZJA). But I have to deliver it as a web app (the locomotion can be trained on a server, but the final result should be available online), and I want to make it look nicer, not just flat terrain. Which physics engines can use FBX models? What can you recommend: online + imported models from 3D software + RL? I worked with Unity ML-Agents, which can build WebGL games, but the PhysX physics is bad; it's not realistic and not predictable. I see there are two main engines, MuJoCo and PyBullet, but I didn't find clear information on whether I can build a web app and import models with them. submitted by /u/IndependenceCivil576 [link] [comments]  ( 88 min )
    D4RL, MuJoCo-py docker image
    Hey guys, I struggled quite a lot with installing MuJoCo-py and D4RL and wished a Docker image were readily available. Even the Dockerfile on the MuJoCo-py GitHub repo fails to build. Here's an image I created: https://hub.docker.com/r/chboe/rl-py I thought I'd leave it here for other people who are similarly struggling :) Let me know if there's something I should change, as I still don't have much experience with Docker and therefore haven't been able to test everything fully. submitted by /u/Dragonrooster [link] [comments]  ( 88 min )
    CartPole Swing RL task
    I want to implement the CartPole swing-up game. Does anyone have references or articles they can recommend? GitHubs and articles are preferred. I've seen a couple of different ones that are "ok". Found this one on GitHub: https://github.com/0xangelo/gym-cartpole-swingup but I am not sure if this is the OpenAI Gym version or not. Feedback and suggestions greatly appreciated. I did manage to create my own custom Gym environment, but the agent didn't perform well. I am also assuming it didn't perform well because I didn't use a network of any sort, just pure Q-learning. If you have a DQN article I can read, that would be nice. submitted by /u/Alternative-Price-27 [link] [comments]  ( 88 min )
  • Open

    Best practices for TensorFlow 1.x acceleration training on Amazon SageMaker
    Today, a lot of customers are using TensorFlow to train deep learning models for their clickthrough rate in advertising and personalization recommendations in ecommerce. As the behavior of their clients change, they can accumulate large amounts of new data every day. Model iteration is one of a data scientist’s daily jobs, but they face the […]  ( 13 min )
  • Open

    Help with Neural Neutwork for probability guessing
    Hi! I'm currently developing an NN for a wide range of probabilities. I'm using PyTorch, and this is the NN (simple): [screenshot of the network omitted]. Also, since the labels are very low probabilities, I'm applying a multiplication factor of 10^8 to them and then applying the logarithm to the value. (Later on, to test and use the tool, I will just do the reverse.) Nevertheless, this network learns a lot from the training set (about 10,000 samples) but then never fits the test set. I don't know what to change. It is supposed to be a 50/50 balanced dataset. I really don't know what to do to improve the accuracy on non-training data. Any idea is welcome. submitted by /u/Single_Vermicelli_33 [link] [comments]  ( 88 min )
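    On the label transform in the post above: it is worth verifying that the scale-and-log transform round-trips exactly and is applied identically to train and test targets. A tiny sketch of that transform and its inverse (the factor is the one mentioned in the post):

        import numpy as np

        FACTOR = 1e8

        def to_target(p):       # tiny probability -> well-scaled regression target
            return np.log(p * FACTOR)

        def from_target(t):     # inverse used at evaluation time
            return np.exp(t) / FACTOR

        p = np.array([3e-9, 7e-8, 2e-7])
        assert np.allclose(from_target(to_target(p)), p)

    If the transform round-trips and the network still only fits the training set, the symptom is ordinary overfitting, so regularization, more data, or a smaller model are the usual next levers.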
    PyTorch tutorials on Information Retrieval, specifically Semantic Search
    I've created a PyTorch-based repo which tries to cover the current progress in the world of information retrieval using neural information retrievers / semantic search. Repo: https://github.com/kuutsav/information-retrieval . Most of the content follows the work of Nils Reimers (creator of the sentence_transformers library) and his research group. Topics covered: the classic way of doing information retrieval; evaluation metrics; bi-encoders; cross-encoders; multilingual retrieval models; training techniques using no labeled data; domain adaptation (GPL, TSDAE, SimCSE). Things to come: vector databases; approximate nearest neighbor techniques for quick retrieval. submitted by /u/krumb0y [link] [comments]  ( 97 min )
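    As a taste of the bi-encoder material the repo covers, the canonical sentence_transformers retrieval pattern looks like this (the checkpoint name is one common public model, chosen here for illustration):

        from sentence_transformers import SentenceTransformer, util

        model = SentenceTransformer("all-MiniLM-L6-v2")
        corpus = ["how to train a bi-encoder",
                  "evaluation metrics for retrieval",
                  "cooking pasta"]
        corpus_emb = model.encode(corpus, convert_to_tensor=True)

        query_emb = model.encode("semantic search training", convert_to_tensor=True)
        hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
        for h in hits:
            print(corpus[h["corpus_id"]], h["score"])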
  • Open

    NVIDIA to Share New Details on Grace CPU, Hopper GPU, NVLink Switch, Jetson Orin Module at Hot Chips
    In four talks over two days, senior NVIDIA engineers will describe innovations in accelerated computing for modern data centers and systems at the edge of the network. Speaking at a virtual Hot Chips event, an annual gathering of processor and system architects, they’ll disclose performance numbers and other technical details for NVIDIA’s first server CPU, Read article > The post NVIDIA to Share New Details on Grace CPU, Hopper GPU, NVLink Switch, Jetson Orin Module at Hot Chips appeared first on NVIDIA Blog.  ( 6 min )
    Meet the Omnivore: Startup in3D Turns Selfies Into Talking, Dancing Avatars With NVIDIA Omniverse
    Imagine taking a selfie and using it to get a moving, talking, customizable 3D avatar of yourself in just seconds.  The post Meet the Omnivore: Startup in3D Turns Selfies Into Talking, Dancing Avatars With NVIDIA Omniverse appeared first on NVIDIA Blog.  ( 6 min )
  • Open

    How AI & Big Data are Transforming the Healthcare Processes
    In the last 10 years, healthcare has been one of the fastest-growing sectors of the global economy. As…  ( 11 min )
  • Open

    Conflicting Interactions Among Protection Mechanisms for Machine Learning Models. (arXiv:2207.01991v2 [cs.LG] UPDATED)
    Nowadays, systems based on machine learning (ML) are widely used in different domains. Given their popularity, ML models have become targets for various attacks. As a result, research at the intersection of security/privacy and ML has flourished. Typically such work has focused on individual types of security/privacy concerns and mitigations thereof. However, in real-life deployments, an ML model will need to be protected against several concerns simultaneously. A protection mechanism optimal for one security or privacy concern may interact negatively with mechanisms intended to address other concerns. Despite its practical relevance, the potential for such conflicts has not been studied adequately. We first provide a framework for analyzing such "conflicting interactions". We then focus on systematically analyzing pairwise interactions between protection mechanisms for one concern, model and data ownership verification, with two other classes of ML protection mechanisms: differentially private training, and robustness against model evasion. We find that several pairwise interactions result in conflicts. We explore potential approaches for avoiding such conflicts. First, we study the effect of hyperparameter relaxations, finding that there is no sweet spot balancing the performance of both protection mechanisms. Second, we explore if modifying one type of protection mechanism (ownership verification) so as to decouple it from factors that may be impacted by a conflicting mechanism (differentially private training or robustness to model evasion) can avoid conflict. We show that this approach can avoid the conflict between ownership verification mechanisms when combined with differentially private training, but has no effect on robustness to model evasion. Finally, we identify the gaps in the landscape of studying interactions between other types of ML protection mechanisms.  ( 3 min )
    A Framework and Benchmark for Deep Batch Active Learning for Regression. (arXiv:2203.09410v2 [stat.ML] UPDATED)
    The acquisition of labels for supervised learning can be expensive. In order to improve the sample-efficiency of neural network regression, we study active learning methods that adaptively select batches of unlabeled data for labeling. We present a framework for constructing such methods out of (network-dependent) base kernels, kernel transformations and selection methods. Our framework encompasses many existing Bayesian methods based on Gaussian Process approximations of neural networks as well as non-Bayesian methods. Additionally, we propose to replace the commonly used last-layer features with sketched finite-width Neural Tangent Kernels, and to combine them with a novel clustering method. To evaluate different methods, we introduce an open-source benchmark consisting of 15 large tabular regression data sets. Our proposed method outperforms the state-of-the-art on our benchmark, scales to large data sets, and works out-of-the-box without adjusting the network architecture or training code. We provide open-source code that includes efficient implementations of all kernels, kernel transformations, and selection methods, and can be used for reproducing our results.  ( 3 min )
    What Makes the Story Forward? Inferring Commonsense Explanations as Prompts for Future Event Generation. (arXiv:2201.07099v2 [cs.CL] UPDATED)
    Prediction over event sequences is critical for many real-world applications in Information Retrieval and Natural Language Processing. Future Event Generation (FEG) is a challenging task in event sequence prediction because it requires not only fluent text generation but also commonsense reasoning to maintain the logical coherence of the entire event story. In this paper, we propose a novel explainable FEG framework, Coep. It highlights and integrates two types of event knowledge, sequential knowledge of direct event-event relations and inferential knowledge that reflects the intermediate character psychology between events, such as intents, causes, reactions, which intrinsically pushes the story forward. To alleviate the knowledge forgetting issue, we design two modules, Im and Gm, for each type of knowledge, which are combined via prompt tuning. First, Im focuses on understanding inferential knowledge to generate commonsense explanations and provide a soft prompt vector for Gm. We also design a contrastive discriminator for better generalization ability. Second, Gm generates future events by modeling direct sequential knowledge with the guidance of Im. Automatic and human evaluation demonstrate that our approach can generate more coherent, specific, and logical future events.  ( 3 min )
    Lyapunov-Net: A Deep Neural Network Architecture for Lyapunov Function Approximation. (arXiv:2109.13359v2 [cs.LG] UPDATED)
    We develop a versatile deep neural network architecture, called Lyapunov-Net, to approximate Lyapunov functions of dynamical systems in high dimensions. Lyapunov-Net guarantees positive definiteness, and thus it can be easily trained to satisfy the negative orbital derivative condition, which only renders a single term in the empirical risk function in practice. This significantly reduces the number of hyper-parameters compared to existing methods. We also provide theoretical justifications on the approximation power of Lyapunov-Net and its complexity bounds. We demonstrate the efficiency of the proposed method on nonlinear dynamical systems involving up to 30-dimensional state spaces, and show that the proposed approach significantly outperforms the state-of-the-art methods.  ( 2 min )
    Sequence Prediction Under Missing Data : An RNN Approach Without Imputation. (arXiv:2208.08933v1 [cs.LG])
    Missing data scenarios are very common in ML applications in general, and time-series/sequence applications are no exception. This paper pertains to a novel Recurrent Neural Network (RNN) based solution for sequence prediction under missing data. Our method is distinct from all existing approaches. It tries to encode the missingness patterns in the data directly without trying to impute data either before or during model building. Our encoding is lossless and achieves compression. It can be employed for both sequence classification and forecasting. We focus on forecasting here in the general context of multi-step prediction in the presence of possible exogenous inputs. In particular, we propose novel variants of Encoder-Decoder (Seq2Seq) RNNs for this. The encoder here adopts the above mentioned pattern encoding, while at the decoder, which has a different structure, multiple variants are feasible. We demonstrate the utility of our proposed architecture via multiple experiments on both single and multiple sequence (real) data-sets. We consider both scenarios where (i) data is naturally missing and (ii) data is synthetically masked.  ( 2 min )
    Graph Coloring with Physics-Inspired Graph Neural Networks. (arXiv:2202.01606v2 [cs.LG] UPDATED)
    We show how graph neural networks can be used to solve the canonical graph coloring problem. We frame graph coloring as a multi-class node classification problem and utilize an unsupervised training strategy based on the statistical physics Potts model. Generalizations to other multi-class problems such as community detection, data clustering, and the minimum clique cover problem are straightforward. We provide numerical benchmark results and illustrate our approach with an end-to-end application for a real-world scheduling use case within a comprehensive encode-process-decode framework. Our optimization approach performs on par or outperforms existing solvers, with the ability to scale to problems with millions of variables.  ( 2 min )
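    For intuition on the Potts-style objective in the abstract above: with soft color assignments per node, the unsupervised loss can be written as the total probability that adjacent nodes pick the same color, which is differentiable and needs no labels. A minimal sketch of that loss term (the GNN producing the logits is omitted; this is a reading of the abstract, not the authors' code):

        import torch

        def potts_coloring_loss(logits, edges):
            """logits: (n_nodes, n_colors); edges: list of (i, j) pairs.
            Sum over edges of the inner product of the endpoints' color
            distributions; it is zero iff adjacent nodes never share a color."""
            p = torch.softmax(logits, dim=-1)
            i, j = zip(*edges)
            return (p[list(i)] * p[list(j)]).sum()

        logits = torch.randn(5, 3, requires_grad=True)  # 5 nodes, 3 colors
        loss = potts_coloring_loss(logits, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])
        loss.backward()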
    Few-Shot Forecasting of Time-Series with Heterogeneous Channels. (arXiv:2204.03456v2 [cs.LG] UPDATED)
    Learning complex time series forecasting models usually requires a large amount of data, as each model is trained from scratch for each task/data set. Leveraging learning experience with similar datasets is a well-established technique for classification problems called few-shot classification. However, existing approaches cannot be applied to time-series forecasting because i) multivariate time-series datasets have different channels and ii) forecasting is principally different from classification. In this paper we formalize the problem of few-shot forecasting of time-series with heterogeneous channels for the first time. Extending recent work on heterogeneous attributes in vector data, we develop a model composed of permutation-invariant deep set-blocks which incorporate a temporal embedding. We assemble the first meta-dataset of 40 multivariate time-series datasets and show through experiments that our model provides a good generalization, outperforming baselines carried over from simpler scenarios that either fail to learn across tasks or miss temporal information.  ( 2 min )
    UN-AVOIDS: Unsupervised and Nonparametric Approach for Visualizing Outliers and Invariant Detection Scoring. (arXiv:2111.10010v2 [cs.LG] UPDATED)
    The visualization and detection of anomalies (outliers) are of crucial importance to many fields, particularly cybersecurity. Several approaches have been proposed in these fields, yet to the best of our knowledge, none of them has fulfilled both objectives, simultaneously or cooperatively, in one coherent framework. The visualization methods of these approaches were introduced for explaining the output of a detection algorithm, not for data exploration that facilitates a standalone visual detection. This is our point of departure: UN-AVOIDS, an unsupervised and nonparametric approach for both visualization (a human process) and detection (an algorithmic process) of outliers, that assigns invariant anomalous scores (normalized to $[0,1]$), rather than hard binary-decision. The main aspect of novelty of UN-AVOIDS is that it transforms data into a new space, which is introduced in this paper as neighborhood cumulative density function (NCDF), in which both visualization and detection are carried out. In this space, outliers are remarkably visually distinguishable, and therefore the anomaly scores assigned by the detection algorithm achieved a high area under the ROC curve (AUC). We assessed UN-AVOIDS on both simulated and two recently published cybersecurity datasets, and compared it to three of the most successful anomaly detection methods: LOF, IF, and FABOD. In terms of AUC, UN-AVOIDS was almost an overall winner. The article concludes by providing a preview of new theoretical and practical avenues for UN-AVOIDS. Among them is designing a visualization aided anomaly detection (VAAD), a type of software that aids analysts by providing UN-AVOIDS' detection algorithm (running in a back engine), NCDF visualization space (rendered to plots), along with other conventional methods of visualization in the original feature space, all of which are linked in one interactive environment.  ( 3 min )
    A Framework for Understanding and Visualizing Strategies of RL Agents. (arXiv:2208.08552v1 [cs.AI])
    Recent years have seen significant advances in explainable AI as the need to understand deep learning models has gained importance with the increased emphasis on trust and ethics in AI. Comprehensible models for sequential decision tasks are a particular challenge as they require understanding not only individual predictions but a series of predictions that interact with environmental dynamics. We present a framework for learning comprehensible models of sequential decision tasks in which agent strategies are characterized using temporal logic formulas. Given a set of agent traces, we first cluster the traces using a novel embedding method that captures frequent action patterns. We then search for logical formulas that explain the agent strategies in the different clusters. We evaluate our framework on combat scenarios in StarCraft II (SC2), using traces from a handcrafted expert policy and a trained reinforcement learning agent. We implemented a feature extractor for SC2 environments that extracts traces as sequences of high-level features describing both the state of the environment and the agent's local behavior from agent replays. We further designed a visualization tool depicting the movement of units in the environment that helps understand how different task conditions lead to distinct agent behavior patterns in each trace cluster. Experimental results show that our framework is capable of separating agent traces into distinct groups of behaviors for which our approach to strategy inference produces consistent, meaningful, and easily understood strategy descriptions.  ( 3 min )
    Cooperate or Compete: A New Perspective on Training of Generative Networks. (arXiv:2207.02192v5 [cs.LG] UPDATED)
    GANs have two competing modules: the generator module is trained to generate new examples, and the discriminator module is trained to discriminate real examples from generated examples. The training procedure of GAN is modeled as a finitely repeated simultaneous game. Each module tries to increase its performance at every repetition of the base game (at every batch of training data) in a non-cooperative manner. We observed that each module can perform better and learn faster if training is modeled as an infinitely repeated simultaneous game. At every repetition of the base game (at every batch of training data) the stronger module (whose performance is increased or remains the same compared to the previous batch of training data) cooperates with the weaker module (whose performance is decreased compared to the previous batch of training data) and only the weaker module is allowed to increase its performance.  ( 2 min )
    Selective Classification Via Neural Network Training Dynamics. (arXiv:2205.13532v2 [cs.LG] UPDATED)
    Selective classification is the task of rejecting inputs a model would predict incorrectly on through a trade-off between input space coverage and model accuracy. Current methods for selective classification impose constraints on either the model architecture or the loss function; this inhibits their usage in practice. In contrast to prior work, we show that state-of-the-art selective classification performance can be attained solely from studying the (discretized) training dynamics of a model. We propose a general framework that, for a given test input, monitors metrics capturing the disagreement with the final predicted label over intermediate models obtained during training; we then reject data points exhibiting too much disagreement at late stages in training. In particular, we instantiate a method that tracks when the label predicted during training stops disagreeing with the final predicted label. Our experimental evaluation shows that our method achieves state-of-the-art accuracy/coverage trade-offs on typical selective classification benchmarks.  ( 2 min )
    Forgetting and Imbalance in Robot Lifelong Learning with Off-policy Data. (arXiv:2204.05893v2 [cs.RO] UPDATED)
    Robots will experience non-stationary environment dynamics throughout their lifetime: the robot dynamics can change due to wear and tear, or its surroundings may change over time. Eventually, a robot should perform well in all of the environment variations it has encountered, while still being able to learn fast in a new environment. We identify two challenges in Reinforcement Learning (RL) under such a lifelong learning setting with off-policy data. First, existing off-policy algorithms struggle with the trade-off between being conservative to maintain good performance in the old environment and learning efficiently in the new environment, despite keeping all the data in the replay buffer. We propose the Offline Distillation Pipeline to break this trade-off by separating the training procedure into an online interaction phase and an offline distillation phase. Second, we find that training with the imbalanced off-policy data from multiple environments across the lifetime creates a significant performance drop. We identify that this performance drop is caused by the combination of the imbalanced quality and size among the datasets, which exacerbates the extrapolation error of the Q-function. During the distillation phase, we apply a simple fix to the issue by keeping the policy closer to the behavior policy that generated the data. In the experiments, we demonstrate these two challenges and the proposed solutions with a simulated bipedal robot walking task across various environment changes. We show that the Offline Distillation Pipeline achieves better performance across all the encountered environments without affecting data collection. We also provide a comprehensive empirical study to support our hypothesis on the data imbalance issue.  ( 3 min )
    Designing Reinforcement Learning Algorithms for Digital Interventions: Pre-implementation Guidelines. (arXiv:2206.03944v3 [cs.LG] UPDATED)
    Online reinforcement learning (RL) algorithms are increasingly used to personalize digital interventions in the fields of mobile health and online education. Common challenges in designing and testing an RL algorithm in these settings include ensuring the RL algorithm can learn and run stably under real-time constraints, and accounting for the complexity of the environment, e.g., a lack of accurate mechanistic models for the user dynamics. To guide how one can tackle these challenges, we extend the PCS (Predictability, Computability, Stability) framework, a data science framework that incorporates best practices from machine learning and statistics in supervised learning (Yu and Kumbier, 2020), to the design of RL algorithms for the digital interventions setting. Further, we provide guidelines on how to design simulation environments, a crucial tool for evaluating RL candidate algorithms using the PCS framework. We illustrate the use of the PCS framework for designing an RL algorithm for Oralytics, a mobile health study aiming to improve users' tooth-brushing behaviors through the personalized delivery of intervention messages. Oralytics will go into the field in late 2022.  ( 3 min )
    Momentum-Based Policy Gradient with Second-Order Information. (arXiv:2205.08253v2 [cs.LG] UPDATED)
    Variance-reduced gradient estimators for policy gradient methods have been one of the main focuses of research in reinforcement learning in recent years, as they allow acceleration of the estimation process. We propose a variance-reduced policy-gradient method, called SHARP, which incorporates second-order information into stochastic gradient descent (SGD) using momentum with a time-varying learning rate. The SHARP algorithm is parameter-free, achieving an $\epsilon$-approximate first-order stationary point with $O(\epsilon^{-3})$ trajectories, while using a batch size of $O(1)$ at each iteration. Unlike most previous work, our proposed algorithm does not require importance sampling, which can compromise the advantage of the variance reduction process. Moreover, the variance of the estimation error decays at the fast rate of $O(1/t^{2/3})$, where $t$ is the number of iterations. Our extensive experimental evaluations show the effectiveness of the proposed algorithm on various control tasks and its advantage over the state of the art in practice.  ( 2 min )
    Semi-self-supervised Automated ICD Coding. (arXiv:2205.10088v2 [cs.CL] UPDATED)
    Clinical Text Notes (CTNs) contain physicians' reasoning process, written in an unstructured free-text format, as they examine and interview patients. In recent years, several studies have been published that provide evidence for the utility of machine learning for predicting doctors' diagnoses from CTNs, a task known as ICD coding. Data annotation is time consuming, particularly when a degree of specialization is needed, as is the case for medical data. This paper presents a method of augmenting a sparsely annotated dataset of Icelandic CTNs with machine-learned imputation in a semi-self-supervised manner. We train a neural network on a small set of annotated CTNs and use it to extract clinical features from a set of un-annotated CTNs. These clinical features consist of answers to about a thousand potential questions that a physician might answer during a patient consultation. The features are then used to train a classifier for the diagnosis of certain types of diseases. We report the results of an evaluation of this data augmentation method over three tiers of data availability to the physician. Our data augmentation method shows a significant positive effect, which is diminished when clinical features from the examination of the patient and diagnostics are made available. We recommend our method for augmenting scarce datasets for systems that make decisions based on clinical features that do not include examinations or tests.  ( 3 min )
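    A schematic of this augmentation loop follows; all four callables are hypothetical stand-ins (the paper's actual extractor and classifier are neural models trained on Icelandic CTNs).

        def augment_and_train(annotated, unannotated, fit_extractor, fit_classifier):
            """annotated: (note, features, diagnosis) triples;
            unannotated: (note, diagnosis) pairs lacking clinical features."""
            extractor = fit_extractor(annotated)          # learn note -> features
            imputed = [(extractor(note), dx) for note, dx in unannotated]
            gold = [(feats, dx) for _, feats, dx in annotated]
            return fit_classifier(gold + imputed)         # diagnosis classifier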
    Rethinking Spatial Invariance of Convolutional Networks for Object Counting. (arXiv:2206.05253v2 [cs.CV] UPDATED)
    Previous work generally holds that improving the spatial invariance of convolutional networks is the key to object counting. However, after verifying several mainstream counting networks, we surprisingly found that overly strict pixel-level spatial invariance causes models to overfit noise during density map generation. In this paper, we use locally connected Gaussian kernels to replace the original convolution filter to estimate the spatial position in the density map. The purpose of this is to allow the feature extraction process to potentially stimulate the density map generation process to overcome the annotation noise. Inspired by previous work, we propose a low-rank approximation accompanied with translation invariance to favorably implement the approximation of massive Gaussian convolution. Our work points to a new direction for follow-up research, which should investigate how to properly relax the overly strict pixel-level spatial invariance for object counting. We evaluate our methods on 4 mainstream object counting networks (i.e., MCNN, CSRNet, SANet, and ResNet-50). Extensive experiments were conducted on 7 popular benchmarks for 3 applications (i.e., crowd, vehicle, and plant counting). Experimental results show that our methods significantly outperform other state-of-the-art methods and achieve promising learning of the spatial position of objects.  ( 3 min )
    Online Allocation Problem with Two-sided Resource Constraints: Fairness and Provable Guarantees. (arXiv:2112.13964v2 [cs.LG] UPDATED)
    In this paper, we investigate the online allocation problem of maximizing the overall revenue subject to both lower and upper bound constraints. Compared to the extensively studied online problems with only resource upper bounds, the two-sided constraints affect the prospects of resource consumption more severely. As a result, only limited violation of constraints or pessimistic competitive bounds could be guaranteed. To tackle the challenge, we define a measure of feasibility $\xi^*$ to evaluate the hardness of this problem, and estimate this measure via an optimization routine with theoretical guarantees. We propose an online algorithm adopting a constructive framework, where we initialize a threshold price vector using the estimate, then dynamically update the price vector and use it for decision making at each step. It can be shown that the proposed algorithm is $\big(1-O(\frac{\varepsilon}{\xi^*-\varepsilon})\big)$ or $\big(1-O(\frac{\varepsilon}{\xi^*-\sqrt{\varepsilon}})\big)$ competitive with high probability for $\xi^*$ known or unknown, respectively. To the best of our knowledge, this is the first result establishing a nearly optimal competitive algorithm for solving two-sided constrained online allocation problems with high probability of feasibility.  ( 3 min )
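    To make the threshold-price idea concrete, here is a hedged sketch of the accept/update loop for the upper-bound side only (the paper also enforces lower bounds, omitted here); the additive dual-style price update with step `eta` is an illustrative stand-in for the paper's actual update rule.

        import numpy as np

        def online_allocate(requests, capacity, p0, eta=0.01):
            """requests: list of (revenue, demand vector); capacity: resource vector."""
            p = np.asarray(p0, dtype=float)
            target = np.asarray(capacity, dtype=float) / len(requests)
            used, accepted = np.zeros_like(p), []
            for revenue, demand in requests:
                demand = np.asarray(demand, dtype=float)
                take = revenue >= p @ demand and np.all(used + demand <= capacity)
                if take:
                    accepted.append((revenue, demand))
                    used += demand
                consumed = demand if take else np.zeros_like(p)
                p = np.maximum(p + eta * (consumed - target), 0.0)  # price nudge
            return accepted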
    Silicon Photonic Architecture for Training Deep Neural Networks with Direct Feedback Alignment. (arXiv:2111.06862v2 [cs.LG] UPDATED)
    There has been growing interest in using photonic processors for performing neural network inference operations; however, these networks are currently trained using standard digital electronics. Here, we propose on-chip training of neural networks enabled by a CMOS-compatible silicon photonic architecture to harness the potential for massively parallel, efficient, and fast data operations. Our scheme employs the direct feedback alignment training algorithm, which trains neural networks using error feedback rather than error backpropagation, and can operate at speeds of trillions of multiply-accumulate (MAC) operations per second while consuming less than one picojoule per MAC operation. The photonic architecture exploits parallelized matrix-vector multiplications using arrays of microring resonators for processing multi-channel analog signals along single waveguide buses to calculate the gradient vector for each neural network layer in situ. We also experimentally demonstrate training deep neural networks with the MNIST dataset using on-chip MAC operation results. Our novel approach for efficient, ultra-fast neural network training showcases photonics as a promising platform for executing AI applications.  ( 3 min )
    Data-driven End-to-end Learning of Pole Placement Control for Nonlinear Dynamics via Koopman Invariant Subspaces. (arXiv:2208.08883v1 [eess.SY])
    We propose a data-driven method for controlling the frequency and convergence rate of black-box nonlinear dynamical systems based on the Koopman operator theory. With the proposed method, a policy network is trained such that the eigenvalues of a Koopman operator of controlled dynamics are close to the target eigenvalues. The policy network consists of a neural network to find a Koopman invariant subspace, and a pole placement module to adjust the eigenvalues of the Koopman operator. Since the policy network is differentiable, we can train it in an end-to-end fashion using reinforcement learning. We demonstrate that the proposed method achieves better performance than model-free reinforcement learning and model-based control with system identification.  ( 2 min )
    NetKet 3: Machine Learning Toolbox for Many-Body Quantum Systems. (arXiv:2112.10526v2 [quant-ph] UPDATED)
    We introduce version 3 of NetKet, the machine learning toolbox for many-body quantum physics. NetKet is built around neural-network quantum states and provides efficient algorithms for their evaluation and optimization. This new version is built on top of JAX, a differentiable programming and accelerated linear algebra framework for the Python programming language. The most significant new feature is the possibility to define arbitrary neural network ansätze in pure Python code using the concise notation of machine-learning frameworks, which allows for just-in-time compilation as well as the implicit generation of gradients thanks to automatic differentiation. NetKet 3 also comes with support for GPU and TPU accelerators, advanced support for discrete symmetry groups, chunking to scale up to thousands of degrees of freedom, drivers for quantum dynamics applications, and improved modularity, allowing users to use only parts of the toolbox as a foundation for their own code.
    Causal Reasoning Meets Visual Representation Learning: A Prospective Study. (arXiv:2204.12037v7 [cs.CV] UPDATED)
    Visual representation learning is ubiquitous in various real-world applications, including visual comprehension, video understanding, multi-modal analysis, human-computer interaction, and urban computing. Due to the emergence of huge amounts of multi-modal heterogeneous spatial/temporal/spatial-temporal data in the big data era, the lack of interpretability, robustness, and out-of-distribution generalization has become a key challenge for existing visual models. The majority of existing methods tend to fit the original data/variable distributions and ignore the essential causal relations behind the multi-modal knowledge, lacking unified guidance and analysis of why modern visual representation learning methods easily collapse into data bias and have limited generalization and cognitive abilities. Inspired by the strong inference ability of human-level agents, recent years have therefore witnessed great effort in developing causal reasoning paradigms to realize robust representation and model learning with good cognitive ability. In this paper, we conduct a comprehensive review of existing causal reasoning methods for visual representation learning, covering fundamental theories, models, and datasets. The limitations of current methods and datasets are also discussed. Moreover, we propose some prospective challenges, opportunities, and future research directions for benchmarking causal reasoning algorithms in visual representation learning. This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussion, and highlight the urgency of developing novel causal reasoning methods, publicly available benchmarks, and consensus-building standards for reliable visual representation learning and related real-world applications.
    Convergence Rates for Stochastic Approximation on a Boundary. (arXiv:2208.07243v2 [stat.ML] UPDATED)
    We analyze the behavior of projected stochastic gradient descent, focusing on the case where the optimum is on the boundary of the constraint set and the gradient does not vanish at the optimum. Here, iterates may in expectation make progress against the objective at each step. When this and an appropriate moment condition on the noise hold, we prove that the convergence rate to the optimum of constrained stochastic gradient descent differs from, and is typically faster than, that of the unconstrained algorithm. Our results show that the concentration around the optimum is exponentially distributed rather than normally distributed, which typically determines the limiting convergence in the unconstrained case. The methods that we develop rely on a geometric ergodicity proof. This extends a result on Markov chains by Hajek (1982) to the area of stochastic approximation algorithms. As examples, we show how the results apply to linear programming and tabular reinforcement learning.
    How many perturbations break this model? Evaluating robustness beyond adversarial accuracy. (arXiv:2207.04129v2 [cs.LG] UPDATED)
    Robustness to adversarial attack is typically evaluated with adversarial accuracy. This metric quantifies the number of points for which, given a threat model, successful adversarial perturbations cannot be found. While essential, this metric does not capture all aspects of robustness, and in particular leaves out the question of how many perturbations can be found for each point. In this work we introduce an alternative approach, adversarial sparsity, which quantifies how difficult it is to find a successful perturbation given both an input point and a constraint on the direction of the perturbation. This constraint may be angular (L2 perturbations) or based on the number of pixels (Linf perturbations). We show that sparsity provides valuable insight into neural networks in multiple ways: analyzing the sparsity of existing robust models illustrates important differences between them that accuracy analysis does not, and suggests approaches for improving their robustness. When applying broken defenses effective against weak attacks but not strong ones, sparsity can discriminate between the totally ineffective and the partially effective defenses. Finally, with sparsity we can measure increases in robustness that do not affect accuracy: we show, for example, that data augmentation can by itself increase adversarial robustness, without using adversarial training.
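    A Monte-Carlo estimate of this quantity can be sketched in a few lines; `attack_along` is a hypothetical routine that searches for a misclassifying perturbation restricted to a given direction (e.g., within the angular L2 constraint mentioned above).

        import numpy as np

        def adversarial_sparsity(x, attack_along, dim, n_dirs=100, seed=0):
            rng = np.random.default_rng(seed)
            hits = 0
            for _ in range(n_dirs):
                d = rng.normal(size=dim)
                d /= np.linalg.norm(d)            # unit direction constraint
                hits += bool(attack_along(x, d))  # any perturbation along d?
            return 1.0 - hits / n_dirs            # high sparsity = few directions work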
    Improving the Diversity of Bootstrapped DQN by Replacing Priors With Noise. (arXiv:2203.01004v2 [cs.LG] UPDATED)
    Q-learning is one of the most well-known Reinforcement Learning algorithms, and tremendous efforts have been made to develop it using neural networks. Bootstrapped Deep Q-Learning Network is amongst them. It utilizes multiple neural network heads to introduce diversity into Q-learning. Diversity can sometimes be viewed as the number of reasonable moves an agent can take at a given state, analogous to the definition of the exploration ratio in RL. Thus, the performance of Bootstrapped Deep Q-Learning Network is deeply connected with the level of diversity within the algorithm. In the original research, it was pointed out that a random prior could improve the performance of the model. In this article, we further explore the possibility of replacing priors with noise, sampling the noise from a Gaussian distribution to introduce more diversity into this algorithm. We conduct our experiment on the Atari benchmark and compare our algorithm to both the original and other related algorithms. The results show that our modification of the Bootstrapped Deep Q-Learning algorithm achieves significantly higher evaluation scores across different types of Atari games. Thus, we conclude that replacing priors with noise can improve Bootstrapped Deep Q-Learning's performance by preserving diversity within the ensemble.
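    One plausible reading of "replacing priors with noise" is sketched below in PyTorch: each bootstrapped head's output is perturbed with Gaussian noise during training instead of being offset by a fixed random prior network. Layer widths, the head count, and the noise placement are our assumptions, not the paper's exact design.

        import torch
        import torch.nn as nn

        class NoisyBootstrappedQ(nn.Module):
            def __init__(self, obs_dim, n_actions, n_heads=10, noise_std=0.1):
                super().__init__()
                self.trunk = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
                self.heads = nn.ModuleList(
                    [nn.Linear(128, n_actions) for _ in range(n_heads)])
                self.noise_std = noise_std

            def forward(self, obs, training=True):
                h = self.trunk(obs)
                qs = torch.stack([head(h) for head in self.heads])  # (heads, B, A)
                if training:  # Gaussian noise in place of a random prior
                    qs = qs + self.noise_std * torch.randn_like(qs)
                return qs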
    One-Step Abductive Multi-Target Learning with Diverse Noisy Samples and Its Application to Tumour Segmentation for Breast Cancer. (arXiv:2110.10325v6 [cs.LG] UPDATED)
    Recent studies have demonstrated the effectiveness of combining machine learning and logical reasoning, including data-driven logical reasoning, knowledge-driven machine learning, and abductive learning, in inventing advanced artificial intelligence technologies. One-step abductive multi-target learning (OSAMTL), an approach inspired by abductive learning that combines machine learning and logical reasoning in a one-step balanced way, has likewise shown its effectiveness in handling complex noisy labels of a single noisy sample in medical histopathology whole slide image analysis (MHWSIA). However, OSAMTL is not suitable for situations where diverse noisy samples (DiNS) are provided for a learning task. In this paper, after defining DiNS, we propose one-step abductive multi-target learning with DiNS (OSAMTL-DiNS) to extend the original OSAMTL to handle the complex noisy labels of DiNS. Applying OSAMTL-DiNS to tumour segmentation for breast cancer in MHWSIA, we show that OSAMTL-DiNS enables various state-of-the-art approaches for learning from noisy labels to achieve more rational predictions.
    A spatiotemporal machine learning approach to forecasting COVID-19 incidence at the county level in the USA. (arXiv:2109.12094v4 [stat.ML] UPDATED)
    With COVID-19 affecting every country globally and changing everyday life, the ability to forecast the spread of the disease is more important than in any previous epidemic. The conventional methods of disease-spread modeling, compartmental models, are based on the assumption of spatiotemporal homogeneity of the spread of the virus, which may cause forecasting to underperform, especially at high spatial resolutions. In this paper we approach the forecasting task with an alternative technique: spatiotemporal machine learning. We present COVID-LSTM, a data-driven model based on a Long Short-term Memory deep learning architecture for forecasting COVID-19 incidence at the county level in the US. We use the weekly number of new positive cases as temporal input, and hand-engineered spatial features from Facebook movement and connectedness datasets, to capture the spread of the disease in time and space. COVID-LSTM outperforms the COVID-19 Forecast Hub's Ensemble model (COVIDhub-ensemble) on our 17-week evaluation period, making it the first model to be more accurate than the COVIDhub-ensemble over one or more forecast periods. Over the 4-week forecast horizon, our model is on average 50 cases per county more accurate than the COVIDhub-ensemble. We highlight that the underutilization of data-driven forecasting of disease spread prior to COVID-19 is likely due to the lack of sufficient data for previous diseases, in addition to the recency of advances in machine learning methods for spatiotemporal forecasting. We discuss the impediments to the wider uptake of data-driven forecasting, and whether it is likely that more deep learning-based models will be used in the future.
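    A schematic of such an architecture, assuming weekly case counts as the temporal input and a static vector of hand-engineered spatial features per county; layer sizes and the 4-week output head are illustrative rather than COVID-LSTM's actual configuration.

        import torch
        import torch.nn as nn

        class CountyForecaster(nn.Module):
            def __init__(self, n_spatial, hidden=64, horizon=4):
                super().__init__()
                self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                                    batch_first=True)
                self.head = nn.Linear(hidden + n_spatial, horizon)

            def forward(self, cases, spatial):
                # cases: (B, T, 1) weekly new cases; spatial: (B, n_spatial)
                _, (h, _) = self.lstm(cases)
                return self.head(torch.cat([h[-1], spatial], dim=-1))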
    t-METASET: Tailoring Property Bias of Large-Scale Metamaterial Datasets through Active Learning. (arXiv:2202.10565v2 [cs.CE] UPDATED)
    Inspired by the recent achievements of machine learning in diverse domains, data-driven metamaterials design has emerged as a compelling paradigm that can unlock the potential of multiscale architectures. The model-centric research trend, however, lacks principled frameworks dedicated to data acquisition, whose quality propagates into the downstream tasks. Often built by naive space-filling design in shape descriptor space, metamaterial datasets suffer from property distributions that are either highly imbalanced or at odds with design tasks of interest. To this end, we present t-METASET: an active-learning-based data acquisition framework aiming to guide both diverse and task-aware data generation. Distinctly, we seek a solution to a commonplace yet frequently overlooked scenario at early stages of data-driven design of metamaterials: when a massive (~O(10^4)) shape-only library has been prepared with no properties evaluated. The key idea is to harness a data-driven shape descriptor learned from generative models, fit a sparse regressor as a start-up agent, and leverage metrics related to diversity to drive data acquisition to areas that help designers fulfill design goals. We validate the proposed framework in three deployment cases, which encompass general use, task-specific use, and tailorable use. Two large-scale mechanical metamaterial datasets are used to demonstrate the efficacy. Applicable to general image-based design representations, t-METASET could boost future advancements in data-driven design.
    High Dimensional Statistical Estimation under Uniformly Dithered One-bit Quantization. (arXiv:2202.13157v2 [stat.ML] UPDATED)
    In this paper, we propose a uniformly dithered one-bit quantization scheme for high-dimensional statistical estimation. The scheme contains truncation, dithering, and quantization as typical steps. As canonical examples, the quantization scheme is applied to three estimation problems: sparse covariance matrix estimation, sparse linear regression, and matrix completion. We study both sub-Gaussian and heavy-tailed regimes, where the underlying distribution of heavy-tailed data is assumed to possess a bounded second or fourth moment. For each model we propose new estimators based on one-bit quantized data. In the sub-Gaussian regime, our estimators achieve optimal minimax rates up to logarithmic factors, which indicates that our quantization scheme introduces almost no additional cost. In the heavy-tailed regime, while the rates of our estimators become essentially slower, these results are either the first in such a one-bit quantized and heavy-tailed setting or exhibit significant improvements over existing comparable results. Moreover, we contribute considerably to the problems of one-bit compressed sensing and one-bit matrix completion. Specifically, we extend one-bit compressed sensing to sub-Gaussian or even heavy-tailed sensing vectors via convex programming. For one-bit matrix completion, our method is essentially different from the standard likelihood approach and can handle pre-quantization random noise with unknown distribution. Experimental results on synthetic data are presented to support our theoretical analysis.
    Motley: Benchmarking Heterogeneity and Personalization in Federated Learning. (arXiv:2206.09262v5 [cs.LG] UPDATED)
    Personalized federated learning considers learning models unique to each client in a heterogeneous network. The resulting client-specific models have been purported to improve metrics such as accuracy, fairness, and robustness in federated networks. However, despite a plethora of work in this area, it remains unclear: (1) which personalization techniques are most effective in various settings, and (2) how important personalization truly is for realistic federated applications. To better answer these questions, we propose Motley, a benchmark for personalized federated learning. Motley consists of a suite of cross-device and cross-silo federated datasets from varied problem domains, as well as thorough evaluation metrics for better understanding the possible impacts of personalization. We establish baselines on the benchmark by comparing a number of representative personalized federated learning methods. These initial results highlight strengths and weaknesses of existing approaches, and raise several open questions for the community. Motley aims to provide a reproducible means with which to advance developments in personalized and heterogeneity-aware federated learning, as well as the related areas of transfer learning, meta-learning, and multi-task learning.
    PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations. (arXiv:2203.16965v2 [cs.CL] UPDATED)
    While self-supervised speech representation learning (SSL) models serve a variety of downstream tasks, these models have been observed to overfit to the domain from which the unlabelled data originates. To alleviate this issue, we propose PADA (Pruning Assisted Domain Adaptation) and zero out redundant weights from models pre-trained on large amounts of out-of-domain (OOD) data. Intuitively, this helps to make space for the target-domain ASR finetuning. The redundant weights can be identified through various pruning strategies which have been discussed in detail as a part of this work. Specifically, we investigate the effect of the recently discovered Task-Agnostic and Task-Aware pruning on PADA and propose a new pruning paradigm based on the latter, which we call Cross-Domain Task-Aware Pruning (CD-TAW). CD-TAW obtains the initial pruning mask from a well fine-tuned OOD model, which makes it starkly different from the rest of the pruning strategies discussed in the paper. Our proposed CD-TAW methodology achieves up to 20.6% relative WER improvement over our baseline when fine-tuned on a 2-hour subset of Switchboard data without language model (LM) decoding. Furthermore, we conduct a detailed analysis to highlight the key design choices of our proposed method.
    View-labels Are Indispensable: A Multifacet Complementarity Study of Multi-view Clustering. (arXiv:2205.02507v2 [cs.LG] UPDATED)
    Consistency and complementarity are two key ingredients for boosting multi-view clustering (MVC). Recently, with the introduction of popular contrastive learning, the consistency learning of views has been further enhanced in MVC, leading to promising performance. By contrast, however, complementarity has not received sufficient attention except in the feature facet, where the Hilbert-Schmidt Independence Criterion (HSIC) term or an independent encoder-decoder network is usually adopted to capture view-specific information. This motivates us to reconsider the complementarity learning of views comprehensively from multiple facets, including the feature, view-label, and contrast facets, while maintaining view consistency. We empirically find that all the facets contribute to complementarity learning, especially the view-label facet, which is usually neglected by existing methods. Based on this, we develop a novel Multifacet Complementarity learning framework for Multi-View Clustering (MCMVC), which fuses multifacet complementarity information and, in particular, explicitly embeds the view-label information. To the best of our knowledge, this is the first work to use view-labels explicitly to guide the complementarity learning of views. Compared with the SOTA baselines, MCMVC achieves remarkable improvements, e.g., average margins over $5.00\%$ and $7.00\%$ respectively in complete and incomplete MVC settings on Caltech101-20 in terms of three evaluation metrics.
    Transformers and the representation of biomedical background knowledge. (arXiv:2202.02432v3 [cs.CL] UPDATED)
    Specialised transformers-based models (such as BioBERT and BioMegatron) are adapted for the biomedical domain based on publicly available biomedical corpora. As such, they have the potential to encode large-scale biological knowledge. We investigate the encoding and representation of biological knowledge in these models, and its potential utility to support inference in cancer precision medicine - namely, the interpretation of the clinical significance of genomic alterations. We compare the performance of different transformer baselines; we use probing to determine the consistency of encodings for distinct entities; and we use clustering methods to compare and contrast the internal properties of the embeddings for genes, variants, drugs and diseases. We show that these models do indeed encode biological knowledge, although some of this is lost in fine-tuning for specific tasks. Finally, we analyse how the models behave with regard to biases and imbalances in the dataset.
    Cross-Silo Heterogeneous Model Federated Multitask Learning. (arXiv:2202.08603v5 [cs.LG] UPDATED)
    Federated learning (FL) is a machine learning technique that enables participants to collaboratively train high-quality models without exchanging their private data. Participants utilizing cross-silo federated learning (CS-FL) settings are independent organizations with different task needs, and they are concerned not only with data privacy but also with independently training their unique models due to intellectual property considerations. Most existing FL methods are incapable of satisfying the above scenarios. In this study, we present a novel federated learning method CoFED based on unlabeled data pseudolabeling via a process known as cotraining. CoFED is a federated learning method that is compatible with heterogeneous models, tasks, and training processes. The experimental results suggest that the proposed method outperforms competing ones. This is especially true for non-independent and identically distributed settings and heterogeneous models, where the proposed method achieves a 35% performance improvement.
    Low Emission Building Control with Zero-Shot Reinforcement Learning. (arXiv:2206.14191v2 [eess.SY] UPDATED)
    Heating and cooling systems in buildings account for 31% of global energy use, much of which is regulated by Rule Based Controllers (RBCs) that neither maximise energy efficiency nor minimise emissions by interacting optimally with the grid. Control via Reinforcement Learning (RL) has been shown to significantly improve building energy efficiency, but existing solutions require access to building-specific simulators or data that cannot be expected for every building in the world. In response, we show it is possible to obtain emission-reducing policies without such knowledge a priori -- a paradigm we call zero-shot building control. We combine ideas from system identification and model-based RL to create PEARL (Probabilistic Emission-Abating Reinforcement Learning) and show that a short period of active exploration is all that is required to build a performant model. In experiments across three varied building energy simulations, we show PEARL outperforms an existing RBC in one case and popular RL baselines in all cases, reducing building emissions by as much as 31% whilst maintaining thermal comfort. Our source code is available online via https://enjeeneer.io/projects/pearl/
    Beyond the Hype: A Real-World Evaluation of the Impact and Cost of Machine Learning-Based Malware Detection. (arXiv:2012.09214v4 [cs.CR] UPDATED)
    In this paper, we present a scientific evaluation of four prominent malware detection tools to assist an organization with two primary questions: To what extent do ML-based tools accurately classify previously- and never-before-seen files? Is it worth purchasing a network-level malware detector? To identify weaknesses, we tested each tool against 3,536 total files (2,554 or 72% malicious, 982 or 28% benign) of a variety of file types, including hundreds of malicious zero-days, polyglots, and APT-style files, delivered on multiple protocols. We present statistical results on detection time and accuracy, consider complementary analysis (using multiple tools together), and provide two novel applications of the recent cost-benefit evaluation procedure of Iannacone & Bridges. While the ML-based tools are more effective at detecting zero-day files and executables, the signature-based tool may still be an overall better option. Both network-based tools provide substantial (simulated) savings when paired with either host tool, yet both show poor detection rates on protocols other than HTTP or SMTP. Our results show that all four tools have near-perfect precision but alarmingly low recall, especially on file types other than executables and office files -- 37% of malware tested, including all polyglot files, were undetected. Priorities for researchers and takeaways for end users are given.
    Finding and Fixing Spurious Patterns with Explanations. (arXiv:2106.02112v3 [cs.LG] UPDATED)
    Image classifiers often use spurious patterns, such as "relying on the presence of a person to detect a tennis racket," which do not generalize. In this work, we present an end-to-end pipeline for identifying and mitigating spurious patterns in such models, under the assumption that we have access to pixel-wise object annotations. We start by identifying patterns such as "the model's prediction for tennis racket changes 63% of the time if we hide the people." Then, if a pattern is spurious, we mitigate it via a novel form of data augmentation. We demonstrate that our method identifies a diverse set of spurious patterns and that it mitigates them by producing a model that is both more accurate on a distribution where the spurious pattern is not helpful and more robust to distribution shift.
    Ensemble learning using individual neonatal data for seizure detection. (arXiv:2204.07043v2 [eess.SP] UPDATED)
    Sharing medical data between institutions is difficult in practice due to data protection laws and official procedures within institutions. Therefore, most existing algorithms are trained on relatively small electroencephalogram (EEG) data sets, which is likely to be detrimental to prediction accuracy. In this work, we simulate a case when the data can not be shared by splitting a publicly available data set into disjoint sets representing data in individual institutions. We propose to train a (local) detector in each institution and aggregate their individual predictions into one final prediction. Four aggregation schemes are compared, namely the majority vote, the mean, the weighted mean, and the Dawid-Skene method. The method was validated on an independent data set using only a subset of EEG channels. The ensemble reaches accuracy comparable to a single detector trained on all the data when a sufficient amount of data is available in each institution. The weighted mean aggregation scheme showed the best performance; it was only marginally outperformed by the Dawid-Skene method when local detectors approach the performance of a single detector trained on all available data.
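    Three of the four aggregation schemes reduce to one-liners over the matrix of per-institution seizure probabilities; the Dawid-Skene scheme is omitted since it requires an EM routine over estimated detector confusion matrices. Thresholds and weights below are illustrative.

        import numpy as np

        def majority_vote(p, thr=0.5):
            """p: (n_detectors, n_windows) predicted seizure probabilities."""
            return ((p > thr).mean(axis=0) > 0.5).astype(int)

        def mean_rule(p, thr=0.5):
            return (p.mean(axis=0) > thr).astype(int)

        def weighted_mean(p, w, thr=0.5):
            w = np.asarray(w, dtype=float) / np.sum(w)  # e.g., validation accuracy
            return (w @ p > thr).astype(int)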
    Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning. (arXiv:2208.08831v1 [cs.CV])
    Automatically discovering failures in vision models under real-world settings remains an open challenge. This work demonstrates how off-the-shelf, large-scale, image-to-text and text-to-image models, trained on vast amounts of data, can be leveraged to automatically find such failures. In essence, a conditional text-to-image generative model is used to generate large amounts of synthetic, yet realistic, inputs given a ground-truth label. Misclassified inputs are clustered and a captioning model is used to describe each cluster. Each cluster's description is used in turn to generate more inputs and assess whether specific clusters induce more failures than expected. We use this pipeline to demonstrate that we can effectively interrogate classifiers trained on ImageNet to find specific failure cases and discover spurious correlations. We also show that we can scale the approach to generate adversarial datasets targeting specific classifier architectures. This work serves as a proof-of-concept demonstrating the utility of large-scale generative models to automatically discover bugs in vision models in an open-ended manner. We also describe a number of limitations and pitfalls related to this approach.
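    In outline, the pipeline looks like the sketch below; all five callables are hypothetical stand-ins for the off-the-shelf models (text-to-image generator, classifier under test, clusterer, captioner), and the sample counts are arbitrary.

        def discover_failures(label, generate, classify, cluster, caption, n=1000):
            images = [generate(f"a photo of a {label}") for _ in range(n)]
            misses = [im for im in images if classify(im) != label]
            report = []
            for group in cluster(misses):
                desc = caption(group)                  # describe the failure mode
                probe = [generate(desc) for _ in range(100)]
                rate = sum(classify(im) != label for im in probe) / len(probe)
                report.append((desc, rate))            # does the mode reproduce?
            return sorted(report, key=lambda r: -r[1])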
    ObfuNAS: A Neural Architecture Search-based DNN Obfuscation Approach. (arXiv:2208.08569v1 [cs.CR])
    Malicious architecture extraction has been emerging as a crucial concern for deep neural network (DNN) security. As a defense, architecture obfuscation is proposed to remap the victim DNN to a different architecture. Nonetheless, we observe that, with only extracting an obfuscated DNN architecture, the adversary can still retrain a substitute model with high performance (e.g., accuracy), rendering the obfuscation techniques ineffective. To mitigate this under-explored vulnerability, we propose ObfuNAS, which converts the DNN architecture obfuscation into a neural architecture search (NAS) problem. Using a combination of function-preserving obfuscation strategies, ObfuNAS ensures that the obfuscated DNN architecture can only achieve lower accuracy than the victim. We validate the performance of ObfuNAS with open-source architecture datasets like NAS-Bench-101 and NAS-Bench-301. The experimental results demonstrate that ObfuNAS can successfully find the optimal mask for a victim model within a given FLOPs constraint, leading up to 2.6% inference accuracy degradation for attackers with only 0.14x FLOPs overhead. The code is available at: https://github.com/Tongzhou0101/ObfuNAS.
    Global Convergence of Two-timescale Actor-Critic for Solving Linear Quadratic Regulator. (arXiv:2208.08744v1 [cs.LG])
    The actor-critic (AC) reinforcement learning algorithms have been the powerhouse behind many challenging applications. Nevertheless, their convergence is fragile in general. To study this instability, existing works mostly consider the uncommon double-loop variant or basic models with finite state and action spaces. We investigate the more practical single-sample two-timescale AC for solving the canonical linear quadratic regulator (LQR) problem, where the actor and the critic update only once with a single sample in each iteration on an unbounded continuous state and action space. Existing analyses cannot conclude convergence for such a challenging case. We develop a new analysis framework that allows establishing global convergence to an $\epsilon$-optimal solution with at most an $\tilde{\mathcal{O}}(\epsilon^{-2.5})$ sample complexity. To our knowledge, this is the first finite-time convergence analysis for the single-sample two-timescale AC for solving LQR with global optimality. The sample complexity improves upon those of other variants by orders of magnitude, which sheds light on the practical wisdom of single-sample algorithms. We also further validate our theoretical findings via comprehensive simulation comparisons.
    Study of General Robust Subband Adaptive Filtering. (arXiv:2208.08856v1 [eess.SP])
    In this paper, we propose a general robust subband adaptive filtering (GR-SAF) scheme against impulsive noise by minimizing the mean square deviation under the random-walk model with individual weight uncertainty. Specifically, by choosing different scaling factors, such as those from the M-estimate and maximum correntropy robust criteria, in the GR-SAF scheme, we can easily obtain different GR-SAF algorithms. Importantly, the proposed GR-SAF algorithm can be reduced to a variable-regularization robust normalized SAF algorithm, thus having a fast convergence rate and low steady-state error. Simulations in the contexts of system identification with impulsive noise and echo cancellation with double-talk have verified that the proposed GR-SAF algorithms outperform their counterparts.
    Network inference via process motifs for lagged correlation in linear stochastic processes. (arXiv:2208.08871v1 [stat.ML])
    A major challenge for causal inference from time-series data is the trade-off between computational feasibility and accuracy. Motivated by process motifs for lagged covariance in an autoregressive model with slow mean-reversion, we propose to infer networks of causal relations via pairwise edge measures (PEMs) that one can easily compute from lagged correlation matrices. Motivated by contributions of process motifs to covariance and lagged variance, we formulate two PEMs that correct for confounding factors and for reverse causation. To demonstrate the performance of our PEMs, we consider network inference from simulations of linear stochastic processes, and we show that our proposed PEMs can infer networks accurately and efficiently. Specifically, for slightly autocorrelated time-series data, our approach achieves accuracies higher than or similar to Granger causality, transfer entropy, and convergent cross-mapping -- but with much shorter computation time than possible with any of these methods. Our fast and accurate PEMs are easy-to-implement methods for network inference with a clear theoretical underpinning. They provide promising alternatives to current paradigms for the inference of linear models from time-series data, including Granger causality, vector autoregression, and sparse inverse covariance estimation.
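    As a flavor of what "easily compute from lagged correlation matrices" means, the sketch below builds a naive directed edge score from the lag-1 correlation matrix; subtracting the transpose as a reverse-causation correction is our illustrative choice, not the paper's exact PEM.

        import numpy as np

        def lagged_corr(X, lag=1):
            """X: (T, n) time series; entry (i, j) correlates x_i(t) with x_j(t+lag)."""
            Xc = (X - X.mean(axis=0)) / X.std(axis=0)
            return Xc[:-lag].T @ Xc[lag:] / (len(Xc) - lag)

        def edge_measure(X, lag=1):
            R = lagged_corr(X, lag)
            return R - R.T   # damp symmetric (confounded / reversed) contributions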
    Long-term dynamics of fairness: understanding the impact of data-driven targeted help on job seekers. (arXiv:2208.08881v1 [cs.LG])
    The use of data-driven decision support by public agencies is becoming more widespread and already influences the allocation of public resources. This raises ethical concerns, as it has adversely affected minorities and historically discriminated groups. In this paper, we use an approach that combines statistics and machine learning with dynamical modeling to assess long-term fairness effects of labor market interventions. Specifically, we develop and use a model to investigate the impact of decisions caused by a public employment authority that selectively supports job-seekers through targeted help. The selection of who receives what help is based on a data-driven intervention model that estimates an individual's chances of finding a job in a timely manner and is based on data that describes a population in which skills relevant to the labor market are unevenly distributed between two groups (e.g., males and females). The intervention model has incomplete access to the individual's actual skills and can augment this with knowledge of the individual's group affiliation, thus using a protected attribute to increase predictive accuracy. We assess this intervention model's dynamics -- especially fairness-related issues and trade-offs between different fairness goals -- over time and compare it to an intervention model that does not use group affiliation as a predictive feature. We conclude that in order to quantify the trade-off correctly and to assess the long-term fairness effects of such a system in the real-world, careful modeling of the surrounding labor market is indispensable.
    An intertwined neural network model for EEG classification in brain-computer interfaces. (arXiv:2208.08860v1 [eess.SP])
    The brain-computer interface (BCI) is a nonstimulatory, direct, and occasionally bidirectional communication link between the brain and a computer or an external device. Classically, EEG-based BCI algorithms have relied on models such as support vector machines, linear discriminant analysis, or multiclass common spatial patterns. During the last decade, however, more sophisticated machine learning architectures, such as convolutional neural networks, recurrent neural networks, long short-term memory networks, and gated recurrent unit networks, have been extensively used to enhance discriminability in multiclass BCI tasks. Additionally, preprocessing and denoising of EEG signals has always been key to the successful decoding of brain activity, and the determination of an optimal and standardized EEG preprocessing pipeline is an active area of research. In this paper, we present a deep neural network architecture specifically engineered to a) provide state-of-the-art performance in multiclass motor imagery classification and b) remain robust to preprocessing so as to enable real-time processing of raw data as it streams from EEG and BCI equipment. It is based on the intertwined use of time-distributed fully connected (tdFC) and space-distributed 1D temporal convolutional (sdConv) layers and explicitly addresses the possibility that interaction of spatial and temporal features of the EEG signal occurs at all levels of complexity. Numerical experiments demonstrate that our architecture provides superior performance compared to baselines based on a combination of 3D convolutions and recurrent neural networks in a six-class motor imagery network, with a subjectwise accuracy that reaches 99%. Importantly, these results remain unchanged when minimal or extensive preprocessing is applied, possibly paving the way for a more transversal and real-time use of deep learning architectures in EEG classification.
    Early heart disease prediction using hybrid quantum classification. (arXiv:2208.08882v1 [quant-ph])
    Rates of heart morbidity and heart mortality are increasing significantly, affecting global public health and the world economy. Early prediction of heart disease is crucial for reducing heart morbidity and mortality. This paper proposes two quantum machine learning methods, i.e., a hybrid quantum neural network and a hybrid random forest quantum neural network, for early detection of heart disease. The methods are applied to the Cleveland and Statlog datasets. The results show that the hybrid quantum neural network and the hybrid random forest quantum neural network are suitable for high-dimensional and low-dimensional problems, respectively. The hybrid quantum neural network is sensitive to outlier data, while the hybrid random forest is robust to outlier data. A comparison of different machine learning methods shows that the proposed quantum methods are more appropriate for early heart disease prediction, with 96.43% and 97.78% area under the curve obtained for the Cleveland and Statlog datasets, respectively.
    Two-layer neural networks with values in a Banach space. (arXiv:2105.02095v3 [cs.LG] UPDATED)
    We study two-layer neural networks whose domain and range are Banach spaces with separable preduals. In addition, we assume that the image space is equipped with a partial order, i.e. it is a Riesz space. As the nonlinearity we choose the lattice operation of taking the positive part; in case of $\mathbb R^d$-valued neural networks this corresponds to the ReLU activation function. We prove inverse and direct approximation theorems with Monte-Carlo rates for a certain class of functions, extending existing results for the finite-dimensional case. In the second part of the paper, we study, from the regularisation theory viewpoint, the problem of finding optimal representations of such functions via signed measures on a latent space from a finite number of noisy observations. We discuss regularity conditions known as source conditions and obtain convergence rates in a Bregman distance for the representing measure in the regime when both the noise level goes to zero and the number of samples goes to infinity at appropriate rates.
    Learned Indexing in Proteins: Substituting Complex Distance Calculations with Embedding and Clustering Techniques. (arXiv:2208.08910v1 [cs.IR])
    Despite the constant evolution of similarity searching research, it continues to face the same challenges stemming from the complexity of the data, such as the curse of dimensionality and computationally expensive distance functions. Various machine learning techniques have proven capable of replacing elaborate mathematical models with combinations of simple linear functions, often gaining speed and simplicity at the cost of formal guarantees of accuracy and correctness of querying. The authors explore the potential of this research trend by presenting a lightweight solution for the complex problem of 3D protein structure search. The solution consists of three steps -- (i) transformation of 3D protein structural information into very compact vectors, (ii) use of a probabilistic model to group these vectors and respond to queries by returning a given number of similar objects, and (iii) a final filtering step which applies basic vector distance functions to refine the result.
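    A toy version of the three steps, with k-means standing in for the probabilistic grouping model and plain Euclidean distance for the final filtering; the cluster and probe counts below are illustrative, not the paper's configuration.

        import numpy as np
        from sklearn.cluster import KMeans

        def build_index(vectors, n_clusters=64):
            return KMeans(n_clusters=n_clusters, n_init=10).fit(vectors)

        def query(index, vectors, q, n_probe=3, k=10):
            order = np.argsort(
                np.linalg.norm(index.cluster_centers_ - q, axis=1))
            cand = np.where(np.isin(index.labels_, order[:n_probe]))[0]
            dists = np.linalg.norm(vectors[cand] - q, axis=1)  # refinement step
            return cand[np.argsort(dists)[:k]]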
    "Task-relevant autoencoding" enhances machine learning for human neuroscience. (arXiv:2208.08478v1 [q-bio.NC])
    In human neuroscience, machine learning can help reveal lower-dimensional neural representations relevant to subjects' behavior. However, state-of-the-art models typically require large datasets to train, so are prone to overfitting on human neuroimaging data that often possess few samples but many input dimensions. Here, we capitalized on the fact that the features we seek in human neuroscience are precisely those relevant to subjects' behavior. We thus developed a Task-Relevant Autoencoder via Classifier Enhancement (TRACE), and tested its ability to extract behaviorally-relevant, separable representations compared to a standard autoencoder for two severely truncated machine learning datasets. We then evaluated both models on fMRI data where subjects observed animals and objects. TRACE outperformed both the autoencoder and raw inputs nearly unilaterally, showing up to 30% increased classification accuracy and up to threefold improvement in discovering "cleaner", task-relevant representations. These results showcase TRACE's potential for a wide variety of data related to human behavior.
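    The core idea, a reconstruction loss plus a classification term on the bottleneck, can be sketched as below; layer widths and the loss weight `alpha` are illustrative assumptions rather than TRACE's actual configuration.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class TaskRelevantAE(nn.Module):
            def __init__(self, d_in, d_latent, n_classes):
                super().__init__()
                self.enc = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(),
                                         nn.Linear(256, d_latent))
                self.dec = nn.Sequential(nn.Linear(d_latent, 256), nn.ReLU(),
                                         nn.Linear(256, d_in))
                self.clf = nn.Linear(d_latent, n_classes)

            def loss(self, x, y, alpha=0.5):
                z = self.enc(x)
                recon = F.mse_loss(self.dec(z), x)
                task = F.cross_entropy(self.clf(z), y)  # shapes z to the task
                return recon + alpha * task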
    Merchandise Recommendation for Retail Events with Word Embedding Weighted Tf-idf and Dynamic Query Expansion. (arXiv:2208.08581v1 [cs.IR])
    To recommend relevant merchandises for seasonal retail events, we rely on item retrieval from marketplace inventory. With feedback to expand query scope, we discuss keyword expansion candidate selection using word embedding similarity, and an enhanced tf-idf formula for expanded words in search ranking.
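    A compact sketch of both ideas follows; `emb` is any word-to-vector mapping, and the 0.5 discount applied to expanded terms is an illustrative assumption, not the paper's exact weighting formula.

        import numpy as np

        def expand_query(query_terms, vocab, emb, k=3):
            expanded = {t: 1.0 for t in query_terms}   # original terms at weight 1
            for t in query_terms:
                sims = {w: float(emb[t] @ emb[w] /
                        (np.linalg.norm(emb[t]) * np.linalg.norm(emb[w])))
                        for w in vocab if w not in expanded}
                for w in sorted(sims, key=sims.get, reverse=True)[:k]:
                    expanded[w] = 0.5 * sims[w]        # discounted expansion weight
            return expanded

        def score(doc_tfidf, expanded):
            return sum(w * doc_tfidf.get(t, 0.0) for t, w in expanded.items())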
    Automatic Detection of Noisy Electrocardiogram Signals without Explicit Noise Labels. (arXiv:2208.08853v1 [eess.SP])
    Electrocardiogram (ECG) signals are beneficial in diagnosing cardiovascular diseases, which are among the leading causes of death. However, they are often contaminated by noise artifacts, which affects both the automatic and the manual diagnosis process. Automatic deep learning-based examination of ECG signals can lead to inaccurate diagnoses, and manual analysis involves rejection of noisy ECG samples by clinicians, which can cost extra time. To address this limitation, we present a two-stage deep learning-based framework to automatically detect noisy ECG samples. Through extensive experiments and analysis on two different datasets, we observe that the deep learning-based framework can detect slightly and highly noisy ECG samples effectively. We also study the transfer of the model learned on one dataset to another dataset and observe that the framework effectively detects noisy ECG samples.
    Efficient Signed Graph Sampling via Balancing & Gershgorin Disc Perfect Alignment. (arXiv:2208.08726v1 [eess.SP])
    A basic premise in graph signal processing (GSP) is that a graph encoding pairwise (anti-)correlations of the targeted signal as edge weights is exploited for graph filtering. However, existing fast graph sampling schemes are designed and tested only for positive graphs describing positive correlations. In this paper, we show that for datasets with strong inherent anti-correlations, a suitable graph contains both positive and negative edge weights. In response, we propose a linear-time signed graph sampling method centered on the concept of balanced signed graphs. Specifically, given an empirical covariance data matrix $\bar{\mathbf{C}}$, we first learn a sparse inverse matrix (graph Laplacian) $\mathcal{L}$ corresponding to a signed graph $\mathcal{G}$. We define the eigenvectors of Laplacian $\mathcal{L}_B$ for a balanced signed graph $\mathcal{G}_B$ -- approximating $\mathcal{G}$ via edge weight augmentation -- as graph frequency components. Next, we choose samples to minimize the low-pass filter reconstruction error in two steps. We first align all Gershgorin disc left-ends of Laplacian $\mathcal{L}_B$ at the smallest eigenvalue $\lambda_{\min}(\mathcal{L}_B)$ via similarity transform $\mathcal{L}_p = \mathbf{S} \mathcal{L}_B \mathbf{S}^{-1}$, leveraging a recent linear algebra theorem called Gershgorin disc perfect alignment (GDPA). We then perform sampling on $\mathcal{L}_p$ using a previous fast Gershgorin disc alignment sampling (GDAS) scheme. Experimental results show that our signed graph sampling method outperformed existing fast sampling schemes noticeably on various datasets.
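    For readers unfamiliar with the Gershgorin machinery, the helper below computes the disc left-ends that GDPA aligns, plus the generic similarity transform; it is a didactic aid under stated assumptions, not part of the paper's sampling pipeline.

        import numpy as np

        def gershgorin_left_ends(L):
            """Disc i of matrix L has center L[i, i] and radius sum_{j != i} |L[i, j]|."""
            centers = np.diag(L)
            radii = np.abs(L).sum(axis=1) - np.abs(centers)
            return centers - radii   # GDPA aligns these at lambda_min(L)

        def similarity_transform(L, s):
            S = np.diag(s)           # s: positive scalars, one per node
            return S @ L @ np.linalg.inv(S)   # same eigenvalues, shifted discs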
    Never Worse, Mostly Better: Stable Policy Improvement in Deep Reinforcement Learning. (arXiv:1910.01062v3 [cs.LG] UPDATED)
    In recent years, there has been significant progress in applying deep reinforcement learning (RL) for solving challenging problems across a wide variety of domains. Nevertheless, convergence of various methods has been shown to suffer from inconsistencies, due to algorithmic instability and variance, as well as stochasticity in the benchmark environments. Particularly, despite the fact that the agent's performance may be improving on average, it may abruptly deteriorate at late stages of training. In this work, we study methods for enhancing the agent's learning process, by providing conservative updates with respect to either the obtained history or a reference benchmark policy. Our method, termed EVEREST, obtains high confidence improvements via confidence bounds of a reference policy. Through extensive empirical analysis we demonstrate the benefit of our approach in terms of both performance and stabilization, with significant improvements in continuous control and Atari benchmarks.
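    The conservative-update principle admits a short control-flow sketch: promote a candidate policy only when a lower confidence bound on its return beats the reference. `evaluate` (returning a mean return and a confidence half-width from rollouts) and the names below are illustrative, not EVEREST's exact procedure.

        def maybe_update(reference, candidate, evaluate, delta=0.05):
            ref_mean, _ = evaluate(reference, delta)
            cand_mean, cand_hw = evaluate(candidate, delta)
            if cand_mean - cand_hw > ref_mean:   # high-confidence improvement only
                return candidate                 # promote: "mostly better"
            return reference                     # otherwise keep: "never worse"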
    Autism spectrum disorder classification based on interpersonal neural synchrony: Can classification be improved by dyadic neural biomarkers using unsupervised graph representation learning?. (arXiv:2208.08902v1 [cs.LG])
    Research in machine learning for autism spectrum disorder (ASD) classification bears the promise of improving clinical diagnoses. However, recent studies in clinical imaging have shown the limited generalization of biomarkers across and beyond benchmark datasets. Despite increasing model complexity and sample size in neuroimaging, the classification performance of ASD remains far from clinical application. This raises the question of how we can overcome these barriers to develop early biomarkers for ASD. One approach might be to rethink how we operationalize the theoretical basis of this disease in machine learning models. Here we introduce unsupervised graph representations that explicitly map the neural mechanisms of a core aspect of ASD, deficits in dyadic social interaction, as assessed by dual brain recordings termed hyperscanning, and evaluate their predictive performance. The proposed method differs from existing approaches in that it is more suitable for capturing social interaction deficits on a neural level and is applicable to young children and infants. First results from functional near-infrared spectroscopy data indicate potential predictive capacities of a task-agnostic, interpretable graph representation. This first effort to leverage interaction-related deficits on the neural level to classify ASD may stimulate new approaches and methods to enhance existing models to achieve developmental ASD biomarkers in the future.
    Projection-free Graph-based Classifier Learning using Gershgorin Disc Perfect Alignment. (arXiv:2106.01642v2 [cs.LG] UPDATED)
    In semi-supervised graph-based binary classifier learning, a subset of known labels $\hat{x}_i$ are used to infer unknown labels, assuming that the label signal $\mathbf{x}$ is smooth with respect to a similarity graph specified by a Laplacian matrix. When restricting labels $x_i$ to binary values, the problem is NP-hard. While a conventional semi-definite programming relaxation (SDR) can be solved in polynomial time using, for example, the alternating direction method of multipliers (ADMM), the complexity of projecting a candidate matrix $\mathbf{M}$ onto the positive semi-definite (PSD) cone ($\mathbf{M} \succeq 0$) per iteration remains high. In this paper, leveraging a recent linear algebraic theory called Gershgorin disc perfect alignment (GDPA), we propose a fast projection-free method by solving a sequence of linear programs (LP) instead. Specifically, we first recast the SDR to its dual, where a feasible solution $\mathbf{H} \succeq 0$ is interpreted as a Laplacian matrix corresponding to a balanced signed graph minus the last node. To achieve graph balance, we split the last node into two, each retaining the original positive/negative edges, resulting in a new Laplacian $\bar{\mathbf{H}}$. We re-pose the SDR dual for solution $\bar{\mathbf{H}}$, then replace the PSD cone constraint $\bar{\mathbf{H}} \succeq 0$ with linear constraints derived from GDPA -- sufficient conditions to ensure $\bar{\mathbf{H}}$ is PSD -- so that the optimization becomes an LP per iteration. Finally, we extract predicted labels from the converged solution $\bar{\mathbf{H}}$. Experiments show that our algorithm enjoyed a $28\times$ speedup over the next fastest scheme while achieving comparable label prediction performance.
    Transformer Networks for Predictive Group Elevator Control. (arXiv:2208.08948v1 [physics.soc-ph])
    We propose a Predictive Group Elevator Scheduler that uses predictive information of passenger arrivals from a Transformer-based destination predictor and a linear regression model that predicts remaining time to destinations. Through extensive empirical evaluation, we find that savings in Average Waiting Time (AWT) can exceed 50% for light arrival streams and reach around 15% for medium arrival streams in afternoon down-peak traffic regimes. Such results are obtained after carefully setting the Predicted Probability of Going to Elevator (PPGE) threshold, which avoids the majority of false predictions for people heading to the elevator while achieving up to 80% true predictive elevator landings after having seen only 60% of a passenger's whole trajectory.
    Towards Learning in Grey Spatiotemporal Systems: A Prophet to Non-consecutive Spatiotemporal Dynamics. (arXiv:2208.08878v1 [cs.LG])
    Spatiotemporal forecasting is an imperative topic in data science due to its diverse and critical applications in smart cities. Existing works mostly perform consecutive predictions of subsequent steps with observations obtained completely and continuously, where the nearest observations can be exploited as key knowledge for instantaneous status estimation. However, the practical issues of early activity planning and sensor failures elicit a brand-new task, i.e., non-consecutive forecasting. In this paper, we define spatiotemporal learning systems with missing observations as Grey Spatiotemporal Systems (G2S) and propose a Factor-Decoupled learning framework for G2S (FDG2S), where the core idea is to hierarchically decouple multi-level factors and enable both flexible aggregations and disentangled uncertainty estimations. Firstly, to compensate for missing observations, a generic semantic-neighboring sequence sampling is devised, which selects representative sequences to capture both periodical regularity and instantaneous variations. Secondly, we turn the predictions of non-consecutive statuses into inferring statuses under expected combined exogenous factors. In particular, a factor-decoupled aggregation scheme is proposed to decouple factor-induced predictive intensity and region-wise proximity by two energy functions of a conditional random field. To infer region-wise proximity under flexible factor-wise combinations and enable dynamic neighborhood aggregations, we further disentangle the compounded influences of exogenous factors on region-wise proximity and learn to aggregate them. Given the inherent incompleteness and critical applications of G2S, a DisEntangled Uncertainty Quantification is put forward to identify two types of uncertainty for reliability guarantees and model interpretations.
    Learning-based estimation of in-situ wind speed from underwater acoustics. (arXiv:2208.08912v1 [cs.LG])
    Wind speed retrieval at the sea surface is of primary importance for scientific and operational applications. Besides weather models, in-situ measurements and remote sensing technologies, especially satellite sensors, provide complementary means to monitor wind speed. As sea surface winds produce sounds that propagate underwater, underwater acoustic recordings can also deliver fine-grained wind-related information. Whereas model-driven schemes, especially data assimilation approaches, are the state-of-the-art schemes to address inverse problems in geoscience, machine learning techniques are becoming increasingly appealing to fully exploit the potential of observation datasets. Here, we introduce a deep learning approach for the retrieval of wind speed time series from underwater acoustics, possibly complemented by other data sources such as weather model reanalyses. Our approach bridges data assimilation and learning-based frameworks to benefit both from prior physical knowledge and computational efficiency. Numerical experiments on real data demonstrate that we outperform state-of-the-art data-driven methods with a relative gain of up to 16% in terms of RMSE. Interestingly, these results support the relevance of the time dynamics of underwater acoustic data to better inform the time evolution of wind speed. They also show that multimodal data, here underwater acoustics data combined with ECMWF reanalysis data, may further improve the reconstruction performance, including the robustness with respect to missing underwater acoustics data.
    Estimating individual treatment effects under unobserved confounding using binary instruments. (arXiv:2208.08544v1 [stat.ME])
    Estimating individual treatment effects (ITEs) from observational data is relevant in many fields such as personalized medicine. However, in practice, the treatment assignment is usually confounded by unobserved variables and thus introduces bias. A remedy to remove the bias is the use of instrumental variables (IVs). Such settings are widespread in medicine (e.g., trials where compliance is used as a binary IV). In this paper, we propose a novel, multiply robust machine learning framework, called MRIV, for estimating ITEs using binary IVs, which thus yields an unbiased ITE estimator. Different from previous work on binary IVs, our framework estimates the ITE directly via a pseudo-outcome regression. (1) We provide a theoretical analysis where we show that our framework yields multiply robust convergence rates: our ITE estimator achieves fast convergence even if several nuisance estimators converge slowly. (2) We further show that our framework asymptotically outperforms state-of-the-art plug-in IV methods for ITE estimation. (3) We build upon our theoretical results and propose a tailored deep neural network architecture called MRIV-Net for ITE estimation using binary IVs. Across various computational experiments, we demonstrate empirically that our MRIV-Net achieves state-of-the-art performance. To the best of our knowledge, MRIV is the first machine learning framework for estimating ITEs in the binary IV setting shown to be multiply robust.
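    For intuition on why a binary IV removes confounding bias, here is a minimal simulation of the classic Wald estimator that IV-based methods such as MRIV build on; this is standard IV machinery, not the paper's pseudo-outcome regression, and all coefficients below are made up for illustration:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    u = rng.normal(size=n)                 # unobserved confounder
    z = rng.integers(0, 2, size=n)         # binary instrument (e.g., encouragement)
    # Treatment uptake depends on both the instrument and the confounder.
    a = (0.8 * z + 0.5 * u + rng.normal(size=n) > 0).astype(float)
    tau = 2.0                              # true (homogeneous) treatment effect
    y = tau * a + 1.5 * u + rng.normal(size=n)

    naive = y[a == 1].mean() - y[a == 0].mean()        # biased upward by u
    wald = (y[z == 1].mean() - y[z == 0].mean()) / (
            a[z == 1].mean() - a[z == 0].mean())       # IV (Wald) estimate
    print(f"naive: {naive:.2f}, Wald IV: {wald:.2f}, truth: {tau}")
    ```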
    Efficient data-driven gap filling of satellite image time series using deep neural networks with partial convolutions. (arXiv:2208.08781v1 [cs.LG])
    The abundance of gaps in satellite image time series often complicates the application of deep learning models such as convolutional neural networks for spatiotemporal modeling. Based on previous work in computer vision on image inpainting, this paper shows how three-dimensional spatiotemporal partial convolutions can be used as layers in neural networks to fill gaps in satellite image time series. To evaluate the approach, we apply a U-Net-like model on incomplete image time series of quasi-global carbon monoxide observations from the Sentinel-5P satellite. Prediction errors were comparable to those of two considered statistical approaches, while computation times for predictions were up to three orders of magnitude faster, making the approach suitable for processing large amounts of satellite data. Partial convolutions can be added as layers to other types of neural networks, making them relatively easy to integrate with existing deep learning models. However, the approach does not quantify prediction errors, and further research is needed to understand and improve model transferability. The implementation of spatiotemporal partial convolutions and the U-Net-like model is available as open-source software.
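    A minimal PyTorch sketch of a 3D partial convolution layer in the spirit described above, assuming a per-channel binary mask where 1 marks observed voxels; the renormalization and mask update follow the standard partial-convolution recipe from image inpainting, not necessarily the authors' exact implementation:

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PartialConv3d(nn.Module):
        """Convolve only observed voxels, renormalize by the fraction of valid
        inputs under each kernel window, and propagate an updated mask."""
        def __init__(self, cin, cout, k=3):
            super().__init__()
            self.pad = k // 2
            self.conv = nn.Conv3d(cin, cout, k, padding=self.pad, bias=False)
            # Fixed all-ones kernel to count valid voxels per window.
            self.register_buffer("ones", torch.ones(1, cin, k, k, k))
            self.window = float(cin * k ** 3)

        def forward(self, x, mask):                    # mask: 1 observed, 0 gap
            valid = F.conv3d(mask, self.ones, padding=self.pad)
            out = self.conv(x * mask)
            out = out * self.window / valid.clamp(min=1.0)   # renormalize
            out = out * (valid > 0)                    # zero where no data seen
            new_mask = (valid > 0).float().expand(-1, out.shape[1], -1, -1, -1)
            return out, new_mask                       # gaps shrink layer by layer

    # Usage sketch: ~30% missing voxels in a (batch, channel, time, y, x) cube.
    layer = PartialConv3d(1, 8)
    x = torch.randn(2, 1, 8, 16, 16)
    m = (torch.rand_like(x) > 0.3).float()
    y, m2 = layer(x, m)
    ```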
    Evaluating Continual Test-Time Adaptation for Contextual and Semantic Domain Shifts. (arXiv:2208.08767v1 [cs.CV])
    In this paper, our goal is to adapt a pre-trained Convolutional Neural Network to domain shifts at test time. We do so continually with the incoming stream of test batches, without labels. Existing literature mostly operates on artificial shifts obtained via adversarial perturbations of a test image. Motivated by this, we evaluate the state of the art on two realistic and challenging sources of domain shifts, namely contextual and semantic shifts. Contextual shifts correspond to the environment types, for example a model pre-trained on indoor context has to adapt to the outdoor context on CORe-50 [7]. Semantic shifts correspond to the capture types, for example a model pre-trained on natural images has to adapt to cliparts, sketches and paintings on DomainNet [10]. We include in our analysis recent techniques such as Prediction-Time Batch Normalization (BN) [8], Test Entropy Minimization (TENT) [16] and Continual Test-Time Adaptation (CoTTA) [17]. Our findings are three-fold: i) Test-time adaptation methods perform better and forget less on contextual shifts compared to semantic shifts, ii) TENT outperforms other methods on short-term adaptation, whereas CoTTA outperforms other methods on long-term adaptation, iii) BN is the most reliable and robust.
    Truth-Table Net: A New Convolutional Architecture Encodable By Design Into SAT Formulas. (arXiv:2208.08609v1 [cs.AI])
    With the expanding role of neural networks, the need for complete and sound verification of their properties has become critical. In recent years, it was established that Binary Neural Networks (BNNs) have an equivalent representation in Boolean logic and can be formally analyzed using logical reasoning tools such as SAT solvers. However, to date, only BNNs can be transformed into a SAT formula. In this work, we introduce Truth Table Deep Convolutional Neural Networks (TTnets), a new family of SAT-encodable models featuring, for the first time, real-valued weights. Furthermore, TTnets admit, by construction, some valuable conversion features including post-tuning and tractability in the robustness verification setting. The latter property leads to a more compact SAT symbolic encoding than that of BNNs. This enables the use of general SAT solvers, making property verification easier. We demonstrate the value of TTnets regarding the formal robustness property: TTnets outperform the verified accuracy of all BNNs with a comparable computation time. More generally, they represent a relevant trade-off between all known complete verification methods: TTnets achieve high verified accuracy with fast verification time, being complete with no timeouts. We explore here a proof of concept of TTnets for a very important application (complete verification of robustness) and we believe this novel real-valued network constitutes a practical response to the rising need for functional formal verification. We postulate that TTnets can apply to various CNN-based architectures and be extended to other properties such as fairness, fault attacks, and exact rule extraction.
    Physics-Informed Neural Network Method for Parabolic Differential Equations with Sharply Perturbed Initial Conditions. (arXiv:2208.08635v1 [math.NA])
    In this paper, we develop a physics-informed neural network (PINN) model for parabolic problems with a sharply perturbed initial condition. As an example of a parabolic problem, we consider the advection-dispersion equation (ADE) with a point (Gaussian) source initial condition. In the $d$-dimensional ADE, perturbations in the initial condition decay with time $t$ as $t^{-d/2}$, which can cause a large approximation error in the PINN solution. Localized large gradients in the ADE solution make the (common in PINN) Latin hypercube sampling of the equation's residual highly inefficient. Finally, the PINN solution of parabolic equations is sensitive to the choice of weights in the loss function. We propose a normalized form of the ADE where the initial perturbation of the solution does not decrease in amplitude and demonstrate that this normalization significantly reduces the PINN approximation error. We propose criteria for weights in the loss function that produce a more accurate PINN solution than those obtained with weights selected via other methods. Finally, we propose an adaptive sampling scheme that significantly reduces the PINN solution error for the same number of sampling (residual) points. We demonstrate the accuracy of the proposed PINN model for forward, inverse, and backward ADEs.
    Performance Evaluation of Selective Fixed-filter Active Noise Control based on Different Convolutional Neural Networks. (arXiv:2208.08440v1 [cs.LG])
    Due to its rapid response time and a high degree of robustness, the selective fixed-filter active noise control (SFANC) method appears to be a viable candidate for widespread use in a variety of practical active noise control (ANC) systems. In comparison to conventional fixed-filter ANC methods, SFANC can select the pre-trained control filters for different types of noise. Deep learning technologies, thus, can be used in SFANC methods to enable a more flexible selection of the most appropriate control filters for attenuating various noises. Furthermore, with the assistance of a deep neural network, the selecting strategy can be learned automatically from noise data rather than through trial and error, which significantly simplifies and improves the practicability of ANC design. Therefore, this paper investigates the performance of SFANC based on different one-dimensional and two-dimensional convolutional neural networks. Additionally, we conducted comparative analyses of several network training strategies and discovered that fine-tuning could improve selection performance.
    Communication-Efficient Decentralized Online Continuous DR-Submodular Maximization. (arXiv:2208.08681v1 [cs.LG])
    Maximizing a monotone submodular function is a fundamental task in machine learning, economics, and statistics. In this paper, we present two communication-efficient decentralized online algorithms for the monotone continuous DR-submodular maximization problem, both of which reduce the number of per-function gradient evaluations and the per-round communication complexity from $T^{3/2}$ to $1$. The first one, One-shot Decentralized Meta-Frank-Wolfe (Mono-DMFW), achieves a $(1-1/e)$-regret bound of $O(T^{4/5})$. As far as we know, this is the first one-shot and projection-free decentralized online algorithm for monotone continuous DR-submodular maximization. Next, inspired by the non-oblivious boosting function \citep{zhang2022boosting}, we propose the Decentralized Online Boosting Gradient Ascent (DOBGA) algorithm, which attains a $(1-1/e)$-regret of $O(\sqrt{T})$. To the best of our knowledge, this is the first result to obtain the optimal $O(\sqrt{T})$ regret against a $(1-1/e)$-approximation with only one gradient query per local objective function per step. Finally, various experimental results confirm the effectiveness of the proposed methods.
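    For reference, the classic offline Frank-Wolfe (continuous greedy) template for monotone continuous DR-submodular maximization that both algorithms decentralize looks roughly as follows; the objective and constraint set here are toy choices for illustration, not the paper's setting:

    ```python
    import numpy as np

    def frank_wolfe_dr_submodular(grad, lmo, d, T=100):
        """Continuous greedy: x <- x + (1/T) * argmax_{v in K} <v, grad F(x)>.
        Step size 1/T keeps the iterate inside a down-closed convex set K."""
        x = np.zeros(d)
        for _ in range(T):
            v = lmo(grad(x))    # linear maximization oracle over K
            x = x + v / T
        return x

    # Toy example: F(x) = sum(log(1 + x)) is monotone DR-submodular,
    # with K = {x >= 0 : sum(x) <= 1}.
    grad = lambda x: 1.0 / (1.0 + x)
    def lmo(g):
        v = np.zeros_like(g)
        v[np.argmax(g)] = 1.0   # best vertex of the budget polytope
        return v

    print(frank_wolfe_dr_submodular(grad, lmo, d=5))
    ```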
    Meta Sparse Principal Component Analysis. (arXiv:2208.08938v1 [stat.ML])
    We study meta-learning for support (i.e., the set of non-zero entries) recovery in high-dimensional Principal Component Analysis. We reduce the sufficient sample complexity in a novel task with the information that is learned from auxiliary tasks. We assume each task to be a different random Principal Component (PC) matrix with a possibly different support and that the support union of the PC matrices is small. We then pool the data from all the tasks to execute an improper estimation of a single PC matrix by maximising the $l_1$-regularised predictive covariance, and establish that, with high probability, the true support union can be recovered provided a sufficient number of tasks $m$ and a sufficient number of samples $O\left(\frac{\log(p)}{m}\right)$ for each task, for $p$-dimensional vectors. Then, for a novel task, we prove that the maximisation of the $l_1$-regularised predictive covariance, with the additional constraint that the support is a subset of the estimated support union, can reduce the sufficient sample complexity of successful support recovery to $O(\log |J|)$, where $J$ is the support union recovered from the auxiliary tasks. Typically, $|J|$ would be much less than $p$ for sparse matrices. Finally, we validate our theoretical results through numerical simulations.
    Memory and Capacity of Graph Embedding Methods. (arXiv:2208.08769v1 [stat.ML])
    We introduce a method for embedding graphs as vectors in a structure-preserving manner. In this paper, we showcase its rich representational capacity and give some theoretical properties of our method. In particular, our procedure falls under the bind-and-sum approach, and we show that our binding operation -- the tensor product -- is the most general binding operation that respects the principle of superposition. Similarly, we show that the spherical code achieves optimal compression. We then establish some precise results characterizing the performance of our method, as well as experimental results showcasing how it can accurately perform various graph operations even when the number of edges is quite large. Finally, we conclude by establishing a link to adjacency matrices, showing that our method is, in some sense, a generalization of adjacency matrices with applications towards large sparse graphs.
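    A minimal sketch of the bind-and-sum idea with tensor-product binding: each edge is bound as an outer product of (near-orthogonal) node codes and the bindings are summed, after which edge membership can be queried with a bilinear form. The dimension and codes below are arbitrary choices for illustration:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    d, n = 256, 10
    # Random unit-norm (near-orthogonal in high dimension) node codes.
    codes = rng.normal(size=(n, d))
    codes /= np.linalg.norm(codes, axis=1, keepdims=True)

    edges = [(0, 1), (1, 2), (2, 0)]
    # Bind each edge with a tensor (outer) product, then superpose by summing.
    G = sum(np.outer(codes[i], codes[j]) for i, j in edges)

    # Unbinding query: is (i, j) an edge? ~1 if present, ~0 otherwise.
    print(codes[0] @ G @ codes[1])   # edge (0, 1): close to 1
    print(codes[0] @ G @ codes[3])   # non-edge (0, 3): close to 0
    ```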
    Outlier Detection using Self-Organizing Maps for Automated Blood Cell Analysis. (arXiv:2208.08834v1 [eess.IV])
    The quality of datasets plays a crucial role in the successful training and deployment of deep learning models. Especially in the medical field, where system performance may impact the health of patients, clean datasets are a safety requirement for reliable predictions. Therefore, outlier detection is an essential process when building autonomous clinical decision systems. In this work, we assess the suitability of Self-Organizing Maps for outlier detection specifically on a medical dataset containing quantitative phase images of white blood cells. We detect and evaluate outliers based on quantization errors and distance maps. Our findings confirm the suitability of Self-Organizing Maps for unsupervised Out-Of-Distribution detection on the dataset at hand. Self-Organizing Maps perform on par with a manually specified filter based on expert domain knowledge. Additionally, they show promise as a tool in the exploration and cleaning of medical datasets. As a direction for future research, we suggest a combination of Self-Organizing Maps and feature extraction based on deep learning.
    Deep Neural Network Approximation of Invariant Functions through Dynamical Systems. (arXiv:2208.08707v1 [cs.LG])
    We study the approximation of functions which are invariant with respect to certain permutations of the input indices using flow maps of dynamical systems. Such invariant functions include the much-studied translation-invariant functions arising in image tasks, but also encompass many permutation-invariant functions that find emerging applications in science and engineering. We prove sufficient conditions for universal approximation of these functions by a controlled equivariant dynamical system, which can be viewed as a general abstraction of deep residual networks with symmetry constraints. These results not only imply the universal approximation for a variety of commonly employed neural network architectures for symmetric function approximation, but also guide the design of architectures with approximation guarantees for applications involving new symmetry requirements.
    Deep Recursive Embedding for High-Dimensional Data. (arXiv:2111.00622v2 [cs.LG] UPDATED)
    Embedding high-dimensional data onto a low-dimensional manifold is of both theoretical and practical value. In this paper, we propose to combine deep neural networks (DNN) with mathematics-guided embedding rules for high-dimensional data embedding. We introduce a generic deep embedding network (DEN) framework, which is able to learn a parametric mapping from high-dimensional space to low-dimensional space, guided by well-established objectives such as Kullback-Leibler (KL) divergence minimization. We further propose a recursive strategy, called deep recursive embedding (DRE), to make use of the latent data representations for boosted embedding performance. We exemplify the flexibility of DRE with different architectures and loss functions, and benchmark our method against the two most popular embedding methods, namely, t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP). The proposed DRE method can map out-of-sample data and scale to extremely large datasets. Experiments on a range of public datasets demonstrate improved embedding performance in terms of local and global structure preservation, compared with other state-of-the-art embedding methods.
    Understanding Scaling Laws for Recommendation Models. (arXiv:2208.08489v1 [cs.IR])
    Scale has been a major driving force in improving machine learning performance, and understanding scaling laws is essential for strategic planning towards sustainable model quality growth, long-term resource planning, and the development of efficient system infrastructures to support large-scale models. In this paper, we study empirical scaling laws for DLRM-style recommendation models, in particular Click-Through Rate (CTR) models. We observe that model quality scales as a power law plus a constant in model size, data size, and the amount of compute used for training. We characterize scaling efficiency along three different resource dimensions, namely data, parameters, and compute, by comparing the different scaling schemes along these axes. We show that parameter scaling has run out of steam for the model architecture under study, and that until a higher-performing model architecture emerges, data scaling is the path forward. The key research questions addressed by this study include: Does a recommendation model scale sustainably as predicted by the scaling laws? Or are we far off from the scaling law predictions? What are the limits of scaling? What are the implications of the scaling laws for long-term hardware/system development?
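    A quick sketch of fitting the "power law plus constant" form the paper describes, with hypothetical loss values standing in for real CTR-model measurements:

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    # quality(N) ~ a * N^(-b) + c: c is the irreducible loss floor,
    # b the scaling exponent along the chosen resource axis.
    def power_law_plus_const(n, a, b, c):
        return a * n ** (-b) + c

    n = np.array([1e6, 1e7, 1e8, 1e9, 1e10])            # e.g. parameter counts
    loss = np.array([0.52, 0.455, 0.42, 0.401, 0.394])  # hypothetical losses
    (a, b, c), _ = curve_fit(power_law_plus_const, n, loss, p0=(1.0, 0.1, 0.3))
    print(f"irreducible loss c = {c:.3f}, exponent b = {b:.3f}")
    ```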
    DIET: Conditional independence testing with marginal dependence measures of residual information. (arXiv:2208.08579v1 [stat.ME])
    Conditional randomization tests (CRTs) assess whether a variable $x$ is predictive of another variable $y$, having observed covariates $z$. CRTs require fitting a large number of predictive models, which is often computationally intractable. Existing solutions to reduce the cost of CRTs typically split the dataset into a train and test portion, or rely on heuristics for interactions, both of which lead to a loss in power. We propose the decoupled independence test (DIET), an algorithm that avoids both of these issues by leveraging marginal independence statistics to test conditional independence relationships. DIET tests the marginal independence of two random variables: $F(x \mid z)$ and $F(y \mid z)$ where $F(\cdot \mid z)$ is a conditional cumulative distribution function (CDF). These variables are termed "information residuals." We give sufficient conditions for DIET to achieve finite sample type-1 error control and power greater than the type-1 error rate. We then prove that when using the mutual information between the information residuals as a test statistic, DIET yields the most powerful conditionally valid test. Finally, we show DIET achieves higher power than other tractable CRTs on several synthetic and real benchmarks.
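    As a rough sketch of the information-residual idea (using a crude linear-Gaussian stand-in for the conditional CDFs $F(x \mid z)$ and $F(y \mid z)$, rather than DIET's learned models), one can residualize $x$ and $y$ on $z$, apply probability integral transforms, and run a marginal dependence test on the residuals:

    ```python
    import numpy as np
    from scipy.stats import norm, spearmanr

    rng = np.random.default_rng(0)
    n = 5000
    z = rng.normal(size=n)
    x = z + rng.normal(size=n)   # x depends on z only
    y = z + rng.normal(size=n)   # y depends on z only, so x is indep. of y given z

    # Crude "information residuals": linear fit of each variable on z,
    # then a Gaussian probability integral transform of the residuals.
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    u, v = norm.cdf(rx / rx.std()), norm.cdf(ry / ry.std())

    # A marginal dependence test on the residuals stands in for the CI test.
    rho, p = spearmanr(u, v)
    print(f"rho={rho:.3f}, p={p:.3f}")   # large p: no conditional dependence found
    ```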
    SensorSCAN: Self-Supervised Learning and Deep Clustering for Fault Diagnosis in Chemical Processes. (arXiv:2208.08879v1 [cs.LG])
    Modern industrial facilities generate large volumes of raw sensor data during the production process. This data is used to monitor and control the processes and can be analyzed to detect and predict process abnormalities. Typically, the data has to be annotated by experts to be further used in predictive modeling. Most of today's research focuses either on unsupervised anomaly detection algorithms or on supervised methods that require manually annotated data. Studies are often done using process-simulator-generated data for a narrow class of events, and the proposed algorithms are rarely verified on publicly available datasets. In this paper, we propose SensorSCAN, a novel method for unsupervised fault detection and diagnosis designed for industrial chemical sensor data. We demonstrate our model's performance on two publicly available datasets based on the Tennessee Eastman Process with various fault types. Results show that our method significantly outperforms existing approaches (+0.2-0.3 TPR for a fixed FPR) and detects most of the process faults without the use of expert annotation. In addition, we performed experiments showing that our method is suitable for real-world applications where the number of fault types is not known in advance.
    Analyzing Robustness of End-to-End Neural Models for Automatic Speech Recognition. (arXiv:2208.08509v1 [cs.CL])
    We investigate robustness properties of pre-trained neural models for automatic speech recognition. Real-life data in machine learning is usually very noisy and almost never clean, which can be attributed to various factors depending on the domain, e.g., outliers, random noise, and adversarial noise. Therefore, the models we develop for various tasks should be robust to such kinds of noisy data, which has led to the thriving field of robust machine learning. We consider this important issue in the setting of automatic speech recognition. With the increasing popularity of pre-trained models, it is important to analyze and understand the robustness of such models to noise. In this work, we perform a robustness analysis of the pre-trained neural models wav2vec2, HuBERT, and DistilHuBERT on the LibriSpeech and TIMIT datasets. We use different kinds of noising mechanisms and measure the model performances as quantified by the inference time and the standard Word Error Rate metric. We also do an in-depth layer-wise analysis of the wav2vec2 model when injecting noise in between layers, enabling us to predict at a high level what each layer learns. Finally, for this model, we visualize the propagation of errors across the layers and compare how it behaves on clean versus noisy data. Our experiments confirm the predictions of Pasad et al. [2021] and also raise interesting directions for future work.
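    A typical noising mechanism in such robustness studies is additive Gaussian noise at a controlled signal-to-noise ratio. A minimal helper (assuming a mono waveform as a NumPy array and SNR specified in dB) might look like this; one would then sweep SNR levels and score each model's transcripts with WER:

    ```python
    import numpy as np

    def add_gaussian_noise(waveform, snr_db, rng=None):
        """Corrupt a waveform with white Gaussian noise at a target SNR (dB)."""
        rng = rng or np.random.default_rng()
        sig_power = np.mean(waveform ** 2)
        noise_power = sig_power / (10 ** (snr_db / 10))
        noise = rng.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
        return waveform + noise

    # Usage sketch: noisy = add_gaussian_noise(clean, snr_db=10)
    ```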
    Quality issues in Machine Learning Software Systems. (arXiv:2208.08982v1 [cs.SE])
    Context: An increasing demand is observed in various domains to employ Machine Learning (ML) for solving complex problems. ML models are implemented as software components and deployed in Machine Learning Software Systems (MLSSs). Problem: There is a strong need for ensuring the serving quality of MLSSs. False or poor decisions of such systems can lead to malfunction of other systems, significant financial losses, or even threats to human life. The quality assurance of MLSSs is considered a challenging task and is currently a hot research topic. Moreover, it is important to cover all the various aspects of quality in MLSSs. Objective: This paper aims to investigate the characteristics of real quality issues in MLSSs from the viewpoint of practitioners. This empirical study aims to identify a catalog of bad practices related to poor quality in MLSSs. Method: We plan to conduct a set of interviews with practitioners/experts, believing that interviews are the best method to retrieve their experience and practices when dealing with quality issues. We expect that the catalog of issues developed at this step will also help us later to identify the severity, root causes, and possible remedies for quality issues of MLSSs, allowing us to develop efficient quality assurance tools for ML models and MLSSs.
    Quantifying the Knowledge in a DNN to Explain Knowledge Distillation for Classification. (arXiv:2208.08741v1 [cs.LG])
    Compared to traditional learning from scratch, knowledge distillation sometimes makes the DNN achieve superior performance. This paper provides a new perspective to explain the success of knowledge distillation, i.e., quantifying knowledge points encoded in intermediate layers of a DNN for classification, based on information theory. To this end, we consider the signal processing in a DNN as layer-wise information discarding. A knowledge point is referred to as an input unit whose information is much less discarded than that of other input units. Thus, we propose three hypotheses for knowledge distillation based on the quantification of knowledge points. 1. The DNN learning from knowledge distillation encodes more knowledge points than the DNN learning from scratch. 2. Knowledge distillation makes the DNN more likely to learn different knowledge points simultaneously. In comparison, the DNN learning from scratch tends to encode various knowledge points sequentially. 3. The DNN learning from knowledge distillation is often optimized more stably than the DNN learning from scratch. In order to verify the above hypotheses, we design three types of metrics with annotations of foreground objects to analyze feature representations of the DNN, i.e., the quantity and quality of knowledge points, the learning speed of different knowledge points, and the stability of optimization directions. In experiments, we diagnosed various DNNs for different classification tasks, i.e., image classification, 3D point cloud classification, binary sentiment classification, and question answering, which verified the above hypotheses.
    On an Application of Generative Adversarial Networks on Remaining Lifetime Estimation. (arXiv:2208.08666v1 [cs.LG])
    A major problem of structural health monitoring (SHM) has been the prognosis of damage and the definition of the remaining useful life of a structure. Both tasks depend on many parameters, many of which are often uncertain. Many models have been developed for the aforementioned tasks but they have been either deterministic or stochastic with the ability to take into account only a restricted amount of past states of the structure. In the current work, a generative model is proposed in order to make predictions about the damage evolution of structures. The model is able to perform in a population-based SHM (PBSHM) framework, to take into account many past states of the damaged structure, to incorporate uncertainties in the modelling process and to generate potential damage evolution outcomes according to data acquired from a structure. The algorithm is tested on a simulated damage evolution example and the results reveal that it is able to provide quite confident predictions about the remaining useful life of structures within a population.
    Enhancing Targeted Attack Transferability via Diversified Weight Pruning. (arXiv:2208.08677v1 [cs.CV])
    Malicious attackers can generate targeted adversarial examples by imposing human-imperceptible noise on images, forcing neural network models to produce specific incorrect outputs. With cross-model transferable adversarial examples, the vulnerability of neural networks remains even if the model information is kept secret from the attacker. Recent studies have shown the effectiveness of ensemble-based methods in generating transferable adversarial examples. However, existing methods fall short under the more challenging scenario of creating targeted attacks transferable among distinct models. In this work, we propose Diversified Weight Pruning (DWP) to further enhance ensemble-based methods by leveraging the weight pruning technique commonly used in model compression. Specifically, we obtain multiple diverse models through random weight pruning. These models preserve similar accuracies and can serve as additional models for ensemble-based methods, yielding stronger transferable targeted attacks. Experiments on the ImageNet-Compatible Dataset under the more challenging scenarios are provided: transferring to distinct architectures and to adversarially trained models. The results show that our proposed DWP improves the targeted attack success rates by up to 4.1% and 8.0% on the combination of state-of-the-art methods, respectively.
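    A small PyTorch sketch of the diversification step as described: generating several randomly weight-pruned copies of a surrogate model to enlarge the attack ensemble. The helper name and defaults are hypothetical:

    ```python
    import copy
    import torch
    import torch.nn.utils.prune as prune

    def diversified_prunes(model, n_copies=4, amount=0.1):
        """Return randomly weight-pruned copies of `model` to serve as
        additional ensemble members (a sketch of the DWP idea)."""
        copies = []
        for _ in range(n_copies):
            m = copy.deepcopy(model)
            for module in m.modules():
                if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
                    # Randomly zero out a fraction of weights in place.
                    prune.random_unstructured(module, name="weight", amount=amount)
            copies.append(m)
        return copies

    # Usage sketch: ensemble = [surrogate] + diversified_prunes(surrogate)
    ```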
    Pandemic Control, Game Theory and Machine Learning. (arXiv:2208.08646v1 [math.OC])
    Game theory has been an effective tool in the control of disease spread and in suggesting optimal policies at both individual and area levels. In this AMS Notices article, we focus on the decision-making development for the intervention of COVID-19, aiming to provide mathematical models and efficient machine learning methods, and justifications for related policies that have been implemented in the past and explain how the authorities' decisions affect their neighboring regions from a game theory viewpoint.
    Bayesian Optimization Augmented with Actively Elicited Expert Knowledge. (arXiv:2208.08742v1 [cs.LG])
    Bayesian optimization (BO) is a well-established method to optimize black-box functions whose direct evaluations are costly. In this paper, we tackle the problem of incorporating expert knowledge into BO, with the goal of further accelerating the optimization, which has received very little attention so far. We design a multi-task learning architecture for this task, with the goal of jointly eliciting the expert knowledge and minimizing the objective function. In particular, this allows for the expert knowledge to be transferred into the BO task. We introduce a specific architecture based on Siamese neural networks to handle the knowledge elicitation from pairwise queries. Experiments on various benchmark functions with both simulated and actual human experts show that the proposed method significantly speeds up BO even when the expert knowledge is biased compared to the objective function.
    Psychophysiological Arousal in Young Children Who Stutter: An Interpretable AI Approach. (arXiv:2208.08859v1 [eess.SP])
    This first-of-its-kind study effectively identifies and visualizes the second-by-second pattern differences in the physiological arousal of preschool-age children who do stutter (CWS) and who do not stutter (CWNS) while speaking perceptually fluently in two challenging conditions, i.e., speaking in stressful situations and narration. The first condition may affect children's speech due to high arousal; the latter introduces linguistic, cognitive, and communicative demands on speakers. We collected physiological parameter data from 70 children in the two target conditions. First, we adopt a novel modality-wise multiple-instance-learning (MI-MIL) approach to classify CWS vs. CWNS in different conditions effectively. The evaluation of this classifier addresses four critical research questions that align with the interests of state-of-the-art speech science studies. Later, we leverage SHAP classifier interpretations to visualize the salient, fine-grained, and temporal physiological parameters unique to CWS at the population/group level and at the personalized level. While group-level identification of distinct patterns would enhance our understanding of stuttering etiology and development, personalized-level identification would enable remote, continuous, and real-time assessment of stuttering children's physiological arousal, which may lead to personalized, just-in-time interventions, resulting in an improvement in speech fluency. The presented MI-MIL approach is novel, generalizable to different domains, and real-time executable. Finally, comprehensive evaluations are done on multiple datasets, the presented framework, and several baselines, which identified notable insights into CWS' physiological arousal during speech production.
    EEG-BBNet: a Hybrid Framework for Brain Biometric using Graph Connectivity. (arXiv:2208.08901v1 [eess.SP])
    Brain biometrics based on electroencephalography (EEG) have been used increasingly for personal identification. Traditional machine learning techniques as well as modern-day deep learning methods have been applied with promising results. In this paper we present EEG-BBNet, a hybrid network which integrates convolutional neural networks (CNN) with graph convolutional neural networks (GCNN). The benefit of the CNN in automatic feature extraction and the capability of the GCNN in learning connectivity between EEG electrodes through graph representation are jointly exploited. We examine various connectivity measures, namely the Euclidean distance, Pearson's correlation coefficient, phase-locked value, phase-lag index, and Rho index. The performance of the proposed method is assessed on a benchmark dataset consisting of various brain-computer interface (BCI) tasks and compared to other state-of-the-art approaches. We found that our models outperform all baselines in the event-related potential (ERP) task, with average correct recognition rates of up to 99.26% using intra-session data. EEG-BBNet with Pearson's correlation and Rho index provides the best classification results. In addition, our model demonstrates greater adaptability using inter-session and inter-task data. We also investigate the practicality of our proposed model with a smaller number of electrodes. Electrode placement over the frontal lobe region appears to be most appropriate, with minimal loss in performance.
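    One of the simpler connectivity measures examined, Pearson's correlation, can be turned into a graph adjacency matrix for the GCNN branch along these lines (a sketch assuming EEG of shape (channels, samples); the function name is hypothetical):

    ```python
    import numpy as np

    def pearson_connectivity(eeg):
        """eeg: (channels, samples) -> (channels, channels) adjacency matrix
        from absolute Pearson correlation, with a zeroed diagonal."""
        adj = np.abs(np.corrcoef(eeg))
        np.fill_diagonal(adj, 0.0)
        return adj   # optionally threshold or sparsify before graph convolution
    ```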
    Siamese Prototypical Contrastive Learning. (arXiv:2208.08819v1 [cs.CV])
    Contrastive Self-supervised Learning (CSL) is a practical solution that learns meaningful visual representations from massive data in an unsupervised manner. Ordinary CSL embeds the features extracted from neural networks onto specific topological structures. During training, the contrastive loss draws the different views of the same input together while pushing the embeddings from different inputs apart. One of the drawbacks of CSL is that the loss term ideally requires a large number of negative samples to provide a better mutual information bound. However, increasing the number of negative samples through larger batch sizes also enhances the effects of false negatives: semantically similar samples are pushed apart from the anchor, hence downgrading downstream performance. In this paper, we tackle this problem by introducing a simple but effective contrastive learning framework. The key insight is to employ a siamese-style metric loss to match intra-prototype features while increasing the distance between inter-prototype features. We conduct extensive experiments on various benchmarks where the results demonstrate the effectiveness of our method in improving the quality of visual representations. Specifically, our unsupervised pre-trained ResNet-50 with a linear probe outperforms the fully-supervised trained version on the ImageNet-1K dataset.
    RC-Struct: A Structure-based Neural Network Approach for MIMO-OFDM Detection. (arXiv:2110.02219v2 [cs.IT] UPDATED)
    In this paper, we introduce a structure-based neural network architecture, namely RC-Struct, for MIMO-OFDM symbol detection. The RC-Struct exploits the temporal structure of the MIMO-OFDM signals through reservoir computing (RC). A binary classifier leverages the repetitive constellation structure in the system to perform multi-class detection. The incorporation of RC allows the RC-Struct to be learned in a purely online fashion with extremely limited pilot symbols in each OFDM subframe. The binary classifier enables the efficient utilization of the precious online training symbols and allows an easy extension to high-order modulations without a substantial increase in complexity. Experiments show that the introduced RC-Struct outperforms both the conventional model-based symbol detection approaches and the state-of-the-art learning-based strategies in terms of bit error rate (BER). The advantages of RC-Struct over existing methods become more significant when rank and link adaptation are adopted. The introduced RC-Struct sheds light on combining communication domain knowledge and learning-based receive processing for 5G/5G-Advanced and Beyond.
    Private, Efficient, and Accurate: Protecting Models Trained by Multi-party Learning with Differential Privacy. (arXiv:2208.08662v1 [cs.CR])
    Secure multi-party computation-based machine learning, referred to as MPL, has become an important technology to utilize data from multiple parties with privacy preservation. While MPL provides rigorous security guarantees for the computation process, the models trained by MPL are still vulnerable to attacks that solely depend on access to the models. Differential privacy could help to defend against such attacks. However, the accuracy loss brought by differential privacy and the huge communication overhead of secure multi-party computation protocols make it highly challenging to balance the three-way trade-off between privacy, efficiency, and accuracy. In this paper, we are motivated to resolve the above issue by proposing a solution, referred to as PEA (Private, Efficient, Accurate), which consists of a secure DPSGD protocol and two optimization methods. First, we propose a secure DPSGD protocol to enforce DPSGD in secret sharing-based MPL frameworks. Second, to reduce the accuracy loss led by differential privacy noise and the huge communication overhead of MPL, we propose two optimization methods for the training process of MPL: (1) the data-independent feature extraction method, which aims to simplify the trained model structure; (2) the local data-based global model initialization method, which aims to speed up the convergence of the model training. We implement PEA in two open-source MPL frameworks: TF-Encrypted and Queqiao. The experimental results on various datasets demonstrate the efficiency and effectiveness of PEA. For example, when $\epsilon = 2$, we can train a differentially private classification model with an accuracy of 88% for CIFAR-10 within 7 minutes under the LAN setting. This result significantly outperforms the one from CryptGPU, a state-of-the-art MPL framework: it costs more than 16 hours to train a non-private deep neural network model on CIFAR-10 with the same accuracy.
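    For context, a plaintext sketch of the DPSGD update that the secure protocol enforces under secret sharing: clip each per-example gradient, average, and add Gaussian noise calibrated to the clipping norm. This illustrates the standard DPSGD recipe, not the paper's MPC protocol:

    ```python
    import numpy as np

    def dpsgd_step(per_example_grads, params, lr, clip_norm, noise_mult, rng):
        """One DPSGD update on flat parameter/gradient vectors (a sketch)."""
        clipped = []
        for g in per_example_grads:
            norm = np.linalg.norm(g)
            clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # clip
        mean_grad = np.mean(clipped, axis=0)
        # Gaussian noise scaled to the sensitivity of the averaged gradient.
        noise = rng.normal(0.0, noise_mult * clip_norm / len(clipped),
                           size=mean_grad.shape)
        return params - lr * (mean_grad + noise)
    ```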
    POCS-based Clustering Algorithm. (arXiv:2208.08888v1 [cs.LG])
    A novel clustering technique based on the projection onto convex sets (POCS) method, called the POCS-based clustering algorithm, is proposed in this paper. The proposed algorithm exploits the parallel projection method of POCS to find appropriate cluster prototypes in the feature space. The algorithm considers each data point as a convex set and projects the cluster prototypes in parallel onto the member data points. The projections are convexly combined to minimize the objective function for data clustering purposes. The performance of the proposed POCS-based clustering algorithm is verified through experiments on various synthetic datasets. The experimental results show that the proposed POCS-based clustering algorithm is competitive and efficient in terms of clustering error and execution speed when compared with other conventional clustering methods, including the Fuzzy C-Means (FCM) and K-means clustering algorithms.  ( 2 min )
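    A minimal NumPy sketch of the POCS view of clustering described above: treating each assigned point as a convex set, the projection of a prototype onto that set is the point itself, and a convex combination of the parallel projections updates the prototype. With the equal weights used here this reduces to a k-means-style mean update; the paper's weighting scheme may differ:

    ```python
    import numpy as np

    def pocs_cluster(X, k, iters=50, seed=0):
        """Prototype update as an equal-weight convex combination of the
        parallel projections onto each member point (a sketch)."""
        rng = np.random.default_rng(seed)
        protos = X[rng.choice(len(X), k, replace=False)].astype(float)
        for _ in range(iters):
            # Assign each point to its nearest prototype.
            assign = np.argmin(((X[:, None] - protos[None]) ** 2).sum(-1), axis=1)
            for j in range(k):
                members = X[assign == j]
                if len(members):
                    # Projection onto a singleton set is the point itself;
                    # the equal-weight convex combination is the member mean.
                    protos[j] = members.mean(axis=0)
        return protos, assign
    ```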
    Frequency propagation: Multi-mechanism learning in nonlinear physical networks. (arXiv:2208.08862v1 [cond-mat.dis-nn])
    We introduce frequency propagation, a learning algorithm for nonlinear physical networks. In a resistive electrical circuit with variable resistors, an activation current is applied at a set of input nodes at one frequency, and an error current is applied at a set of output nodes at another frequency. The voltage response of the circuit to these boundary currents is the superposition of an `activation signal' and an `error signal' whose coefficients can be read in different frequencies of the frequency domain. Each conductance is updated proportionally to the product of the two coefficients. The learning rule is local and proved to perform gradient descent on a loss function. We argue that frequency propagation is an instance of a multi-mechanism learning strategy for physical networks, be it resistive, elastic, or flow networks. Multi-mechanism learning strategies incorporate at least two physical quantities, potentially governed by independent physical mechanisms, to act as activation and error signals in the training process. Locally available information about these two signals is then used to update the trainable parameters to perform gradient descent. We demonstrate how earlier work implementing learning via chemical signaling in flow networks also falls under the rubric of multi-mechanism learning.
    Learning to Generate Image Source-Agnostic Universal Adversarial Perturbations. (arXiv:2009.13714v4 [cs.LG] UPDATED)
    Adversarial perturbations are critical for certifying the robustness of deep learning models. A universal adversarial perturbation (UAP) can simultaneously attack multiple images, and thus offers a more unified threat model, obviating an image-wise attack algorithm. However, the existing UAP generator is underdeveloped when images are drawn from different image sources (e.g., with different image resolutions). Towards an authentic universality across image sources, we take a novel view of UAP generation as a customized instance of few-shot learning, which leverages bilevel optimization and learning-to-optimize (L2O) techniques for UAP generation with improved attack success rate (ASR). We begin by considering the popular model agnostic meta-learning (MAML) framework to meta-learn a UAP generator. However, we see that the MAML framework does not directly offer the universal attack across image sources, requiring us to integrate it with another meta-learning framework of L2O. The resulting scheme for meta-learning a UAP generator (i) has better performance (50% higher ASR) than baselines such as Projected Gradient Descent, (ii) has better performance (37% faster) than the vanilla L2O and MAML frameworks (when applicable), and (iii) is able to simultaneously handle UAP generation for different victim models and image data sources.
    X-GOAL: Multiplex Heterogeneous Graph Prototypical Contrastive Learning. (arXiv:2109.03560v3 [cs.LG] UPDATED)
    Graphs are powerful representations for relations among objects, which have attracted plenty of attention. A fundamental challenge for graph learning is how to train an effective Graph Neural Network (GNN) encoder without labels, which are expensive and time-consuming to obtain. Contrastive Learning (CL) is one of the most popular paradigms to address this challenge, which trains GNNs by discriminating positive and negative node pairs. Despite the success of recent CL methods, there are still two under-explored problems. First, how to reduce the semantic error introduced by random topology-based data augmentations. Traditional CL defines positive and negative node pairs via node-level topological proximity, which is based solely on the graph topology regardless of the semantic information of node attributes, and thus some semantically similar nodes could be wrongly treated as negative pairs. Second, how to effectively model the multiplexity of real-world graphs, where nodes are connected by various relations and each relation could form a homogeneous graph layer. To solve these problems, we propose a novel multiplex heterogeneous graph prototypical contrastive learning (X-GOAL) framework to extract node embeddings. X-GOAL is comprised of two components: the GOAL framework, which learns node embeddings for each homogeneous graph layer, and an alignment regularization, which jointly models different layers by aligning layer-specific node embeddings. Specifically, the GOAL framework captures node-level information by a succinct graph transformation technique, and captures cluster-level information by pulling nodes within the same semantic cluster closer in the embedding space. The alignment regularization aligns embeddings across layers at both the node and cluster levels. We evaluate X-GOAL on various real-world datasets and downstream tasks to demonstrate its effectiveness.
    Learning to Infer Structures of Network Games. (arXiv:2206.08119v2 [cs.LG] UPDATED)
    Strategic interactions between a group of individuals or organisations can be modelled as games played on networks, where a player's payoff depends not only on their actions but also on those of their neighbours. Inferring the network structure from observed game outcomes (equilibrium actions) is an important problem with numerous potential applications in economics and social sciences. Existing methods mostly require the knowledge of the utility function associated with the game, which is often unrealistic to obtain in real-world scenarios. We adopt a transformer-like architecture which correctly accounts for the symmetries of the problem and learns a mapping from the equilibrium actions to the network structure of the game without explicit knowledge of the utility function. We test our method on three different types of network games using both synthetic and real-world data, and demonstrate its effectiveness in network structure inference and superior performance over existing methods.
    Continual Learning in Deep Networks: an Analysis of the Last Layer. (arXiv:2106.01834v3 [cs.LG] UPDATED)
    We study how different output layer parameterizations of a deep neural network affect learning and forgetting in continual learning settings. The following three effects can cause catastrophic forgetting in the output layer: (1) weight modifications, (2) interference, and (3) projection drift. In this paper, our goal is to provide more insights into how changing the output layer parameterization may address (1) and (2). Some potential solutions to those issues are proposed and evaluated here in several continual learning scenarios. We show that the best-performing type of output layer depends on the data distribution drifts and/or the amount of data available. In particular, in some cases where a standard linear layer would fail, changing the parameterization is sufficient to achieve significantly better performance, without introducing any continual-learning algorithm, but instead by using standard SGD to train the model. Our analysis and results shed light on the dynamics of the output layer in continual learning scenarios and suggest a way of selecting the best type of output layer for a given scenario.
    Intention estimation from gaze and motion features for human-robot shared-control object manipulation. (arXiv:2208.08688v1 [cs.RO])
    Shared control can help in teleoperated object manipulation by assisting with the execution of the user's intention. To this end, robust and prompt intention estimation is needed, which relies on behavioral observations. Here, an intention estimation framework is presented, which uses natural gaze and motion features to predict the current action and the target object. The system is trained and tested in a simulated environment with pick and place sequences produced in a relatively cluttered scene and with both hands, with possible hand-over to the other hand. Validation is conducted across different users and hands, achieving good accuracy and earliness of prediction. An analysis of the predictive power of single features shows the predominance of the grasping trigger and the gaze features in the early identification of the current action. In the current framework, the same probabilistic model can be used for the two hands working in parallel and independently, while a rule-based model is proposed to identify the resulting bimanual action. Finally, limitations and perspectives of this approach to more complex, full-bimanual manipulations are discussed.
    RRWaveNet: A Compact End-to-End Multi-Scale Residual CNN for Robust PPG Respiratory Rate Estimation. (arXiv:2208.08672v1 [eess.SP])
    Respiratory rate (RR) is an important biomarker as RR changes can reflect severe medical events such as heart disease, lung disease, and sleep disorders. Unfortunately, standard manual RR counting is prone to human error and cannot be performed continuously. This study proposes RRWaveNet, a method for continuously estimating RR. The method is a compact end-to-end deep learning model which does not require feature engineering and can use low-cost raw photoplethysmography (PPG) as its input signal. RRWaveNet was tested subject-independently and compared to baselines on three datasets (BIDMC, CapnoBase, and WESAD) using three window sizes (16, 32, and 64 seconds). RRWaveNet outperformed current state-of-the-art methods with mean absolute errors at the optimal window size of 1.66 ± 1.01, 1.59 ± 1.08, and 1.92 ± 0.96 breaths per minute for each dataset. In remote monitoring settings, such as in the WESAD dataset, we apply transfer learning to two other ICU datasets, reducing the MAE to 1.52 ± 0.50 breaths per minute, showing that this model allows accurate and practical estimation of RR on affordable and wearable devices. Our study shows the feasibility of remote RR monitoring in the context of telemedicine and at home.
    A Hybrid Self-Supervised Learning Framework for Vertical Federated Learning. (arXiv:2208.08934v1 [cs.LG])
    Federated learning (FL) enables independent parties to collaboratively build machine learning (ML) models while protecting data privacy. Vertical federated learning (VFL), a variant of FL, has recently drawn increasing attention as the VFL matches the enterprises' demands of leveraging more valuable features to achieve better model performance without jeopardizing data privacy. However, conventional VFL may run into data deficiency as it is only able to exploit aligned samples (belonging to different parties) with labels, leaving often the majority of unaligned and unlabeled samples unused. The data deficiency hampers the effort of the federation. In this work, we propose a Federated Hybrid Self-Supervised Learning framework, coined FedHSSL, to utilize all available data (including unaligned and unlabeled samples) of participants to train the joint VFL model. The core idea of FedHSSL is to utilize cross-party views (i.e., dispersed features) of samples aligned among parties and local views (i.e., augmentations) of samples within each party to improve the representation learning capability of the joint VFL model through SSL (e.g., SimSiam). FedHSSL further exploits generic features shared among parties to boost the performance of the joint model through partial model aggregation. We empirically demonstrate that our FedHSSL achieves significant performance gains compared with baseline methods, especially when the number of labeled samples is small. We provide an in-depth analysis of FedHSSL regarding privacy leakage, which is rarely discussed in existing self-supervised VFL works. We investigate the protection mechanism for FedHSSL. The results show our protection can thwart the state-of-the-art label inference attack.  ( 3 min )
    Conviformers: Convolutionally guided Vision Transformer. (arXiv:2208.08900v1 [cs.CV])
    Vision transformers are nowadays the de-facto preference for image classification tasks. There are two broad categories of classification tasks, fine-grained and coarse-grained. In fine-grained classification, the necessity is to discover subtle differences due to the high level of similarity between sub-classes. Such distinctions are often lost as we downscale the image to save the memory and computational cost associated with vision transformers (ViT). In this work, we present an in-depth analysis and describe the critical components for developing a system for the fine-grained categorization of plants from herbarium sheets. Our extensive experimental analysis indicated the need for a better augmentation technique and the ability of modern-day neural networks to handle higher dimensional images. We also introduce a convolutional transformer architecture called Conviformer which, unlike the popular Vision Transformer (ConViT), can handle higher resolution images without exploding memory and computational cost. We also introduce a novel, improved pre-processing technique called PreSizer to resize images better while preserving their original aspect ratios, which proved essential for classifying natural plants. With our simple yet effective approach, we achieved SoTA on Herbarium 202x and iNaturalist 2019 dataset.
    Learning Generative Models for Active Inference using Tensor Networks. (arXiv:2208.08713v1 [cs.LG])
    Active inference provides a general framework for behavior and learning in autonomous agents. It states that an agent will attempt to minimize its variational free energy, defined in terms of beliefs over observations, internal states and policies. Traditionally, every aspect of a discrete active inference model must be specified by hand, i.e., by manually defining the hidden state space structure, as well as the required distributions such as likelihood and transition probabilities. Recently, efforts have been made to learn state space representations automatically from observations using deep neural networks. However, these models are typically overparameterized, with the risk of overfitting the data at hand. In this paper, we present a novel approach of learning state spaces using quantum physics-inspired tensor networks. The ability of tensor networks to represent the probabilistic nature of quantum states as well as to reduce large state spaces makes tensor networks a natural candidate for active inference. We show how tensor networks can be used as a generative model for sequential data. Furthermore, we show how one can obtain beliefs from such a generative model and how an active inference agent can use these to compute the expected free energy. Finally, we demonstrate our method on the classic T-maze environment.
    Neural Payoff Machines: Predicting Fair and Stable Payoff Allocations Among Team Members. (arXiv:2208.08798v1 [cs.LG])
    In many multi-agent settings, participants can form teams to achieve collective outcomes that may far surpass their individual capabilities. Measuring the relative contributions of agents and allocating them shares of the reward that promote long-lasting cooperation are difficult tasks. Cooperative game theory offers solution concepts identifying distribution schemes, such as the Shapley value, which fairly reflects the contribution of individuals to the performance of the team, or the Core, which reduces the incentive of agents to abandon their team. Applications of such methods include identifying influential features and sharing the costs of joint ventures or team formation. Unfortunately, using these solutions requires tackling a computational barrier as they are hard to compute, even in restricted settings. In this work, we show how cooperative game-theoretic solutions can be distilled into a learned model by training neural networks to propose fair and stable payoff allocations. We show that our approach creates models that can generalize to games far from the training distribution and can predict solutions for more players than observed during training. An important application of our framework is Explainable AI: our approach can be used to speed up Shapley value computations on many instances.  ( 3 min )
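    To make the computational barrier concrete, the following toy Python sketch computes exact Shapley values by brute-force coalition enumeration; the characteristic function v below is an invented example, and the $O(2^n)$ enumeration is precisely the cost that the learned models amortize:

        # Brute-force Shapley values: exponential coalition enumeration per player.
        from itertools import combinations
        from math import factorial

        def shapley(players, v):
            n = len(players)
            phi = {}
            for i in players:
                others = [p for p in players if p != i]
                total = 0.0
                for k in range(n):
                    for S in combinations(others, k):
                        w = factorial(k) * factorial(n - k - 1) / factorial(n)
                        total += w * (v(frozenset(S) | {i}) - v(frozenset(S)))
                phi[i] = total
            return phi

        # Toy superadditive game: any pair earns 1, the grand coalition earns 2.
        v = lambda S: 2.0 if len(S) == 3 else (1.0 if len(S) == 2 else 0.0)
        print(shapley([0, 1, 2], v))  # symmetric players -> 2/3 each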
    SWP-LeafNET: A novel multistage approach for plant leaf identification based on deep CNN. (arXiv:2009.05139v2 [cs.CV] UPDATED)
    Modern scientific and technological advances allow botanists to use computer vision-based approaches for plant identification tasks. These approaches have their own challenges. Leaf classification is a computer-vision task performed for the automated identification of plant species, a serious challenge due to variations in leaf morphology, including size, texture, shape, and venation. Researchers have recently become more inclined toward deep learning-based methods rather than conventional feature-based methods, due to the popularity and successful implementation of deep learning methods in image analysis, object recognition, and speech recognition. In this paper, to have an interpretable and reliable system, a botanist's behavior is modeled in leaf identification by proposing a highly-efficient method of maximum behavioral resemblance developed through three deep learning-based models. Different layers of the three models are visualized to ensure that the botanist's behavior is modeled accurately. The first and second models are designed from scratch. For the third model, the pre-trained architecture MobileNetV2 is employed along with the transfer-learning technique. The proposed method is evaluated on two well-known datasets: Flavia and MalayaKew. According to a comparative analysis, the suggested approach is more accurate than hand-crafted feature extraction methods and other deep learning techniques, achieving 99.67% and 99.81% accuracy, respectively. Unlike conventional techniques that have their own specific complexities and depend on datasets, the proposed method requires no hand-crafted feature extraction. It also improves accuracy compared with other deep learning techniques. Moreover, SWP-LeafNET is distributable and considerably faster than other methods because it runs shallower models with fewer parameters asynchronously.
    On the Universality of the Double Descent Peak in Ridgeless Regression. (arXiv:2010.01851v7 [stat.ML] UPDATED)
    We prove a non-asymptotic distribution-independent lower bound for the expected mean squared generalization error caused by label noise in ridgeless linear regression. Our lower bound generalizes a similar known result to the overparameterized (interpolating) regime. In contrast to most previous works, our analysis applies to a broad class of input distributions with almost surely full-rank feature matrices, which allows us to cover various types of deterministic or random feature maps. Our lower bound is asymptotically sharp and implies that in the presence of label noise, ridgeless linear regression does not perform well around the interpolation threshold for any of these feature maps. We analyze the imposed assumptions in detail and provide a theory for analytic (random) feature maps. Using this theory, we can show that our assumptions are satisfied for input distributions with a (Lebesgue) density and feature maps given by random deep neural networks with analytic activation functions like sigmoid, tanh, softplus or GELU. As further examples, we show that feature maps from random Fourier features and polynomial kernels also satisfy our assumptions. We complement our theory with further experimental and analytic results.
    AoI-based Temporal Attention Graph Neural Network for Popularity Prediction and Content Caching. (arXiv:2208.08606v1 [cs.LG])
    Along with the fast development of network technology and the rapid growth of network equipment, data throughput is sharply increasing. To handle the backhaul bottleneck problem in cellular networks and satisfy users' latency requirements, network architectures such as information-centric networking (ICN) intend to proactively keep a limited amount of popular content at the edge of the network based on predicted results. Meanwhile, the interactions between the content (e.g., deep neural network models, Wikipedia-like knowledge bases) and users can be regarded as a dynamic bipartite graph. In this paper, to maximize the cache hit rate, we leverage an effective dynamic graph neural network (DGNN) to jointly learn the structural and temporal patterns embedded in the bipartite graph. Furthermore, in order to gain deeper insights into the dynamics within the evolving graph, we propose an age of information (AoI) based attention mechanism to extract valuable historical information while avoiding the problem of message staleness. Building on this prediction model, we also develop a cache selection algorithm that makes caching decisions in accordance with the prediction results. Extensive results demonstrate that our model obtains higher prediction accuracy than other state-of-the-art schemes on two real-world datasets. The hit-rate results further verify the superiority of the caching policy based on our proposed model over traditional approaches.
    Robust Causal Graph Representation Learning against Confounding Effects. (arXiv:2208.08584v1 [cs.LG])
    The prevailing graph neural network models have achieved significant progress in graph representation learning. However, in this paper, we uncover a long-overlooked phenomenon: a pre-trained graph representation learning model tested with full graphs underperforms the same model tested with well-pruned graphs. This observation reveals that there exist confounders in graphs, which may interfere with the model's learning of semantic information, and whose influence current graph representation learning methods have not eliminated. To tackle this issue, we propose Robust Causal Graph Representation Learning (RCGRL) to learn robust graph representations against confounding effects. RCGRL introduces an active approach to generate instrumental variables under unconditional moment restrictions, which empowers the graph representation learning model to eliminate confounders, thereby capturing discriminative information that is causally related to downstream predictions. We offer theorems and proofs to guarantee the theoretical effectiveness of the proposed approach. Empirically, we conduct extensive experiments on a synthetic dataset and multiple benchmark datasets. The results demonstrate that, compared with state-of-the-art methods, RCGRL achieves better prediction performance and generalization ability.
    KDD CUP 2022 Wind Power Forecasting Team 88VIP Solution. (arXiv:2208.08952v1 [cs.LG])
    KDD CUP 2022 proposes a time-series forecasting task on a spatially dynamic wind power dataset, in which the participants are required to predict future generation given historical context factors. The evaluation metrics are RMSE and MAE. This paper describes the solution of Team 88VIP, which mainly comprises two types of models: a gradient boosting decision tree to memorize the basic data patterns and a recurrent neural network to capture deep and latent probabilistic transitions. Ensembling these models helps tackle the fluctuation of wind power, while training sub-models targets the distinct properties of heterogeneous forecasting timescales, from minutes to days. In addition, feature engineering, imputation techniques and the design of the offline evaluation are also described in detail. The proposed solution achieves an overall online score of -45.213 in Phase 3.
    Profiler: Profile-Based Model to Detect Phishing Emails. (arXiv:2208.08745v1 [cs.CR])
    Email phishing has become more prevalent and has grown more sophisticated over time. To combat this rise, many machine learning (ML) algorithms for detecting phishing emails have been developed. However, due to the limited email data sets on which these algorithms train, they are not adept at recognising varied attacks and thus suffer from concept drift; attackers can introduce small changes in the statistical characteristics of their emails or websites to successfully bypass detection. Over time, a gap develops between the accuracy reported in the literature and the algorithm's actual effectiveness in the real world. This manifests itself in frequent false positive and false negative classifications. To this end, we propose a multidimensional risk assessment of emails to reduce the feasibility of an attacker adapting their email and avoiding detection. This horizontal approach to email phishing detection profiles an incoming email on its main features. We develop a risk assessment framework that includes three models which analyse an email's (1) threat level, (2) cognitive manipulation, and (3) email type, which we combine to return the final risk assessment score. The Profiler does not require large data sets to train on to be effective, and its analysis of varied email features reduces the impact of concept drift. Our Profiler can be used in conjunction with ML approaches, to reduce their misclassifications or as a labeller for large email data sets in the training stage. We evaluate the efficacy of the Profiler against a machine learning ensemble using state-of-the-art ML algorithms on a data set of 9000 legitimate and 900 phishing emails from a large Australian research organisation. Our results indicate that the Profiler mitigates the impact of concept drift and delivers 30% fewer false positive and 25% fewer false negative email classifications than the ML ensemble approach.  ( 3 min )
    Choquet regularization for reinforcement learning. (arXiv:2208.08497v1 [stat.ML])
    We propose \emph{Choquet regularizers} to measure and manage the level of exploration for reinforcement learning (RL), and reformulate the continuous-time entropy-regularized RL problem of Wang et al. (2020, JMLR, 21(198)), replacing the differential entropy used for regularization with a Choquet regularizer. We derive the Hamilton--Jacobi--Bellman equation of the problem, and solve it explicitly in the linear--quadratic (LQ) case by statically maximizing a mean--variance constrained Choquet regularizer. Under the LQ setting, we derive explicit optimal distributions for several specific Choquet regularizers, and conversely identify the Choquet regularizers that generate a number of broadly used exploratory samplers such as $\epsilon$-greedy, exponential, uniform and Gaussian.  ( 2 min )
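    Schematically, the reformulation keeps the exploratory relaxed-control objective and only swaps the regularizer. A sketch in simplified notation (ours, not the paper's exact formulation):

        \[
        \sup_{\pi}\; \mathbb{E}\left[ \int_0^T \left( \int_U r(t, X_t, u)\, \pi_t(u)\, \mathrm{d}u \;+\; \lambda\, \Phi(\pi_t) \right) \mathrm{d}t \right],
        \]

    where the entropy-regularized formulation takes $\Phi(\pi_t) = -\int_U \pi_t(u) \ln \pi_t(u)\, \mathrm{d}u$, while here $\Phi$ is a Choquet regularizer; different choices of $\Phi$ then yield the different exploratory samplers listed above.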
    Geometric Scattering on Measure Spaces. (arXiv:2208.08561v1 [stat.ML])
    The scattering transform is a multilayered, wavelet-based transform initially introduced as a model of convolutional neural networks (CNNs) that has played a foundational role in our understanding of these networks' stability and invariance properties. Subsequently, there has been widespread interest in extending the success of CNNs to data sets with non-Euclidean structure, such as graphs and manifolds, leading to the emerging field of geometric deep learning. In order to improve our understanding of the architectures used in this new field, several papers have proposed generalizations of the scattering transform for non-Euclidean data structures such as undirected graphs and compact Riemannian manifolds without boundary. In this paper, we introduce a general, unified model for geometric scattering on measure spaces. Our proposed framework includes previous work on geometric scattering as special cases but also applies to more general settings such as directed graphs, signed graphs, and manifolds with boundary. We propose a new criterion that identifies to which groups a useful representation should be invariant and show that this criterion is sufficient to guarantee that the scattering transform has desirable stability and invariance properties. Additionally, we consider finite measure spaces that are obtained from randomly sampling an unknown manifold. We propose two methods for constructing a data-driven graph on which the associated graph scattering transform approximates the scattering transform on the underlying manifold. Moreover, we use a diffusion-maps based approach to prove quantitative estimates on the rate of convergence of one of these approximations as the number of sample points tends to infinity. Lastly, we showcase the utility of our method on spherical images, directed graphs, and on high-dimensional single-cell data.  ( 3 min )
    Domain-Specific Risk Minimization. (arXiv:2208.08661v1 [cs.LG])
    Learning a domain-invariant representation has become one of the most popular approaches for domain adaptation/generalization. In this paper, we show that the invariant representation may not be sufficient to guarantee good generalization, and that the labeling function shift should be taken into consideration. Inspired by this, we first derive a new generalization upper bound on the empirical risk that explicitly considers the labeling function shift. We then propose Domain-specific Risk Minimization (DRM), which can model the distribution shifts of different domains separately and select the most appropriate one for the target domain. Extensive experiments on four popular domain generalization datasets, CMNIST, PACS, VLCS, and DomainNet, demonstrate the effectiveness of the proposed DRM for domain generalization with the following advantages: 1) it significantly outperforms competitive baselines; 2) it achieves either comparable or superior accuracies on all training domains compared to vanilla empirical risk minimization (ERM); 3) it remains very simple and efficient during training; and 4) it is complementary to invariant learning approaches.  ( 2 min )
    Lifted Bregman Training of Neural Networks. (arXiv:2208.08772v1 [math.OC])
    We introduce a novel mathematical formulation for the training of feed-forward neural networks with (potentially non-smooth) proximal maps as activation functions. This formulation is based on Bregman distances, and a key advantage is that its partial derivatives with respect to the network's parameters do not require the computation of derivatives of the network's activation functions. Instead of estimating the parameters with a combination of a first-order optimisation method and back-propagation (as is the state-of-the-art), we propose the use of non-smooth first-order optimisation methods that exploit the specific structure of the novel formulation. We present several numerical results that demonstrate that these training approaches can be equally well or even better suited for the training of neural network-based classifiers and (denoising) autoencoders with sparse coding compared to more conventional training frameworks.  ( 2 min )
    NET-FLEET: Achieving Linear Convergence Speedup for Fully Decentralized Federated Learning with Heterogeneous Data. (arXiv:2208.08490v1 [cs.LG])
    Federated learning (FL) has received a surge of interest in recent years thanks to its benefits in data privacy protection, efficient communication, and parallel data processing. Also, with appropriate algorithmic designs, one can achieve the desirable linear speedup effect for convergence in FL. However, most existing works on FL are limited to systems with i.i.d. data and centralized parameter servers, and results on decentralized FL with heterogeneous datasets remain limited. Moreover, whether or not the linear speedup for convergence is achievable under fully decentralized FL with data heterogeneity remains an open question. In this paper, we address these challenges by proposing a new algorithm, called NET-FLEET, for fully decentralized FL systems with data heterogeneity. The key idea of our algorithm is to enhance the local update scheme in FL (originally intended for communication efficiency) by incorporating a recursive gradient correction technique to handle heterogeneous datasets. We show that, under appropriate parameter settings, the proposed NET-FLEET algorithm achieves a linear speedup for convergence. We further conduct extensive numerical experiments to evaluate the performance of the proposed NET-FLEET algorithm and verify our theoretical findings.  ( 2 min )
    CTRL: Clustering Training Losses for Label Error Detection. (arXiv:2208.08464v1 [cs.LG])
    In supervised machine learning, the use of correct labels is extremely important to ensure high accuracy. Unfortunately, most datasets contain corrupted labels, and machine learning models trained on such datasets do not generalize well. Detecting their label errors can therefore significantly increase their efficacy. We propose a novel framework, called CTRL (Clustering TRaining Losses for label error detection), to detect label errors in multi-class datasets. It detects label errors in two steps, based on the observation that models learn clean and noisy labels in different ways. First, we train a neural network using the noisy training dataset and obtain the loss curve for each sample. Then, we apply clustering algorithms to the training losses to group samples into two categories: cleanly-labeled and noisily-labeled. After label error detection, we remove samples with noisy labels and retrain the model. Our experimental results demonstrate state-of-the-art error detection accuracy on both image (CIFAR-10 and CIFAR-100) and tabular datasets under simulated noise. We also use a theoretical analysis to provide insights into why CTRL performs so well.  ( 2 min )
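    The two-step recipe translates almost directly into code. A hedged sketch follows, where k-means with two clusters stands in for whichever clustering algorithm is used in practice, and loss_curves is assumed to hold the per-sample, per-epoch losses recorded during the noisy training run:

        # Sketch of CTRL's detection step: cluster per-sample loss curves into
        # "clean" vs "noisy" groups, then retrain on the cleanly-labeled samples.
        import numpy as np
        from sklearn.cluster import KMeans

        def detect_label_errors(loss_curves):
            """loss_curves: (n_samples, n_epochs) array of recorded training losses."""
            km = KMeans(n_clusters=2, n_init=10).fit(loss_curves)
            # Noisy labels are memorized late, so their curves keep a higher mean loss.
            noisy = np.argmax([loss_curves[km.labels_ == c].mean() for c in (0, 1)])
            return np.where(km.labels_ == noisy)[0]     # indices flagged as mislabeled

        loss_curves = np.random.rand(1000, 20)          # placeholder for real curves
        flagged = detect_label_errors(loss_curves)
        clean_idx = np.setdiff1d(np.arange(1000), flagged)  # retrain on these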
    Generating Synthetic Clinical Data that Capture Class Imbalanced Distributions with Generative Adversarial Networks: Example using Antiretroviral Therapy for HIV. (arXiv:2208.08655v1 [cs.LG])
    Clinical data usually cannot be freely distributed due to their highly confidential nature, and this hampers the development of machine learning in the healthcare domain. One way to mitigate this problem is by generating realistic synthetic datasets using generative adversarial networks (GANs). However, GANs are known to suffer from mode collapse and thus to create outputs of low diversity. In this paper, we extend the classic GAN setup with an external memory to replay features from real samples. Using antiretroviral therapy for human immunodeficiency virus (ART for HIV) as a case study, we show that our extended setup improves convergence and, more importantly, is effective in capturing the severely class-imbalanced distributions common to real-world clinical data.  ( 2 min )
    Challenges and opportunities in applying Neural Temporal Point Processes to large scale industry data. (arXiv:2208.08623v1 [cs.LG])
    In this work, we identify open research opportunities in applying Neural Temporal Point Process (NTPP) models to industry-scale customer behavior data by carefully reproducing NTPP models published to date on known literature benchmarks, as well as applying NTPP models to a novel, real-world consumer behavior dataset that is twice as large as the largest publicly available NTPP benchmark. We identify the following challenges. First, NTPP models, despite their generative nature, remain vulnerable to dataset imbalances and cannot forecast rare events. Second, NTPP models based on stochastic differential equations, despite their theoretical appeal and leading performance on literature benchmarks, do not scale easily to large industry-scale data. The former finding echoes previous observations on deep generative models. Additionally, to combat a cold-start problem, we explore a novel addition to NTPP models: a parametrization based on static user features.  ( 2 min )
    Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion. (arXiv:2208.08757v1 [eess.AS])
    One-shot voice conversion (VC), with only a single target speaker's speech for reference, has become a hot research topic. Existing works generally disentangle timbre, while information about pitch, rhythm and content remains mixed together. To perform one-shot VC effectively while further disentangling these speech components, we employ random resampling for the pitch and content encoders, and use the variational contrastive log-ratio upper bound of mutual information together with gradient reversal layer based adversarial mutual information learning to ensure that the different parts of the latent space contain only the desired disentangled representations during training. Experiments on the VCTK dataset show the model achieves state-of-the-art performance for one-shot VC in terms of naturalness and intelligibility. In addition, we can transfer characteristics of one-shot VC on timbre, pitch and rhythm separately by speech representation disentanglement. Our code, pre-trained models and demo are available at https://im1eon.github.io/IS2022-SRDVC/.  ( 2 min )
    Learning with Local Gradients at the Edge. (arXiv:2208.08503v1 [cs.LG])
    To enable learning on edge devices with fast convergence and low memory, we present a novel backpropagation-free optimization algorithm dubbed Target Projection Stochastic Gradient Descent (tpSGD). tpSGD generalizes direct random target projection to work with arbitrary loss functions, and extends target projection to training recurrent neural networks (RNNs) in addition to feedforward networks. tpSGD uses layer-wise stochastic gradient descent (SGD) and local targets generated via random projections of the labels to train the network layer-by-layer with only forward passes. tpSGD does not require retaining gradients during optimization, greatly reducing memory allocation compared to SGD-based backpropagation (BP) methods that require multiple instances of the entire neural network's weights, inputs/outputs, and intermediate results. Our method performs within 5% of the accuracy of BP gradient descent on relatively shallow networks of fully connected layers, convolutional layers, and recurrent layers. tpSGD also outperforms other state-of-the-art gradient-free algorithms in shallow models consisting of multi-layer perceptrons, convolutional neural networks (CNNs), and RNNs, with competitive accuracy and less memory and time. We evaluate the performance of tpSGD in training deep neural networks (e.g. VGG) and extend the approach to multi-layer RNNs. These experiments highlight new research directions related to optimized layer-based adaptor training for domain shift using tpSGD at the edge.  ( 2 min )
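    A minimal PyTorch sketch of our reading of this layer-local scheme follows; the fixed random projections from labels to each layer's output space and the squared-error local loss are assumptions on our part, not a reference implementation:

        # Sketch of target-projection training: each layer is fit to a fixed random
        # projection of the label, so no gradient ever crosses a layer boundary.
        import torch

        torch.manual_seed(0)
        dims, n_classes, lr = [784, 256, 128], 10, 0.1
        layers = [torch.nn.Linear(i, o) for i, o in zip(dims, dims[1:])]
        layers.append(torch.nn.Linear(dims[-1], n_classes))
        projs = [torch.randn(n_classes, d) for d in dims[1:]]  # label -> layer targets

        def tpsgd_step(x, y_onehot):
            h = x
            for layer, B in zip(layers, projs):                # hidden layers only
                target = y_onehot @ B                          # local random target
                out = torch.relu(layer(h))                     # forward pass only
                ((out - target) ** 2).mean().backward()        # gradient stays local
                with torch.no_grad():
                    for p in layer.parameters():
                        p -= lr * p.grad
                        p.grad = None
                h = out.detach()                               # block upstream backprop
            loss = torch.nn.functional.cross_entropy(layers[-1](h), y_onehot.argmax(1))
            loss.backward()                                    # output layer: true labels
            with torch.no_grad():
                for p in layers[-1].parameters():
                    p -= lr * p.grad
                    p.grad = None

        tpsgd_step(torch.randn(32, 784), torch.eye(10)[torch.randint(0, 10, (32,))])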
    Nearly Optimal Latent State Decoding in Block MDPs. (arXiv:2208.08480v1 [cs.LG])
    We investigate the problems of model estimation and reward-free learning in episodic Block MDPs. In these MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states. We are first interested in estimating the latent state decoding function (the mapping from the observations to latent states) based on data generated under a fixed behavior policy. We derive an information-theoretical lower bound on the error rate for estimating this function and present an algorithm approaching this fundamental limit. In turn, our algorithm also provides estimates of all the components of the MDP. We then study the problem of learning near-optimal policies in the reward-free framework. Based on our efficient model estimation algorithm, we show that we can infer a policy converging (as the number of collected samples grows large) to the optimal policy at the best possible rate. Interestingly, our analysis provides necessary and sufficient conditions under which exploiting the block structure yields improvements in the sample complexity for identifying near-optimal policies. When these conditions are met, the sample complexity in the minimax reward-free setting is improved by a multiplicative factor $n$, where $n$ is the number of possible contexts.  ( 2 min )
    CP-PINNs: Changepoints Detection in PDEs using Physics Informed Neural Networks with Total-Variation Penalty. (arXiv:2208.08626v1 [stat.ML])
    We consider the inverse problem for partial differential equations (PDEs) in which the parameters of the dependency structure can exhibit random changepoints over time. This can arise, for example, when the physical system is either under malicious attack (e.g., hacker attacks on power grids and internet networks) or subject to extreme external conditions (e.g., weather conditions impacting electricity grids or large market movements impacting valuations of derivative contracts). For that purpose, we employ Physics Informed Neural Networks (PINNs) -- universal approximators that can incorporate prior information from any physical law described by a system of PDEs. This prior knowledge acts in the training of the neural network as a regularization that limits the space of admissible solutions and increases the correctness of the function approximation. We show that when the true data-generating process exhibits changepoints in the PDE dynamics, this regularization can lead to complete miscalibration and failure of the model. We therefore propose an extension of PINNs using a total-variation penalty, which accommodates (multiple) changepoints in the PDE dynamics. These changepoints can occur at random locations over time, and they are estimated together with the solutions. We propose an additional refinement algorithm that combines changepoint detection with a reduced dynamic programming method that is feasible for the computationally intensive PINN methods, and we demonstrate the benefits of the proposed model empirically using examples of different equations with changes in the parameters. In the case of no changepoints in the data, the proposed model reduces to the original PINNs model. In the presence of changepoints, it leads to improvements in parameter estimation, better model fitting, and lower training error compared to the original PINNs model.  ( 3 min )
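    Schematically, the extension adds a total-variation term on the (now time-varying) PDE parameters to the usual PINN losses; a sketch in our own notation, not the paper's exact weighting:

        \[
        \mathcal{L}(\theta, \lambda) = \frac{1}{N}\sum_{i=1}^{N} \big| u_\theta(t_i, x_i) - u_i \big|^2
        + \frac{1}{M}\sum_{j=1}^{M} \big| \mathcal{F}\big[u_\theta; \lambda(t_j)\big](t_j, x_j) \big|^2
        + \gamma \sum_{k} \big| \lambda(t_{k+1}) - \lambda(t_k) \big|,
        \]

    where $u_\theta$ is the PINN solution, $\mathcal{F}[\cdot;\lambda]$ the PDE residual operator with parameters $\lambda(t)$ on a time grid, and the last term the total-variation penalty that concentrates parameter changes at a few estimated changepoints; with constant $\lambda$ the penalty vanishes and the original PINN loss is recovered.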
    Resisting Adversarial Attacks in Deep Neural Networks using Diverse Decision Boundaries. (arXiv:2208.08697v1 [cs.LG])
    The security of deep learning (DL) systems is an extremely important field of study, as these systems are being deployed in numerous applications due to their ever-improving performance on challenging tasks. Despite overwhelming promise, deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye but can lead the model to misclassify. Ensemble-based protections against adversarial perturbations have either been shown to be vulnerable to stronger adversaries or shown to lack end-to-end evaluation. In this paper, we attempt to develop a new ensemble-based solution that constructs defender models with diverse decision boundaries with respect to the original model. The ensemble of classifiers constructed by (1) transforming the input by a method called Split-and-Shuffle, and (2) restricting the significant features by a method called Contrast-Significant-Features is shown to produce diverse gradients with respect to adversarial attacks, which reduces the chance of transferring adversarial examples from the original to the defender model targeting the same class. We present extensive experiments using standard image classification datasets, namely MNIST, CIFAR-10 and CIFAR-100, against state-of-the-art adversarial attacks to demonstrate the robustness of the proposed ensemble-based defense. We also evaluate the robustness in the presence of a stronger adversary targeting all the models within the ensemble simultaneously. Results for the overall false positives and false negatives are furnished to estimate the overall performance of the proposed methodology.  ( 3 min )
    A Tree-structured Transformer for Program Representation Learning. (arXiv:2208.08643v1 [cs.SE])
    When using deep learning techniques to model programming languages, neural networks with tree or graph structures are widely adopted to capture the rich structural information within program abstract syntax trees (ASTs). However, long-term/global dependencies widely exist in programs, and most of these neural architectures fail to capture them. In this paper, we propose Tree-Transformer, a novel recursive tree-structured neural network that aims to overcome the above limitations. Tree-Transformer leverages two multi-head attention units to model the dependency between siblings and parent-children node pairs. Moreover, we propose a bi-directional propagation strategy to allow node information passing in two directions: bottom-up and top-down along trees. By combining bottom-up and top-down propagation, Tree-Transformer can learn both global contexts and meaningful node features. Extensive experimental results show that our Tree-Transformer outperforms existing tree-based and graph-based neural networks on program-related tree-level and node-level prediction tasks, indicating that Tree-Transformer performs well at learning both tree-level and node-level representations.  ( 2 min )
    Musika! Fast Infinite Waveform Music Generation. (arXiv:2208.08706v1 [cs.SD])
    Fast and user-controllable music generation could enable novel ways of composing or performing music. However, state-of-the-art music generation systems require large amounts of data and computational resources for training, and are slow at inference. This makes them impractical for real-time interactive use. In this work, we introduce Musika, a music generation system that can be trained on hundreds of hours of music using a single consumer GPU, and that allows for much faster than real-time generation of music of arbitrary length on a consumer CPU. We achieve this by first learning a compact invertible representation of spectrogram magnitudes and phases with adversarial autoencoders, then training a Generative Adversarial Network (GAN) on this representation for a particular music domain. A latent coordinate system enables generating arbitrarily long sequences of excerpts in parallel, while a global context vector allows the music to remain stylistically coherent through time. We perform quantitative evaluations to assess the quality of the generated samples and showcase options for user control in piano and techno music generation. We release the source code and pretrained autoencoder weights at github.com/marcoppasini/musika, such that a GAN can be trained on a new music domain with a single GPU in a matter of hours.  ( 2 min )
    Complex-Value Spatio-temporal Graph Convolutional Neural Networks and its Applications to Electric Power Systems AI. (arXiv:2208.08485v1 [cs.LG])
    The effective representation, processing, analysis, and visualization of large-scale structured data over graphs are gaining a lot of attention. So far, most of the literature has focused on real-valued signals. However, signals are often sparse in the Fourier domain, and more informative and compact representations for them can be obtained using the complex envelope of their spectral components, as opposed to the original real-valued signals. Motivated by this fact, in this work we generalize graph convolutional neural networks (GCNs) to the complex domain, deriving the theory that allows us to incorporate a complex-valued graph shift operator (GSO) in the definition of graph filters (GFs) and to process complex-valued graph signals (GSs). The theory developed can handle spatio-temporal complex network processes. We prove that complex-valued GCNs are stable with respect to perturbations of the underlying graph support, and we bound the transfer error and the error propagation through multiple layers. We then apply complex-valued GCNs to power grid state forecasting, power grid cyber-attack detection, and localization.  ( 2 min )
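    As a concrete instance of the filtering operation the theory covers, a polynomial graph filter with a complex-valued shift operator has the same algebraic form as in the real case; a small NumPy sketch (the filter order and taps below are arbitrary illustrative choices):

        # A complex-valued polynomial graph filter: y = sum_k h_k * S^k x,
        # with S a complex graph shift operator and x a complex graph signal.
        import numpy as np

        rng = np.random.default_rng(0)
        n = 5
        S = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))  # complex GSO
        x = rng.normal(size=n) + 1j * rng.normal(size=n)            # complex signal
        h = np.array([1.0, 0.5 - 0.2j, 0.1j])                       # filter taps

        y, Skx = np.zeros(n, dtype=complex), x.copy()
        for hk in h:
            y += hk * Skx        # accumulate h_k * S^k x
            Skx = S @ Skx        # advance to the next shift power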
    Enhancing Diffusion-Based Image Synthesis with Robust Classifier Guidance. (arXiv:2208.08664v1 [cs.CV])
    Denoising diffusion probabilistic models (DDPMs) are a recent family of generative models that achieve state-of-the-art results. In order to obtain class-conditional generation, it was suggested to guide the diffusion process by gradients from a time-dependent classifier. While the idea is theoretically sound, deep learning-based classifiers are infamously susceptible to gradient-based adversarial attacks. Therefore, while traditional classifiers may achieve good accuracy scores, their gradients are possibly unreliable and might hinder the improvement of the generation results. Recent work discovered that adversarially robust classifiers exhibit gradients that are aligned with human perception, and these could better guide a generative process towards semantically meaningful images. We utilize this observation by defining and training a time-dependent adversarially robust classifier and use it as guidance for a generative diffusion model. In experiments on the highly challenging and diverse ImageNet dataset, our scheme introduces significantly more intelligible intermediate gradients, better alignment with theoretical findings, as well as improved generation results under several evaluation metrics. Furthermore, we conduct an opinion survey whose findings indicate that human raters prefer our method's results.  ( 2 min )
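    For context, the guidance rule being modified here is the standard classifier-guidance update; schematically, with guidance scale $s$ and the time-dependent (now adversarially robust) classifier $p_\phi(y \mid x_t, t)$:

        \[
        \nabla_{x_t} \log p(x_t \mid y) \;\approx\; \nabla_{x_t} \log p_\theta(x_t) \;+\; s\, \nabla_{x_t} \log p_\phi(y \mid x_t, t),
        \]

    so every denoising step nudges the sample along the classifier's input gradient -- exactly the quantity that robust training is claimed to make more perceptually aligned.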
    Intelligent problem-solving as integrated hierarchical reinforcement learning. (arXiv:2208.08731v1 [cs.AI])
    According to cognitive psychology and related disciplines, the development of complex problem-solving behaviour in biological agents depends on hierarchical cognitive mechanisms. Hierarchical reinforcement learning is a promising computational approach that may eventually yield comparable problem-solving behaviour in artificial agents and robots. However, to date the problem-solving abilities of many human and non-human animals are clearly superior to those of artificial systems. Here, we propose steps to integrate biologically inspired hierarchical mechanisms to enable advanced problem-solving skills in artificial agents. Therefore, we first review the literature in cognitive psychology to highlight the importance of compositional abstraction and predictive processing. Then we relate the gained insights with contemporary hierarchical reinforcement learning methods. Interestingly, our results suggest that all identified cognitive mechanisms have been implemented individually in isolated computational architectures, raising the question of why there exists no single unifying architecture that integrates them. As our final contribution, we address this question by providing an integrative perspective on the computational challenges to develop such a unifying architecture. We expect our results to guide the development of more sophisticated cognitively inspired hierarchical machine learning architectures.  ( 3 min )
    Tree species classification from hyperspectral data using graph-regularized neural networks. (arXiv:2208.08675v1 [cs.CV])
    Manual labeling of tree species remains a challenging task, especially in tropical regions, owing to inaccessibility and labor-intensive ground-based surveys. Hyperspectral images (HSIs), through their narrow and contiguous bands, can assist in distinguishing tree species based on their spectral properties. Therefore, automated classification algorithms on HSI images can help augment the limited labeled information and generate a real-time classification map for various tree species. Achieving high classification accuracy with a limited amount of labeled information in an image is one of the key challenges that researchers have started addressing in recent years. We propose a novel graph-regularized neural network (GRNN) algorithm that encompasses superpixel-based segmentation for graph construction, a pixel-wise neural network classifier, and a label propagation technique to generate an accurate classification map. GRNN not only outperforms several state-of-the-art techniques on the standard Indian Pines HSI but also achieves high classification accuracy (approx. 92%) on a new HSI data set collected over the forests of French Guiana (FG), even when less than 1% of the pixels are labeled. We show that GRNN is not only competitive with state-of-the-art semi-supervised methods, but also exhibits lower variance in accuracy for different numbers of training samples and across different independent random samplings of the labeled pixels used for training.  ( 3 min )
  • Open

    High Probability Bounds for Stochastic Subgradient Schemes with Heavy Tailed Noise. (arXiv:2208.08567v1 [math.OC])
    In this work we study high probability bounds for stochastic subgradient methods under heavy-tailed noise. In this setting, the noise is only assumed to have finite variance, as opposed to the sub-Gaussian distributions for which standard subgradient methods are known to enjoy high probability bounds. We analyze a clipped version of the projected stochastic subgradient method, where subgradient estimates are truncated whenever they have large norms. We show that this clipping strategy leads to near-optimal anytime and finite-horizon bounds for many classical averaging schemes. Preliminary experiments support the validity of the method.
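    The clipping operation analyzed is the usual norm truncation; schematically, with threshold $\tau_t$ and step size $\eta_t$ at iteration $t$:

        \[
        x_{t+1} = \Pi_{\mathcal{X}}\!\left( x_t - \eta_t\, \min\!\left\{1, \frac{\tau_t}{\|g_t\|}\right\} g_t \right),
        \]

    where $g_t$ is the stochastic subgradient estimate and $\Pi_{\mathcal{X}}$ the projection onto the feasible set: estimates with $\|g_t\| > \tau_t$ are shrunk to norm $\tau_t$, which is what tames the heavy tails.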
    Acquisition of Chess Knowledge in AlphaZero. (arXiv:2111.09259v3 [cs.AI] UPDATED)
    What is learned by sophisticated neural network agents such as AlphaZero? This question is of both scientific and practical interest. If the representations of strong neural networks bear no resemblance to human concepts, our ability to understand faithful explanations of their decisions will be restricted, ultimately limiting what we can achieve with neural network interpretability. In this work we provide evidence that human knowledge is acquired by the AlphaZero neural network as it trains on the game of chess. By probing for a broad range of human chess concepts we show when and where these concepts are represented in the AlphaZero network. We also provide a behavioural analysis focusing on opening play, including qualitative analysis from chess Grandmaster Vladimir Kramnik. Finally, we carry out a preliminary investigation looking at the low-level details of AlphaZero's representations, and make the resulting behavioural and representational analyses available online.
    A Framework and Benchmark for Deep Batch Active Learning for Regression. (arXiv:2203.09410v2 [stat.ML] UPDATED)
    The acquisition of labels for supervised learning can be expensive. In order to improve the sample-efficiency of neural network regression, we study active learning methods that adaptively select batches of unlabeled data for labeling. We present a framework for constructing such methods out of (network-dependent) base kernels, kernel transformations and selection methods. Our framework encompasses many existing Bayesian methods based on Gaussian Process approximations of neural networks as well as non-Bayesian methods. Additionally, we propose to replace the commonly used last-layer features with sketched finite-width Neural Tangent Kernels, and to combine them with a novel clustering method. To evaluate different methods, we introduce an open-source benchmark consisting of 15 large tabular regression data sets. Our proposed method outperforms the state-of-the-art on our benchmark, scales to large data sets, and works out-of-the-box without adjusting the network architecture or training code. We provide open-source code that includes efficient implementations of all kernels, kernel transformations, and selection methods, and can be used for reproducing our results.
    How many perturbations break this model? Evaluating robustness beyond adversarial accuracy. (arXiv:2207.04129v2 [cs.LG] UPDATED)
    Robustness to adversarial attack is typically evaluated with adversarial accuracy. This metric quantifies the number of points for which, given a threat model, successful adversarial perturbations cannot be found. While essential, this metric does not capture all aspects of robustness; in particular, it leaves out the question of how many perturbations can be found for each point. In this work we introduce an alternative approach, adversarial sparsity, which quantifies how difficult it is to find a successful perturbation given both an input point and a constraint on the direction of the perturbation. This constraint may be angular (L2 perturbations) or based on the number of pixels (Linf perturbations). We show that sparsity provides valuable insight into neural networks in multiple ways: analyzing the sparsity of existing robust models illustrates important differences between them that accuracy analysis does not, and suggests approaches for improving their robustness. When applying broken defenses that are effective against weak attacks but not strong ones, sparsity can discriminate between totally ineffective and partially effective defenses. Finally, with sparsity we can measure increases in robustness that do not affect accuracy: we show, for example, that data augmentation can by itself increase adversarial robustness, without using adversarial training.
    High Dimensional Statistical Estimation under Uniformly Dithered One-bit Quantization. (arXiv:2202.13157v2 [stat.ML] UPDATED)
    In this paper, we propose a uniformly dithered one-bit quantization scheme for high-dimensional statistical estimation. The scheme contains truncation, dithering, and quantization as typical steps. As canonical examples, the quantization scheme is applied to three estimation problems: sparse covariance matrix estimation, sparse linear regression, and matrix completion. We study both sub-Gaussian and heavy-tailed regimes, where the underlying distribution of heavy-tailed data is assumed to possess a bounded second or fourth moment. For each model we propose new estimators based on one-bit quantized data. In the sub-Gaussian regime, our estimators achieve optimal minimax rates up to logarithmic factors, which indicates that our quantization scheme introduces nearly no additional cost. In the heavy-tailed regime, while the rates of our estimators become essentially slower, these results are either the first ones in such a one-bit quantized and heavy-tailed setting, or exhibit significant improvements over existing comparable results. Moreover, we contribute considerably to the problems of one-bit compressed sensing and one-bit matrix completion. Specifically, we extend one-bit compressed sensing to sub-Gaussian or even heavy-tailed sensing vectors via convex programming. For one-bit matrix completion, our method is essentially different from the standard likelihood approach and can handle pre-quantization random noise with unknown distribution. Experimental results on synthetic data are presented to support our theoretical analysis.
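    The dithering step is simple enough to demonstrate directly. A NumPy sketch of the unbiasedness that uniform dither buys (the scalar value and dither scale below are illustrative; the scheme in the paper additionally truncates heavy-tailed data first):

        # Uniformly dithered one-bit quantization: with tau ~ U[-D, D] and |x| <= D,
        # E[D * sign(x + tau)] = x, so averaged one-bit measurements recover x.
        import numpy as np

        rng = np.random.default_rng(0)
        D = 2.0                    # dither scale (at least the truncation level)
        x = 1.3                    # value to quantize, assumed already in [-D, D]
        tau = rng.uniform(-D, D, size=1_000_000)
        bits = np.sign(x + tau)    # one bit per measurement
        print(D * bits.mean())     # ~= 1.3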
    Estimating individual treatment effects under unobserved confounding using binary instruments. (arXiv:2208.08544v1 [stat.ME])
    Estimating individual treatment effects (ITEs) from observational data is relevant in many fields such as personalized medicine. However, in practice, the treatment assignment is usually confounded by unobserved variables and thus introduces bias. A remedy to remove the bias is the use of instrumental variables (IVs). Such settings are widespread in medicine (e.g., trials where compliance is used as a binary IV). In this paper, we propose a novel, multiply robust machine learning framework, called MRIV, for estimating ITEs using binary IVs, which thus yields an unbiased ITE estimator. Different from previous work for binary IVs, our framework estimates the ITE directly via a pseudo-outcome regression. (1) We provide a theoretical analysis where we show that our framework yields multiply robust convergence rates: our ITE estimator achieves fast convergence even if several nuisance estimators converge slowly. (2) We further show that our framework asymptotically outperforms state-of-the-art plug-in IV methods for ITE estimation. (3) We build upon our theoretical results and propose a tailored deep neural network architecture called MRIV-Net for ITE estimation using binary IVs. Across various computational experiments, we demonstrate empirically that our MRIV-Net achieves state-of-the-art performance. To the best of our knowledge, our MRIV is the first machine learning framework for estimating ITEs in the binary IV setting shown to be multiply robust.
    Meta Sparse Principal Component Analysis. (arXiv:2208.08938v1 [stat.ML])
    We study meta-learning for support recovery (i.e., recovery of the set of non-zero entries) in high-dimensional Principal Component Analysis. We reduce the sufficient sample complexity in a novel task using information learned from auxiliary tasks. We assume each task to be a different random Principal Component (PC) matrix with a possibly different support, and that the support union of the PC matrices is small. We then pool the data from all the tasks to execute an improper estimation of a single PC matrix by maximising the $l_1$-regularised predictive covariance, and establish that, with high probability, the true support union can be recovered provided a sufficient number of tasks $m$ and $O\left(\frac{\log(p)}{m}\right)$ samples for each task, for $p$-dimensional vectors. Then, for a novel task, we prove that maximising the $l_1$-regularised predictive covariance with the additional constraint that the support is a subset of the estimated support union reduces the sufficient sample complexity of successful support recovery to $O(\log |J|)$, where $J$ is the support union recovered from the auxiliary tasks. Typically, $|J|$ is much smaller than $p$ for sparse matrices. Finally, we validate our theoretical results through numerical simulations.
    DIET: Conditional independence testing with marginal dependence measures of residual information. (arXiv:2208.08579v1 [stat.ME])
    Conditional randomization tests (CRTs) assess whether a variable $x$ is predictive of another variable $y$, having observed covariates $z$. CRTs require fitting a large number of predictive models, which is often computationally intractable. Existing solutions to reduce the cost of CRTs typically split the dataset into a train and test portion, or rely on heuristics for interactions, both of which lead to a loss in power. We propose the decoupled independence test (DIET), an algorithm that avoids both of these issues by leveraging marginal independence statistics to test conditional independence relationships. DIET tests the marginal independence of two random variables: $F(x \mid z)$ and $F(y \mid z)$ where $F(\cdot \mid z)$ is a conditional cumulative distribution function (CDF). These variables are termed "information residuals." We give sufficient conditions for DIET to achieve finite sample type-1 error control and power greater than the type-1 error rate. We then prove that when using the mutual information between the information residuals as a test statistic, DIET yields the most powerful conditionally valid test. Finally, we show DIET achieves higher power than other tractable CRTs on several synthetic and real benchmarks.
    Memory and Capacity of Graph Embedding Methods. (arXiv:2208.08769v1 [stat.ML])
    We introduce a method for embedding graphs as vectors in a structure-preserving manner. In this paper, we showcase its rich representational capacity and give some theoretical properties of our method. In particular, our procedure falls under the bind-and-sum approach, and we show that our binding operation -- the tensor product -- is the most general binding operation that respects the principle of superposition. Similarly, we show that the spherical code achieves optimal compression. We then establish some precise results characterizing the performance of our method, as well as experimental results showcasing how it can accurately perform various graph operations even when the number of edges is quite large. Finally, we conclude by establishing a link to adjacency matrices, showing that our method is, in some sense, a generalization of adjacency matrices with applications towards large sparse graphs.
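    To illustrate the bind-and-sum construction concretely: with random node codes, an edge is bound by the tensor (outer) product and the graph is the sum of its edge bindings, from which edge membership can be queried by inner products. A toy NumPy sketch (the dimension and edge list are arbitrary):

        # Bind-and-sum graph embedding: edge (u, v) -> outer(code[u], code[v]);
        # summing edge bindings generalizes an adjacency-matrix representation.
        import numpy as np

        rng = np.random.default_rng(0)
        n, d = 20, 256
        codes = rng.normal(size=(n, d)) / np.sqrt(d)   # near-orthonormal node codes
        edges = [(0, 1), (2, 3), (4, 5)]

        G = sum(np.outer(codes[u], codes[v]) for u, v in edges)

        def edge_score(u, v):
            return codes[u] @ G @ codes[v]   # ~1 for present edges, ~0 otherwise

        print(edge_score(0, 1), edge_score(1, 0), edge_score(6, 7))

    Note that the representation is direction-aware: edge_score(1, 0) stays near zero even though (0, 1) is present, in line with the stated link to (directed) adjacency matrices.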
    Network inference via process motifs for lagged correlation in linear stochastic processes. (arXiv:2208.08871v1 [stat.ML])
    A major challenge for causal inference from time-series data is the trade-off between computational feasibility and accuracy. Motivated by process motifs for lagged covariance in an autoregressive model with slow mean-reversion, we propose to infer networks of causal relations via pairwise edge measures (PEMs) that one can easily compute from lagged correlation matrices. Motivated by the contributions of process motifs to covariance and lagged variance, we formulate two PEMs that correct for confounding factors and for reverse causation. To demonstrate the performance of our PEMs, we consider network inference from simulations of linear stochastic processes, and we show that our proposed PEMs can infer networks accurately and efficiently. Specifically, for slightly autocorrelated time-series data, our approach achieves accuracies higher than or similar to Granger causality, transfer entropy, and convergent cross-mapping -- but with much shorter computation time than is possible with any of these methods. Our fast and accurate PEMs are easy-to-implement methods for network inference with a clear theoretical underpinning. They provide promising alternatives to current paradigms for the inference of linear models from time-series data, including Granger causality, vector autoregression, and sparse inverse covariance estimation.
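    As a flavor of what "easily computed from lagged correlation matrices" means, here is the raw ingredient in NumPy; the paper's two PEMs are corrections of statistics of this kind for confounding and reverse causation, and we do not reproduce their exact form here:

        # Lag-1 cross-correlation matrix of a multivariate time series: entry (i, j)
        # scores the directed relation i -> j.
        import numpy as np

        def lagged_corr(X):
            """X: (T, n) time series; returns the (n, n) lag-1 correlation matrix."""
            Z = (X - X.mean(0)) / X.std(0)
            return Z[:-1].T @ Z[1:] / (len(X) - 1)

        rng = np.random.default_rng(0)
        T, n = 2000, 4
        X = np.zeros((T, n))
        for t in range(1, T):                       # simulate a causal chain 0->1->2->3
            X[t] = 0.5 * X[t - 1] + rng.normal(size=n)
            X[t, 1:] += 0.4 * X[t - 1, :-1]
        C = lagged_corr(X)                          # large entries at (i, i+1)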
    ManiFlow: Implicitly Representing Manifolds with Normalizing Flows. (arXiv:2208.08932v1 [cs.CV])
    Normalizing Flows (NFs) are flexible explicit generative models that have been shown to accurately model complex real-world data distributions. However, their invertibility constraint imposes limitations on data distributions that reside on lower dimensional manifolds embedded in higher dimensional space. Practically, this shortcoming is often bypassed by adding noise to the data which impacts the quality of the generated samples. In contrast to prior work, we approach this problem by generating samples from the original data distribution given full knowledge about the perturbed distribution and the noise model. To this end, we establish that NFs trained on perturbed data implicitly represent the manifold in regions of maximum likelihood. Then, we propose an optimization objective that recovers the most likely point on the manifold given a sample from the perturbed distribution. Finally, we focus on 3D point clouds for which we utilize the explicit nature of NFs, i.e. surface normals extracted from the gradient of the log-likelihood and the log-likelihood itself, to apply Poisson surface reconstruction to refine generated point sets.
    Choquet regularization for reinforcement learning. (arXiv:2208.08497v1 [stat.ML])
    We propose \emph{Choquet regularizers} to measure and manage the level of exploration for reinforcement learning (RL), and reformulate the continuous-time entropy-regularized RL problem of Wang et al. (2020, JMLR, 21(198)) in which we replace the differential entropy used for regularization with a Choquet regularizer. We derive the Hamilton--Jacobi--Bellman equation of the problem, and solve it explicitly in the linear--quadratic (LQ) case via maximizing statically a mean--variance constrained Choquet regularizer. Under the LQ setting, we derive explicit optimal distributions for several specific Choquet regularizers, and conversely identify the Choquet regularizers that generate a number of broadly used exploratory samplers such as $\epsilon$-greedy, exponential, uniform and Gaussian.
    On an Application of Generative Adversarial Networks on Remaining Lifetime Estimation. (arXiv:2208.08666v1 [cs.LG])
    A major problem of structural health monitoring (SHM) has been the prognosis of damage and the definition of the remaining useful life of a structure. Both tasks depend on many parameters, many of which are often uncertain. Many models have been developed for the aforementioned tasks but they have been either deterministic or stochastic with the ability to take into account only a restricted amount of past states of the structure. In the current work, a generative model is proposed in order to make predictions about the damage evolution of structures. The model is able to perform in a population-based SHM (PBSHM) framework, to take into account many past states of the damaged structure, to incorporate uncertainties in the modelling process and to generate potential damage evolution outcomes according to data acquired from a structure. The algorithm is tested on a simulated damage evolution example and the results reveal that it is able to provide quite confident predictions about the remaining useful life of structures within a population.
    Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning. (arXiv:2208.08831v1 [cs.CV])
    Automatically discovering failures in vision models under real-world settings remains an open challenge. This work demonstrates how off-the-shelf, large-scale, image-to-text and text-to-image models, trained on vast amounts of data, can be leveraged to automatically find such failures. In essence, a conditional text-to-image generative model is used to generate large amounts of synthetic, yet realistic, inputs given a ground-truth label. Misclassified inputs are clustered and a captioning model is used to describe each cluster. Each cluster's description is used in turn to generate more inputs and assess whether specific clusters induce more failures than expected. We use this pipeline to demonstrate that we can effectively interrogate classifiers trained on ImageNet to find specific failure cases and discover spurious correlations. We also show that we can scale the approach to generate adversarial datasets targeting specific classifier architectures. This work serves as a proof-of-concept demonstrating the utility of large-scale generative models to automatically discover bugs in vision models in an open-ended manner. We also describe a number of limitations and pitfalls related to this approach.
    Nearly Optimal Latent State Decoding in Block MDPs. (arXiv:2208.08480v1 [cs.LG])
    We investigate the problems of model estimation and reward-free learning in episodic Block MDPs. In these MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states. We are first interested in estimating the latent state decoding function (the mapping from the observations to latent states) based on data generated under a fixed behavior policy. We derive an information-theoretical lower bound on the error rate for estimating this function and present an algorithm approaching this fundamental limit. In turn, our algorithm also provides estimates of all the components of the MDP. We then study the problem of learning near-optimal policies in the reward-free framework. Based on our efficient model estimation algorithm, we show that we can infer a policy converging (as the number of collected samples grows large) to the optimal policy at the best possible rate. Interestingly, our analysis provides necessary and sufficient conditions under which exploiting the block structure yields improvements in the sample complexity for identifying near-optimal policies. When these conditions are met, the sample complexity in the minimax reward-free setting is improved by a multiplicative factor $n$, where $n$ is the number of possible contexts.
    Data-driven emergence of convolutional structure in neural networks. (arXiv:2202.00565v2 [cond-mat.dis-nn] UPDATED)
    Exploiting data invariances is crucial for efficient learning in both artificial and biological neural circuits. Understanding how neural networks can discover appropriate representations capable of harnessing the underlying symmetries of their inputs is thus crucial in machine learning and neuroscience. Convolutional neural networks, for example, were designed to exploit translation symmetry and their capabilities triggered the first wave of deep learning successes. However, learning convolutions directly from translation-invariant data with a fully-connected network has so far proven elusive. Here, we show how initially fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs, resulting in localised, space-tiling receptive fields. These receptive fields match the filters of a convolutional network trained on the same task. By carefully designing data models for the visual scene, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs, which has long been recognised as the hallmark of natural images. We provide an analytical and numerical characterisation of the pattern-formation mechanism responsible for this phenomenon in a simple model and find an unexpected link between receptive field formation and tensor decomposition of higher-order input correlations. These results provide a new perspective on the development of low-level feature detectors in various sensory modalities, and pave the way for studying the impact of higher-order statistics on learning in neural networks.
    A spatiotemporal machine learning approach to forecasting COVID-19 incidence at the county level in the USA. (arXiv:2109.12094v4 [stat.ML] UPDATED)
    With COVID-19 affecting every country globally and changing everyday life, the ability to forecast the spread of the disease is more important than in any previous epidemic. The conventional methods of disease-spread modeling, compartmental models, are based on the assumption of spatiotemporal homogeneity of the spread of the virus, which may cause forecasting to underperform, especially at high spatial resolutions. In this paper we approach the forecasting task with an alternative technique - spatiotemporal machine learning. We present COVID-LSTM, a data-driven model based on a Long Short-term Memory deep learning architecture for forecasting COVID-19 incidence at the county level in the US. We use the weekly number of new positive cases as temporal input, and hand-engineered spatial features from Facebook movement and connectedness datasets to capture the spread of the disease in time and space. COVID-LSTM outperforms the COVID-19 Forecast Hub's Ensemble model (COVIDhub-ensemble) on our 17-week evaluation period, making it the first model to be more accurate than the COVIDhub-ensemble over one or more forecast periods. Over the 4-week forecast horizon, our model is on average 50 cases per county more accurate than the COVIDhub-ensemble. We highlight that the underutilization of data-driven forecasting of disease spread prior to COVID-19 is likely due to the lack of sufficient data available for previous diseases, in addition to the recent advances in machine learning methods for spatiotemporal forecasting. We discuss the impediments to the wider uptake of data-driven forecasting, and whether it is likely that more deep learning-based models will be used in the future.
    Never Worse, Mostly Better: Stable Policy Improvement in Deep Reinforcement Learning. (arXiv:1910.01062v3 [cs.LG] UPDATED)
    In recent years, there has been significant progress in applying deep reinforcement learning (RL) for solving challenging problems across a wide variety of domains. Nevertheless, convergence of various methods has been shown to suffer from inconsistencies, due to algorithmic instability and variance, as well as stochasticity in the benchmark environments. Particularly, despite the fact that the agent's performance may be improving on average, it may abruptly deteriorate at late stages of training. In this work, we study methods for enhancing the agent's learning process, by providing conservative updates with respect to either the obtained history or a reference benchmark policy. Our method, termed EVEREST, obtains high confidence improvements via confidence bounds of a reference policy. Through extensive empirical analysis we demonstrate the benefit of our approach in terms of both performance and stabilization, with significant improvements in continuous control and Atari benchmarks.
    Restructurable Activation Networks. (arXiv:2208.08562v1 [cs.CV])
    Is it possible to restructure the non-linear activation functions in a deep network to create hardware-efficient models? To address this question, we propose a new paradigm called Restructurable Activation Networks (RANs) that manipulate the amount of non-linearity in models to improve their hardware-awareness and efficiency. First, we propose RAN-explicit (RAN-e) -- a new hardware-aware search space and a semi-automatic search algorithm -- to replace inefficient blocks with hardware-aware blocks. Next, we propose a training-free model scaling method called RAN-implicit (RAN-i) where we theoretically prove the link between network topology and its expressivity in terms of number of non-linear units. We demonstrate that our networks achieve state-of-the-art results on ImageNet at different scales and for several types of hardware. For example, compared to EfficientNet-Lite-B0, RAN-e achieves a similar accuracy while improving Frames-Per-Second (FPS) by 1.5x on Arm micro-NPUs. On the other hand, RAN-i demonstrates up to 2x reduction in #MACs over ConvNexts with a similar or better accuracy. We also show that RAN-i achieves nearly 40% higher FPS than ConvNext on Arm-based datacenter CPUs. Finally, RAN-i based object detection networks achieve a similar or higher mAP and up to 33% higher FPS on datacenter CPUs compared to ConvNext based models.
    Lost in the Shuffle: Testing Power in the Presence of Errorful Network Vertex Labels. (arXiv:2208.08638v1 [stat.ME])
    Many two-sample network hypothesis testing methodologies operate under the implicit assumption that the vertex correspondence across networks is a priori known. In this paper, we consider the degradation of power in two-sample graph hypothesis testing when there are misaligned/label-shuffled vertices across networks. In the context of stochastic block model networks, we theoretically explore the power loss due to shuffling for a pair of hypothesis tests based on Frobenius norm differences between estimated edge probability matrices or between adjacency matrices. The loss in testing power is further reinforced by numerous simulations and experiments, both in the stochastic block model and in the random dot product graph model, where we compare the power loss across multiple recently proposed tests in the literature. Lastly, we demonstrate the impact that shuffling can have in real-data testing in a pair of examples from neuroscience and from social network analysis.
    Learning to Generate Image Source-Agnostic Universal Adversarial Perturbations. (arXiv:2009.13714v4 [cs.LG] UPDATED)
    Adversarial perturbations are critical for certifying the robustness of deep learning models. A universal adversarial perturbation (UAP) can simultaneously attack multiple images, and thus offers a more unified threat model, obviating an image-wise attack algorithm. However, the existing UAP generator is underdeveloped when images are drawn from different image sources (e.g., with different image resolutions). Towards an authentic universality across image sources, we take a novel view of UAP generation as a customized instance of few-shot learning, which leverages bilevel optimization and learning-to-optimize (L2O) techniques for UAP generation with improved attack success rate (ASR). We begin by considering the popular model agnostic meta-learning (MAML) framework to meta-learn a UAP generator. However, we see that the MAML framework does not directly offer the universal attack across image sources, requiring us to integrate it with another meta-learning framework of L2O. The resulting scheme for meta-learning a UAP generator (i) has better performance (50% higher ASR) than baselines such as Projected Gradient Descent, (ii) has better performance (37% faster) than the vanilla L2O and MAML frameworks (when applicable), and (iii) is able to simultaneously handle UAP generation for different victim models and image data sources.
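    For orientation, here is a sketch of the Projected Gradient Descent baseline mentioned above (not the paper's meta-learned generator): a single shared perturbation is updated with gradients aggregated over a batch of images, then projected back onto an L-infinity ball. Model and data are random stand-ins.
```python
# Sketch: PGD-style universal adversarial perturbation over a batch.
import torch

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
images = torch.rand(64, 3, 32, 32)
labels = torch.randint(0, 10, (64,))
eps, alpha = 8 / 255, 1 / 255

delta = torch.zeros(1, 3, 32, 32, requires_grad=True)  # one UAP for all images
for _ in range(40):
    loss = torch.nn.functional.cross_entropy(model(images + delta), labels)
    loss.backward()
    with torch.no_grad():
        delta += alpha * delta.grad.sign()  # ascend the aggregated loss
        delta.clamp_(-eps, eps)             # project onto the L-inf ball
    delta.grad.zero_()

print("UAP L-inf norm:", delta.abs().max().item())
```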
    UN-AVOIDS: Unsupervised and Nonparametric Approach for Visualizing Outliers and Invariant Detection Scoring. (arXiv:2111.10010v2 [cs.LG] UPDATED)
    The visualization and detection of anomalies (outliers) are of crucial importance to many fields, particularly cybersecurity. Several approaches have been proposed in these fields, yet to the best of our knowledge, none of them has fulfilled both objectives, simultaneously or cooperatively, in one coherent framework. The visualization methods of these approaches were introduced for explaining the output of a detection algorithm, not for data exploration that facilitates a standalone visual detection. This is our point of departure: UN-AVOIDS, an unsupervised and nonparametric approach for both visualization (a human process) and detection (an algorithmic process) of outliers, that assigns invariant anomalous scores (normalized to $[0,1]$), rather than a hard binary decision. The main aspect of novelty of UN-AVOIDS is that it transforms data into a new space, which is introduced in this paper as neighborhood cumulative density function (NCDF), in which both visualization and detection are carried out. In this space, outliers are remarkably visually distinguishable, and therefore the anomaly scores assigned by the detection algorithm achieved a high area under the ROC curve (AUC). We assessed UN-AVOIDS on both simulated data and two recently published cybersecurity datasets, and compared it to three of the most successful anomaly detection methods: LOF, IF, and FABOD. In terms of AUC, UN-AVOIDS was almost an overall winner. The article concludes by providing a preview of new theoretical and practical avenues for UN-AVOIDS. Among them is designing a visualization aided anomaly detection (VAAD), a type of software that aids analysts by providing UN-AVOIDS' detection algorithm (running in a back engine), NCDF visualization space (rendered to plots), along with other conventional methods of visualization in the original feature space, all of which are linked in one interactive environment.
    On the Universality of the Double Descent Peak in Ridgeless Regression. (arXiv:2010.01851v7 [stat.ML] UPDATED)
    We prove a non-asymptotic distribution-independent lower bound for the expected mean squared generalization error caused by label noise in ridgeless linear regression. Our lower bound generalizes a similar known result to the overparameterized (interpolating) regime. In contrast to most previous works, our analysis applies to a broad class of input distributions with almost surely full-rank feature matrices, which allows us to cover various types of deterministic or random feature maps. Our lower bound is asymptotically sharp and implies that in the presence of label noise, ridgeless linear regression does not perform well around the interpolation threshold for any of these feature maps. We analyze the imposed assumptions in detail and provide a theory for analytic (random) feature maps. Using this theory, we can show that our assumptions are satisfied for input distributions with a (Lebesgue) density and feature maps given by random deep neural networks with analytic activation functions like sigmoid, tanh, softplus or GELU. As further examples, we show that feature maps from random Fourier features and polynomial kernels also satisfy our assumptions. We complement our theory with further experimental and analytic results.
    Semi-self-supervised Automated ICD Coding. (arXiv:2205.10088v2 [cs.CL] UPDATED)
    Clinical Text Notes (CTNs) contain physicians' reasoning process, written in an unstructured free text format, as they examine and interview patients. In recent years, several studies have been published that provide evidence for the utility of machine learning for predicting doctors' diagnoses from CTNs, a task known as ICD coding. Data annotation is time-consuming, particularly when a degree of specialization is needed, as is the case for medical data. This paper presents a method of augmenting a sparsely annotated dataset of Icelandic CTNs with a machine-learned imputation in a semi-self-supervised manner. We train a neural network on a small set of annotated CTNs and use it to extract clinical features from a set of un-annotated CTNs. These clinical features consist of answers to about a thousand potential questions that a physician might find the answers to during a consultation with a patient. The features are then used to train a classifier for the diagnosis of certain types of diseases. We report the results of an evaluation of this data augmentation method over three tiers of data availability to the physician. Our data augmentation method shows a significant positive effect which is diminished when clinical features from the examination of the patient and diagnostics are made available. We recommend our method for augmenting scarce datasets for systems that take decisions based on clinical features that do not include examinations or tests.
    Selective Classification Via Neural Network Training Dynamics. (arXiv:2205.13532v2 [cs.LG] UPDATED)
    Selective classification is the task of rejecting inputs a model would predict incorrectly on through a trade-off between input space coverage and model accuracy. Current methods for selective classification impose constraints on either the model architecture or the loss function; this inhibits their usage in practice. In contrast to prior work, we show that state-of-the-art selective classification performance can be attained solely from studying the (discretized) training dynamics of a model. We propose a general framework that, for a given test input, monitors metrics capturing the disagreement with the final predicted label over intermediate models obtained during training; we then reject data points exhibiting too much disagreement at late stages in training. In particular, we instantiate a method that tracks when the label predicted during training stops disagreeing with the final predicted label. Our experimental evaluation shows that our method achieves state-of-the-art accuracy/coverage trade-offs on typical selective classification benchmarks.
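    A simplified sketch of that instantiation, assuming you have stored per-example predictions from intermediate checkpoints: score each example by the last checkpoint at which its prediction disagrees with the final label, and reject the examples that settled latest.
```python
# Sketch: selective classification from checkpoint disagreement scores.
import numpy as np

rng = np.random.default_rng(0)
n_ckpts, n_examples = 20, 1000
# preds[c, i]: predicted class for example i at checkpoint c (random stand-in)
preds = rng.integers(0, 10, size=(n_ckpts, n_examples))
final = preds[-1]

disagree = preds != final[None, :]
# last checkpoint index at which the prediction still disagrees (-1 if never)
last_disagreement = np.where(
    disagree.any(axis=0), n_ckpts - 1 - disagree[::-1].argmax(axis=0), -1)

coverage = 0.8  # keep the 80% of points whose predictions settled earliest
threshold = np.quantile(last_disagreement, coverage)
accept = last_disagreement <= threshold
print("accepted fraction:", accept.mean())
```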

  • Open

    This is what DeepAI art generator came up with for "typical Reddit user". These things are getting good!
    submitted by /u/dingdongschlonglong [link] [comments]  ( 87 min )
    Building an App for Stable Diffusion: Text to Image generation in Python
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 87 min )
    My favorite creation from midjourney
    submitted by /u/ExtensionVirtual471 [link] [comments]  ( 87 min )
    Amazing AI Clip!
    submitted by /u/nalr00n [link] [comments]  ( 87 min )
    Eteria AI - API for BLOOM 176B Language Model
    submitted by /u/brthornbury [link] [comments]  ( 87 min )
    Hey, this is pretty cool, man!
    submitted by /u/NebelLicht [link] [comments]  ( 87 min )
    Come join me for the very first global online Cohere + lablab.ai hackathon. $1000 in Cohere credits and really cute swag are waiting for the winners. We start on Friday 12:00 pm EDT. Bring your friends. Come see how much fun large language models are.
    submitted by /u/techn0_cratic [link] [comments]  ( 87 min )
    Researchers at Stanford and Meta AI have Developed a Dataset Pruning Technique for Scaling Artificial Intelligence AI Training
    submitted by /u/ai-lover [link] [comments]  ( 89 min )
    Has anyone done anything with machine learning and the The On-Line Encyclopedia of Integer Sequences (OEIS)?
    I am not trained in machine learning, but I am curious about machine learning and integer sequences. Website: https://oeis.org/ I was thinking in terms of transformers like GPT-3 or other similar algorithms. For those who are unfamiliar here is the Wikipedia page intro: The On-Line Encyclopedia of Integer Sequences (OEIS) is an online database of integer sequences. It was created and maintained by Neil Sloane while researching at AT&T Labs. He transferred the intellectual property and hosting of the OEIS to the OEIS Foundation in 2009.[4] Sloane is chairman of the OEIS Foundation. OEIS records information on integer sequences of interest to both professional and amateur mathematicians, and is widely cited. As of January 2022, it contains over 350,000 sequences, making it the largest database of its kind. Each entry contains the leading terms of the sequence, keywords, mathematical motivations, literature links, and more, including the option to generate a graph or play a musical representation of the sequence. The database is searchable by keyword, by subsequence, or by any of 16 fields. submitted by /u/arisbe__ [link] [comments]  ( 88 min )
    Drift Detection: Automated Monitoring for Production ML Models
    One of the elements of an MLOps pipeline is drift detection, which helps you monitor model performance over time and identify when it's time for retraining. Once you've successfully deployed models and are running live inferences in production, you'll encounter yet another obstacle: monitoring model performance over time. We monitor several model performance indicators, including overall scoring, inference speed, latency, accuracy, and finally data drift and model drift. This tech talk covers the algorithms we've developed to automate detection of data drift and model drift, or input and output drift. https://youtu.be/EHjO4k7SE44 submitted by /u/modzykirsten [link] [comments]  ( 87 min )
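    As one concrete flavor of input-drift detection (not necessarily the algorithm from the talk), a per-feature two-sample Kolmogorov-Smirnov test against a training-time reference is a common minimal baseline:
```python
# Sketch: flag per-feature input drift with a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(0, 1, size=(5000, 4))          # training-time features
live = rng.normal([0, 0, 0.5, 0], 1, size=(500, 4))   # feature 2 has drifted

for j in range(reference.shape[1]):
    stat, p = ks_2samp(reference[:, j], live[:, j])
    flag = "DRIFT" if p < 0.01 else "ok"
    print(f"feature {j}: KS={stat:.3f} p={p:.3g} {flag}")
```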
    As It Was by Harry Styles, but visualized by AI!
    submitted by /u/Wingman143 [link] [comments]  ( 87 min )
    General solo undergraduate paper expectation, do I need to implement part of my paper or can I just describe the architecture and theory?
    I'm working on an undergrad paper, at a non-top-tier school, that needs to be completed within a term. I see DeepMind papers always supported by rough implementations and results of what they've theorised. I understand that is the gold standard. What is the general consensus on undergrad papers: do they expect you to write code and implement some of what you're theorising? In my research I've come across papers like this which seem only theorised, without implementation: http://www.paulmckevitt.com/pubs/artsit16scenemaker.pdf submitted by /u/abittooambitious [link] [comments]  ( 103 min )
    What Is More Beneficial for Data Science Career: Projects or Certifications?
    submitted by /u/saik2363 [link] [comments]  ( 87 min )
    Open source language models for non-English languages?
    Hello, what are some good open-source models for non-Latin alphabets? I think it's possible to retrain GPT-2 layers, but are there other options that are built specifically for non-English languages, like Arabic or Japanese for example? submitted by /u/Mister-Khalifa [link] [comments]  ( 87 min )
    Create Apps without any coding using AI !
    submitted by /u/SamuelSmith1416 [link] [comments]  ( 87 min )
    Chatbots for SEO, benefits and tips
    Hi there, I'm from a chatbot development company and we build custom chatbots. We recently published an article on how chatbots can help in SEO campaigns: reduce bounce rate, boost returning-visitor rate, increase time spent on page, and, if you already have a chatbot, tips on optimizing it. Link here. submitted by /u/Avandegraund [link] [comments]  ( 91 min )
    Wanted: Coder to develop a poker training AI
    Apologies if this post isn't appropriate for the subreddit. I'm looking for a coder to develop a poker AI for me that uses deep learning; specifications will be given upon request. Contact me on discord for more info: cherrymx#5304 submitted by /u/silvercoinz12 [link] [comments]  ( 87 min )
    I make videos with AI
    submitted by /u/Due-Ad9795 [link] [comments]  ( 87 min )
    Three step-by-step tutorials using Cohere AI
    Hello everybody! This weekend starting 19/08/2022 lablab.ai is hosting an AI hackathon in partnership with Cohere. It is the last chance to get involved by enrolling for FREE here. You can create your own team or look for an already existing one. During the event our participants will be using Cohere's NLP engine, one of the world's most powerful, to build applications based on large language models. If you are not familiar with its functionalities we have prepared some tutorials for you! Find out how to create AI-powered Google Chrome extensions for your medium.com posts, how to create your own Q&A chatbot, or your own product description generator API with Cohere. submitted by /u/Viseden [link] [comments]  ( 94 min )
    Created by Kein Künstler
    submitted by /u/widgia [link] [comments]  ( 88 min )
    Am I the only one scared of AI replacing their job?
    Hello everyone, I'm 23, and I'm thinking about going to learn AI/ML even though coding isn't really my dream job. I don't really know if I like it or hate it, since I've only ever worked on it alone and never with other people. The problem is I'm a video editor; I like commercials and marketing, and I want to create a business, BUT I'm really scared of AI. I've read a lot of posts; some say it will not replace us and some say it will. To be honest, I don't see why not in the next 10-20 years. Yes, right now it's still in development and not as good as a human, but these things keep getting better and better over the years. I'm confused and don't really know what to choose as my main career. I'm currently working as a video editor and my goal was to make a business out of it, but reading some posts made me think twice. On the other hand, I tend to believe it will help us editors instead of taking our jobs. What do you guys think? submitted by /u/furytayx [link] [comments]  ( 92 min )
    Midjourney + GPT-3 = Amazing results?
    submitted by /u/kbf_ [link] [comments]  ( 87 min )
  • Open

    More planning steps in Dyna = fewer episodes but more computation/training time?
    Hi, in Sutton's book it says that as the number of planning steps increases in Dyna, the number of episodes needed to reach the optimal policy decreases substantially. However, I was wondering about the pure training time between N=0 and N=~50. With pure direct RL, more episodes may be needed, but in general, does it converge in a similar amount of time to N=50? (assuming that interaction with the environment is not time-costly) Thanks! submitted by /u/JonathanMonathan62 [link] [comments]  ( 87 min )
    Training Atari using RAM
    I'm having some issues training Atari with RAM observations and I'm wondering if it's necessary to use state frame stacking to train the network. As stated in the paper, if you are training from pixels you have to stack the last 4 observations, but I don't know how to do that with RAM. In such a case, what would be the input for the network? A 128x4 array? I'm using PyTorch and any help is appreciated. submitted by /u/aarribas12 [link] [comments]  ( 88 min )
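    One common approach (a sketch, not the only valid design): keep the last 4 RAM states in a deque and concatenate them into a flat 512-dimensional vector for a fully-connected network, rather than shaping them as a 128x4 "image":
```python
# Sketch: frame-stacking 128-byte Atari RAM observations for an MLP.
from collections import deque
import numpy as np
import torch

STACK = 4
frames = deque(maxlen=STACK)

def reset_stack(first_obs):
    frames.clear()
    for _ in range(STACK):          # pad the stack with the first observation
        frames.append(first_obs)
    return np.concatenate(frames)

def step_stack(obs):
    frames.append(obs)              # oldest frame drops out automatically
    return np.concatenate(frames)   # shape (512,)

net = torch.nn.Sequential(
    torch.nn.Linear(128 * STACK, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 6),        # 6 = example action-space size
)

obs = np.random.randint(0, 256, size=128).astype(np.float32) / 255.0
state = reset_stack(obs)
q = net(torch.from_numpy(state))
print(state.shape, q.shape)
```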
    Please confirm my reasoning about Off-Policy SARSA / Monte Carlo
    Hi, I would greatly appreciate it if anyone could confirm whether my understanding of one-step Q-learning, off-policy Monte Carlo, and off-policy n-step SARSA is correct, specifically in the area of importance sampling. (Bolded regions are where I'm uncertain if I'm correct!) Off-policy Monte Carlo: After reaching the end of the episode, the value back-propagates through the trajectory, and if the behavior policy's action (B-action) != the target policy's greedy (deterministic) action (T-action), the rest of the back-propagation of the value is canceled out due to importance sampling. However, Q values and the target policy are updated for a B-action BEFORE T-action = argmax(actions) is evaluated against whether B-action == T-action. Thus, the algorithm is still able to explore, because the tails become bigger and bigger as the T-policy improves over time. One-step Q-learning: There is no importance sampling because the algorithm must explore; Q(S,A) needs to be updated for every state-action pair, so importance sampling shouldn't be applied to the first action. Off-policy, n-step SARSA: Importance sampling is NOT applied to the first time step because the algorithm must explore. In other words, Q(S,A) needs to be updated for every state-action pair, so the first action shouldn't be canceled out if T-action != B-action. However, importance sampling is applied in subsequent steps because potentially incorrect actions shouldn't be used to optimize potentially correct actions. Thanks so much! submitted by /u/JonathanMonathan62 [link] [comments]  ( 88 min )
    Training DDPG Agent
    I'm trying to train a DDPG agent but I keep running into the same problem. The agent explores well at the beginning and starts to accumulate positive reward, but just before hitting the reward threshold it seems not to realize that it's doing well and plunges back down into the negative. I'm not sure what I need to change to have more consistent results. Edit: I wrote this post in a rush and realized later I had barely included any details. I am using a Simulink RL Agent and my environment is rendered through Simulink. The agent has to learn to apply force (through some forward kinematics) to track a random sinusoidal target. The agent is able to track the signal fairly well before it crashes and either stops applying any force or saturates the force input. I am using a buffer size of 1e6, a discount factor of 0.6, and a minibatch size of 125. submitted by /u/aok76 [link] [comments]  ( 89 min )
    How can Heuristic search and RL be related?
    Hi guys, what is the relationship between heuristic search methods and RL? submitted by /u/souhaielbensalem [link] [comments]  ( 102 min )
  • Open

    Run PyTorch Lightning and native PyTorch DDP on Amazon SageMaker Training, featuring Amazon Search
    So much data, so little time. Machine learning (ML) experts, data scientists, engineers and enthusiasts have encountered this problem the world over. From natural language processing to computer vision, tabular to time series, and everything in-between, the age-old problem of optimizing for speed when running data against as many GPUs as you can get has […]  ( 9 min )
    Visualize your Amazon Lookout for Metrics anomaly results with Amazon QuickSight
    One of the challenges encountered by teams using Amazon Lookout for Metrics is quickly and efficiently connecting it to data visualization. The anomalies are presented individually on the Lookout for Metrics console, each with their own graph, making it difficult to view the set as a whole. An automated, integrated solution is needed for deeper […]  ( 14 min )
  • Open

    [News] SCALE Transform X conference
    Hello everyone! SCALE AI is hosting a set of conferences from Oct. 19-21 with the world's top experts on AI & machine learning. The event is both online and in person for those in the Bay Area, and has no cost at all. Thought it might be of interest as there are going to be discussions about AI & ML for all sorts of industries. Here's the link if someone wants to register for the event! submitted by /u/Financial_Astronaut_ [link] [comments]  ( 88 min )
    [N] Driving SMARTS Competition @ NeurIPS 2022
    Driving SMARTS Competition at NeurIPS 2022 This competition seeks to advance autonomous driving by developing agents that can drive as quickly and safely as possible from the start to destination amid background traffic. Data for the competition consists of large-scale naturalistic driving data replayed within SMARTS simulation environment. The following typical driving scenarios are tested: cruising, overtaking, merging, left turns at unsignalized intersections and being cut off by another vehicle. These scenarios are mined from the naturalistic data, manipulated and replayed in SMARTS. For some scenarios, interactive background vehicles are added in SMARTS. Agents will be ranked according to metrics on safety and comfort (smoothness and safe driving), task completion (% of completed scenarios), traffic rule violation, and completion time. Competition tracks There are two tracks in the competition. Track 1: The participants may use any method and training data to develop their solutions. Track 2: The participants are only allowed to train their methods on the offline datasets. Competition timeline • Aug. 1, 2022: competition opens. • Nov. 1, 2022: competition closes at 11:59pm Pacific Time. • Nov. 5, 2022: finalists will be announced and will be asked to submit their code and models for evaluation for track 2. • Nov. 15, 2022: winning teams announced. Prizes Top participants in each track will receive the following prizes: • Gold US$6000 • Silver US$4000 • Bronze US$2000 Additional prizes: • US$1000 for the most innovative approach out of top-6 entries in both tracks • US$1000 given to one of the valid submissions not in top-3 positions in either track For more information regarding the rules and how to participate please refer to the competition website: https://smarts-project.github.io/ submitted by /u/driving_science [link] [comments]  ( 90 min )
    [R] LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale - Facebook AI 2022 - Inference in LLMs with up to 175B parameters without performance degradation and making it possible to use these models on a single server with consumer GPUs!
    Paper: https://arxiv.org/abs/2208.07339 Github: https://github.com/timdettmers/bitsandbytes Software Blogpost: https://huggingface.co/blog/hf-bitsandbytes-integration Emergent Features Blogpost: https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/ Abstract: Large language models have been widely adopted but require significant GPU memory for inference. We develop a procedure for Int8 matrix multiplication for feed-forward and attention projection layers in transformers, which cut the memory needed for inference by half while retaining full precision performance. With our method, a 175B parameter 16/32-bit checkpoint can be loaded, converted to Int8, and used immediately without performance degradation. This is made possible by understanding and working around propertie…  ( 109 min )
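    For context, a sketch of plain row-wise absmax int8 quantization, the starting point the paper improves on; LLM.int8() additionally uses vector-wise scaling and keeps rare outlier dimensions in fp16.
```python
# Sketch: absmax int8 quantization of a weight matrix (not LLM.int8() itself).
import torch

W = torch.randn(1024, 1024)

scale = W.abs().amax(dim=1, keepdim=True) / 127.0   # one scale per row
W_int8 = torch.clamp((W / scale).round(), -127, 127).to(torch.int8)
W_deq = W_int8.to(torch.float32) * scale            # dequantize for use

err = (W - W_deq).abs().max().item()
print(f"bytes: {W.numel() * 4} -> {W_int8.numel()}  max abs error: {err:.4f}")
```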
    [D] Drift Detection: Automated Monitoring for Production ML
    One of the elements of an MLOps pipeline is drift detection, which helps you monitor model performance over time and identify when it's time for retraining. Once you've successfully deployed models and are running live inferences in production, you'll encounter yet another obstacle: monitoring model performance over time. We monitor several model performance indicators, including overall scoring, inference speed, latency, accuracy, and finally data drift and model drift. This tech talk covers the algorithms we've developed to automate detection of data drift and model drift, or input and output drift. https://youtu.be/EHjO4k7SE44 submitted by /u/modzykirsten [link] [comments]  ( 89 min )
    [D] How to use Ai to mix images together ?
    Bonjour everybody, I’m a professional artist from France, and I’m really interested by Art assisted by Ai to boost my workflow or give me more inspiration, and creating a final oil painting inspired by these Ai references. But problem : I’m pretty newb to Ai world and there is so many different to use. Usually, with my regular painting process, I just use few references from photographies I took myself, I pin them on a wall, and I create oil painting sketches and color studies after mixing all the element in my head. After that, I will do my final sketch and start painting on a bigger canvas size. Sometimes, I will do the exact same thing, but will do all my sketches on photoshop first and then reproduce the final painting on canvas. But with Ai, everything is changing, and that’s wh…  ( 91 min )
    [D] How do you decide which ML algorithm will work on which data?
    So, OK, I want to know: can you get an intuition from the data about which ML algorithm will work best on it, or do you just run them all and compare each one's results? Or is there some rule of thumb? submitted by /u/kanda_bhaji_pav [link] [comments]  ( 91 min )
    [D] any good resources to learn/advance in ML and AI?
    YouTube, websites, etc. submitted by /u/sonderind [link] [comments]  ( 88 min )
    NLP Project [P] for Master's Thesis
    I am doing an online master's program [P] and need recommendations on the topic for a master's thesis. Background: I am a data scientist with 10 years of experience but have done only 1-2 projects in NLP/vision. I've been trying to learn transformers and other deep learning techniques recently. Thought it would be a good idea to take up a topic that covers the breadth of NLP. For example: comparing the performance of fine-tuning a model on various NLP tasks. What do you think? Suggest me something that could boost my overall profile. submitted by /u/aalwiz099 [link] [comments]  ( 89 min )
    [D] Temporal generalisation in Transformer networks
    Hi, I've been trying to understand temporal generalisation/extrapolation in transformers, and I have found conflicting information regarding the capacity of transformers to do this in various papers. I gather LSTMs and transformers can do linear extrapolation but nothing beyond that. Is there a way to do parametric extrapolation in transformers? Any references are welcome. submitted by /u/Cool_Abbreviations_9 [link] [comments]  ( 89 min )
    [D] No Shortcuts To Knowledge: Why AI Needs To Ease Up On Scaling And Learn How To Code
    https://deoxyribose.github.io/No-Shortcuts-to-Knowledge/ submitted by /u/EducationalCicada [link] [comments]  ( 93 min )
    [D] Approaches to new code: create a map of the code structure, does it make sense?
    Most of the time, when I try to implement (and understand) a new code on GitHub I feel overwhelmed by the complexity of it. In my opinion, one of the hardest parts is trying to understand its structure and how it works. When you are facing this issue, what is your modus operandi? Does it make sense to produce a map of the code structure? Does it make sense to spend a bit of time to create it while digging into it? If you are already doing it, is there any tool that produces an automatic map of the code? An example can be found here: https://github.com/dvlab-research/ECCV22-P3AFormer-Tracking-Objects-as-Pixel-wise-Distributions/raw/main/figs/model_mind_flow.png submitted by /u/madeInSwamp [link] [comments]  ( 89 min )
    [D] Conferences in the Winter Break
    Hi, currently there are few conference deadlines over the fall and winter period. Which conferences do you recommend submitting computer vision papers to? I have read about SCIA 2023 and ICMVA 2023; does anybody have experience with these conferences, or know of other CV conferences with a deadline this year? submitted by /u/SeucheAchat9115 [link] [comments]  ( 89 min )
    [P] Win $150K Prize from PSG Challenge
    Hi Community, we are currently hosting a challenge called “Panoptic Scene Graph Generation (PSG) Challenge”, which asks the participants to solve the PSG task: given a complex scene image as the input, the model should interpret the image with several “subject-verb-object” triplets, which should comprehensively cover the relations in the image. The subject/object should be grounded by a pixel-accurate segmentation mask at the same time. The task is based on our ECCV’22 work: Panoptic Scene Graph Generation. The PSG Challenge is jointly sponsored by International Algorithm Case Competition (hosted by Pazhou Lab) and ECCV’22 SenseHuman Workshop (hosted by MMLab@NTU), with an astonishing $150,000 Prize Pool for the winners (fully sponsored by Pazhou Lab). Champion: 1 Team - 600,000 RMB (~US$90,000) Second Prize: 2 Teams - 100,000 RMB each (~US$15,000 each) Third Prize: 5 Teams - 40,000 RMB each (~US$6,000 each) The preliminary round of the challenge will be ended on Oct. 6th. The top 15 teams will be able to join the final round, which will end around Nov. 25th, (after the CVPR deadline). Official Info of PSG Challenge: The main page of the PSG challenge is https://www.cvmart.net/race/10349/base. We provide an English version in CodaLab to be accessed here. However, the purpose of CodaLab is only to provide information for non-Chinese participants. To download the dataset or evaluate the model for ranking and prize-winning, the participants should only use the links from the official competition website. To Download PSG Dataset: https://www.cvmart.net/race/10349/dataset For Submission and Ranking: https://www.cvmart.net/race/10349/my-submission If you have any questions regarding the registration or the competition itself, please join our slack space and ask us! ​ Hugging Face Demo for the PSG model. The PSG Challenge asks participants to develop a better PSG model. submitted by /u/Sad-Barber-60 [link] [comments]  ( 90 min )
    [P] Compiling ML models' hyperparameter and accuracy metadata. Any previous review papers or projects that can help me along?
    Just wanted to know if there was any previously collected data that could speed things along. Couldn't find any such thing but let me know if there are any past papers that do this. submitted by /u/Typical-Ad-7443 [link] [comments]  ( 104 min )
    [D] How many NN optimization techniques are actually incorporated?
    Hey guys, I’m an undergraduate student just getting into this field so forgive me if this question is dumb. I noticed that a lot of papers at NIPS/ICML/ICLR are concerned with NN optimization; for example, this paper https://arxiv.org/pdf/2010.14501.pdf from ICLR 2021 claims to reduce memory requirement by 3x for various PyTorch models, which ostensibly would be massive. However, it was also “only” cited 8 times, which indicates that it isn’t being used that often. I know that a lot of these NN optimization things are overrated in the sense that they tend to only work in specific circumstances, and aren’t actually broadly applicable. I’d also imagine that this fact means that most DL researchers don’t have time to comb through swaths of NN optimization papers accepted into top conferences to find out which ones are actually broadly useful and reliable. So with that being said: how many NN optimization techniques are actually applied on a large scale? Are the NN libraries typically used like Tensorflow and PyTorch being continuously updated with the latest research? submitted by /u/sjames898 [link] [comments]  ( 93 min )
  • Open

    OptFormer: Towards Universal Hyperparameter Optimization with Transformers
    Posted by Yutian Chen, Staff Research Scientist, DeepMind, and Xingyou (Richard) Song, Research Scientist, Google Research, Brain Team One of the most important aspects in machine learning is hyperparameter optimization, as finding the right hyperparameters for a machine learning task can make or break a model’s performance. Internally, we regularly use Google Vizier as the default platform for hyperparameter optimization. Throughout its deployment over the last 5 years, Google Vizier has been used more than 10 million times, over a vast class of applications, including machine learning applications from vision, reinforcement learning, and language but also scientific applications such as protein discovery and hardware acceleration. As Google Vizier is able to keep track of use patterns i…  ( 22 min )
  • Open

    Startup Digs Into Public Filings With GPU-Driven Machine Learning to Serve Up Alternative Financial Data Services
    When Rachel Carpenter and Joseph French founded Intrinio a decade ago, the fintech revolution had only just begun. But they saw an opportunity to apply machine learning to vast amounts of financial filings to create an alternative data provider among the giants. The startup, based in St. Petersburg, Fla., delivers financial data to hedge funds, Read article > The post Startup Digs Into Public Filings With GPU-Driven Machine Learning to Serve Up Alternative Financial Data Services appeared first on NVIDIA Blog.  ( 6 min )
    Boldly Go: Discover New Frontiers in AI-Powered Transportation at GTC
    AI and the metaverse are revolutionizing every aspect of the way we live, work and play — including how we move. Leaders in the automotive and technology industries will come together at NVIDIA GTC to discuss the newest breakthroughs driving intelligent vehicles, whether in the real world or in simulation. The virtual conference, which runs Read article > The post Boldly Go: Discover New Frontiers in AI-Powered Transportation at GTC appeared first on NVIDIA Blog.  ( 5 min )
    Startup’s Vision AI Software Trains Itself — in One Hour — to Detect Manufacturing Defects in Real Time
    Cameras have been deployed in factories for over a decade — so why, Franz Tschimben wondered, hasn’t automated visual inspection yet become the worldwide standard? This question motivated Tschimben and his colleagues to found Covision Quality, an AI-based visual-inspection software startup that uses NVIDIA technology to transform end-of-line defect detection for the manufacturing industry. “The Read article > The post Startup’s Vision AI Software Trains Itself — in One Hour — to Detect Manufacturing Defects in Real Time appeared first on NVIDIA Blog.  ( 6 min )
    Easy A: GeForce NOW Brings Higher Resolution and Frame Rates for Browser Streaming on PC
    Class is in session this GFN Thursday as GeForce NOW makes the up-grade with support for higher resolutions and frame rates in Chrome browser on PC. It’s the easiest way to spice up a boring study session. When the lecture is over, dive into the six games joining the GeForce NOW library this week, where Read article > The post Easy A: GeForce NOW Brings Higher Resolution and Frame Rates for Browser Streaming on PC appeared first on NVIDIA Blog.  ( 5 min )
  • Open

    How AI and ML are Transforming the FinTech Industry
    By 2022, the AI in FinTech market will be worth $7.25 billion. Artificial Intelligence (AI) is driving a new wave in FinTech. From banks to…  ( 12 min )
  • Open

    Help with CNN LSTM Image Captioning
    This is a project I have been working on, and I am a complete beginner in these topics (though familiar with image processing). I'm struggling to get it started; I have multiple problems, mostly with array size issues. Can somebody please help me figure it out? submitted by /u/Maleficient_Entity [link] [comments]  ( 96 min )
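    Since array-size mismatches are the usual sticking point, here is a minimal shape walkthrough for a CNN-encoder + LSTM-decoder captioner; all sizes and the tiny stand-in encoder are illustrative, not a recommended architecture.
```python
# Sketch: shape bookkeeping for a CNN -> LSTM image-captioning model.
import torch
import torch.nn as nn

B, vocab, emb, hid, max_len = 8, 1000, 256, 512, 12

cnn = nn.Sequential(                       # stand-in image encoder
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, emb))
embed = nn.Embedding(vocab, emb)
lstm = nn.LSTM(emb, hid, batch_first=True)
head = nn.Linear(hid, vocab)

images = torch.rand(B, 3, 224, 224)
captions = torch.randint(0, vocab, (B, max_len))   # token ids

img_feat = cnn(images).unsqueeze(1)        # (B, 1, emb): image as first "token"
tok_emb = embed(captions[:, :-1])          # (B, max_len-1, emb): shifted tokens
inputs = torch.cat([img_feat, tok_emb], 1) # (B, max_len, emb)

out, _ = lstm(inputs)                      # (B, max_len, hid)
logits = head(out)                         # (B, max_len, vocab)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab), captions.reshape(-1))
print(logits.shape, loss.item())
```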
  • Open

    Learning to Operate an Electric Vehicle Charging Station Considering Vehicle-grid Integration. (arXiv:2111.01294v2 [cs.LG] UPDATED)
    The rapid adoption of electric vehicles (EVs) calls for the widespread installation of EV charging stations. To maximize the profitability of charging stations, intelligent controllers that provide both charging and electric grid services are in great need. However, it is challenging to determine the optimal charging schedule due to the uncertain arrival time and charging demands of EVs. In this paper, we propose a novel centralized allocation and decentralized execution (CADE) reinforcement learning (RL) framework to maximize the charging station's profit. In the centralized allocation process, EVs are allocated to either the waiting or charging spots. In the decentralized execution process, each charger makes its own charging/discharging decision while learning the action-value functions from a shared replay memory. This CADE framework significantly improves the scalability and sample efficiency of the RL algorithm. Numerical results show that the proposed CADE framework is both computationally efficient and scalable, and significantly outperforms the baseline model predictive control (MPC). We also provide an in-depth analysis of the learned action-value function to explain the inner working of the reinforcement learning agent.  ( 3 min )
    Model-Aware Contrastive Learning: Towards Escaping Uniformity-Tolerance Dilemma in Training. (arXiv:2207.07874v2 [cs.LG] UPDATED)
    Contrastive learning (CL) has achieved remarkable success in learning transferable representations. It has been identified that the temperature $ \tau $ of CL loss plays an essential role in automatically concentrating on hard negative samples. However, recent work also indicates a uniformity-tolerance dilemma (UTD) connected to $ \tau $, which will lead to unexpected performance degradation. We argue that it is the fixity of temperature that is inextricably linked to UTD and suboptimal embedding space. To tackle the challenge of UTD, we enrich the CL loss family by presenting a Model-Aware Contrastive Learning (MACL) strategy. In MACL, the temperature parameter is adaptive to the magnitude of alignment that reflects the basic confidence of the instance discrimination task. Lower alignment implies poor discrimination for the undertrained phase, then there is less possibility that the high similarity region contains latent positive samples (LPs). Thus, a small $ \tau $ can impose larger penalties on hard negative samples to learn uniformly informative embeddings. Instead, a larger $ \tau $ in the well-trained phase facilitates the exploration of semantic structures due to its increased tolerance for LPs. Besides, theoretically, we uncover why contrastive learning requires a large number of negative samples from a unified gradient reduction perspective. Based on MACL and these analyses, a new CL loss is proposed. Experimental results validate the effectiveness of our approach to escape UTD, which can achieve state-of-the-art performance and training with fewer negative samples.  ( 3 min )
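    An illustrative reading of the adaptive-temperature idea (my sketch, not the authors' exact schedule): derive tau from the batch's positive-pair alignment, so an undertrained model gets a small, punishing temperature and a well-trained one a larger, more tolerant one.
```python
# Sketch: InfoNCE with a model-aware (alignment-dependent) temperature.
import torch
import torch.nn.functional as F

def macl_style_loss(z1, z2, tau_min=0.1, tau_max=0.6):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    # alignment: mean cosine similarity of positive pairs (detached)
    alignment = (z1 * z2).sum(dim=1).mean().clamp(0, 1).detach()
    tau = tau_min + (tau_max - tau_min) * alignment   # model-aware temperature
    logits = z1 @ z2.t() / tau                        # (B, B) similarity matrix
    labels = torch.arange(z1.size(0))                 # positives on the diagonal
    return F.cross_entropy(logits, labels), tau

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
loss, tau = macl_style_loss(z1, z2)
print(float(loss), float(tau))
```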
    Classifications of Skull Fractures using CT Scan Images via CNN with Lazy Learning Approach. (arXiv:2203.10786v1 [eess.IV] CROSS LISTED)
    Classification of skull fracture is a challenging task for both radiologists and researchers. Skull fractures result in broken pieces of bone, which can cut into the brain and cause bleeding and other injury types. So it is vital to detect and classify the fracture very early. In the real world, fractures often occur at multiple sites. This makes it harder to detect the fracture type, since a single skull fracture might comprise multiple fracture types. Unfortunately, manual detection of skull fracture and the classification process is time-consuming, threatening a patient's life. Because of the emergence of deep learning, this process could be automated. Convolutional Neural Networks (CNNs) are the most widely used deep learning models for image categorization because they deliver high accuracy and outstanding outcomes compared to other models. We propose a new model called SkullNetV1 comprising a novel CNN by taking advantage of CNN for feature extraction and lazy learning approach which acts as a classifier for classification of skull fractures from brain CT images to classify five fracture types. Our suggested model achieved a subset accuracy of 88%, an F1 score of 93%, the Area Under the Curve (AUC) of 0.89 to 0.98, a Hamming score of 92% and a Hamming loss of 0.04 for this seven-class multi-labeled classification.
    Deep Representations for Time-varying Brain Datasets. (arXiv:2205.11648v3 [cs.LG] UPDATED)
    Finding an appropriate representation of dynamic activities in the brain is crucial for many downstream applications. Due to its highly dynamic nature, temporally averaged fMRI (functional magnetic resonance imaging) can only provide a narrow view of underlying brain activities. Previous works lack the ability to learn and interpret the latent dynamics in brain architectures. This paper builds an efficient graph neural network model that incorporates both region-mapped fMRI sequences and structural connectivities obtained from DWI (diffusion-weighted imaging) as inputs. We find good representations of the latent brain dynamics through learning sample-level adaptive adjacency matrices and performing a novel multi-resolution inner cluster smoothing. We also attribute inputs with integrated gradients, which enables us to infer (1) highly involved brain connections and subnetworks for each task, (2) temporal keyframes of imaging sequences that characterize tasks, and (3) subnetworks that discriminate between individual subjects. This ability to identify critical subnetworks that characterize signal states across heterogeneous tasks and individuals is of great importance to neuroscience and other scientific domains. Extensive experiments and ablation studies demonstrate our proposed method's superiority and efficiency in spatial-temporal graph signal modeling with insightful interpretations of brain dynamics.  ( 3 min )
    Supervised PCA: A Multiobjective Approach. (arXiv:2011.05309v4 [stat.ML] UPDATED)
    Methods for supervised principal component analysis (SPCA) aim to incorporate label information into principal component analysis (PCA), so that the extracted features are more useful for a prediction task of interest. Prior work on SPCA has focused primarily on optimizing prediction error, and has neglected the value of maximizing variance explained by the extracted features. We propose a new method for SPCA that addresses both of these objectives jointly, and demonstrate empirically that our approach dominates existing approaches, i.e., outperforms them with respect to both prediction error and variation explained. Our approach accommodates arbitrary supervised learning losses and, through a statistical reformulation, provides a novel low-rank extension of generalized linear models.  ( 2 min )
    Wave simulation in non-smooth media by PINN with quadratic neural network and PML condition. (arXiv:2208.08276v1 [physics.geo-ph])
    Frequency-domain simulation of seismic waves plays an important role in seismic inversion, but it remains challenging in large models. The recently proposed physics-informed neural network (PINN), as an effective deep learning method, has achieved successful applications in solving a wide range of partial differential equations (PDEs), and there is still room for improvement on this front. For example, PINN can lead to inaccurate solutions when PDE coefficients are non-smooth and describe structurally-complex media. In this paper, we solve the acoustic and visco-acoustic scattered-field wave equation in the frequency domain with PINN instead of the wave equation to remove source singularity. We first illustrate that non-smooth velocity models lead to inaccurate wavefields when no boundary conditions are implemented in the loss function. Then, we add the perfectly matched layer (PML) conditions in the loss function of PINN and design a quadratic neural network to overcome the detrimental effects of non-smooth models in PINN. We show that PML and quadratic neurons improve the results as well as attenuation and discuss the reason for this improvement. We also illustrate that a network trained during a wavefield simulation can be used to pre-train the neural network of another wavefield simulation after PDE-coefficient alteration and improve the convergence speed accordingly. This pre-training strategy should find application in iterative full waveform inversion (FWI) and time-lag target-oriented imaging when the model perturbation between two consecutive iterations or two consecutive experiments can be small.
    Learning Generative Factors of EEG Data with Variational auto-encoders. (arXiv:2206.01939v3 [cs.LG] UPDATED)
    Electroencephalography produces high-dimensional, stochastic data from which it might be challenging to extract high-level knowledge about the phenomena of interest. We address this challenge by applying the framework of variational auto-encoders to 1) classify multiple pathologies and 2) recover the neurological mechanisms of those pathologies in a data-driven manner. Our framework learns generative factors of data related to pathologies. We provide an algorithm to decode those factors further and discover how different pathologies affect observed data. We illustrate the applicability of the proposed approach to identifying schizophrenia, with or without accompanying auditory verbal hallucinations. We further demonstrate the ability of the framework to learn disease-related mechanisms consistent with current domain knowledge. We also compare the proposed framework with several benchmark approaches and demonstrate its advantages in classification performance and interpretability.  ( 2 min )
    Discovering Agents. (arXiv:2208.08345v1 [cs.AI])
    Causal models of agents have been used to analyse the safety aspects of machine learning systems. But identifying agents is non-trivial -- often the causal model is just assumed by the modeler without much justification -- and modelling failures can lead to mistakes in the safety analysis. This paper proposes the first formal causal definition of agents -- roughly that agents are systems that would adapt their policy if their actions influenced the world in a different way. From this we derive the first causal discovery algorithm for discovering agents from empirical data, and give algorithms for translating between causal models and game-theoretic influence diagrams. We demonstrate our approach by resolving some previous confusions caused by incorrect causal modelling of agents.
    HypoSVI: Hypocenter inversion with Stein variational inference and Physics Informed Neural Networks. (arXiv:2101.03271v3 [physics.geo-ph] UPDATED)
    We introduce a scheme for probabilistic hypocenter inversion with Stein variational inference. Our approach uses a differentiable forward model in the form of a physics-informed neural network, which we train to solve the Eikonal equation. This allows for rapid approximation of the posterior by iteratively optimizing a collection of particles against a kernelized Stein discrepancy. We show that the method is well-equipped to handle highly multimodal posterior distributions, which are common in hypocentral inverse problems. A suite of experiments is performed to examine the influence of the various hyperparameters. Once trained, the method is valid for any seismic network geometry within the study area without the need to build travel time tables. We show that the computational demands scale efficiently with the number of differential times, making it ideal for large-N sensing technologies like Distributed Acoustic Sensing. The techniques outlined in this manuscript have considerable implications beyond just ray-tracing procedures, with the workflow applicable to other fields with computationally expensive inversion procedures such as full waveform inversion.
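    For reference, a bare-bones Stein variational particle update looks like the sketch below. This is the generic SVGD step with an RBF kernel, not HypoSVI itself; in the paper, the score would come from the differentiable Eikonal network rather than a hand-supplied gradient.
```python
import numpy as np

# Generic SVGD update: `X` holds n particles of dimension d, `grad_logp`
# the score of the (unnormalized) posterior evaluated at each particle.
def svgd_step(X, grad_logp, h=1.0, eps=1e-2):
    diff = X[:, None, :] - X[None, :, :]               # pairwise differences
    K = np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))    # RBF kernel matrix
    gradK = -diff / h ** 2 * K[:, :, None]             # grad of k(x_j, .) w.r.t. x_j
    # phi(x_i) = mean_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    phi = (K @ grad_logp + gradK.sum(axis=0)) / X.shape[0]
    return X + eps * phi
```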
    Predicting Corporate Risk by Jointly Modeling Company Networks and Dialogues in Earnings Conference Calls. (arXiv:2206.06174v3 [cs.CL] UPDATED)
    Earnings conference calls are significant information events for volatility forecasting, which is essential for financial risk management and asset pricing. Although some recent volatility forecasting models have utilized the textual content of conference calls, the dialogue structures of conference calls and company relationships are almost ignored in extant literature. To bridge this gap, we propose a new model called Temporal Virtual Graph Neural Network (TVGNN) for volatility forecasting by jointly modeling conference call dialogues and company networks. Our model differs from existing models in several important ways. First, we propose to exploit more dialogue structures by encoding position, utterance, speaker role, and Q&A segments. Second, we propose to encode the market states for volatility forecasting by extending the Gated Recurrent Units (GRU). Third, we propose a new method for constructing temporal company networks in which the messages can only flow from temporally preceding to successive nodes, and extend the Graph Attention Networks (GAT) for modeling company relationships. We collect conference call transcripts of S&P 500 companies from 2008 to 2019, and construct a dataset of conference call dialogues with additional information on dialogue structures and company networks. Empirical results on our dataset demonstrate the superiority of our model over competitive baselines for volatility forecasting. We also conduct supplementary analyses to examine the effectiveness of our model's key components and interpretability.
    From Shapley back to Pearson: Hypothesis Testing via the Shapley Value. (arXiv:2207.07038v2 [cs.LG] UPDATED)
    Machine learning models, in particular artificial neural networks, are increasingly used to inform decision making in high-stakes scenarios across a variety of fields -- from financial services, to public safety, and healthcare. While neural networks have achieved remarkable performance in many settings, their complex nature raises concerns on their reliability, trustworthiness, and fairness in real-world scenarios. As a result, several a-posteriori explanation methods have been proposed to highlight the features that influence a model's prediction. Notably, the Shapley value -- a game theoretic quantity that satisfies several desirable properties -- has gained popularity in the machine learning explainability literature. More traditionally, however, feature importance in statistical learning has been formalized by conditional independence, and a standard way to test for it is via Conditional Randomization Tests (CRTs). So far, these two perspectives on interpretability and feature importance have been considered distinct and separate. In this work, we show that Shapley-based explanation methods and conditional independence testing for feature importance are closely related. More precisely, we prove that for binary classification problems, evaluating a Shapley coefficient amounts to performing a specific set of conditional independence tests, as implemented by a procedure similar to the CRT but for a different null hypothesis. Furthermore, the obtained game-theoretic values upper bound the $p$-values of such tests. As a result, we endow large Shapley coefficients with a precise statistical sense of importance, with controlled type I error.
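    As background, a single Shapley coefficient is classically estimated by averaging marginal contributions over random permutations. The sketch below is the textbook Monte Carlo estimator, not the paper's CRT-based procedure; `value_fn` is an assumed user-supplied map from a coalition of feature indices to a model value (e.g. expected accuracy).
```python
import numpy as np

# Textbook permutation-sampling Shapley estimate for one feature.
def shapley_mc(value_fn, n_features, feature, n_samples=1000, seed=0):
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        perm = rng.permutation(n_features)
        pos = int(np.where(perm == feature)[0][0])
        before = set(perm[:pos].tolist())  # coalition preceding the feature
        total += value_fn(before | {feature}) - value_fn(before)
    return total / n_samples
```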
    Feature Structure Distillation with Centered Kernel Alignment in BERT Transferring. (arXiv:2204.08922v2 [cs.CL] UPDATED)
    Knowledge distillation is an approach to transfer information on representations from a teacher to a student by reducing their difference. A challenge of this approach is that reducing the flexibility of the student's representations can induce inaccurate learning of the teacher's knowledge. To resolve this in BERT transferring, we investigate distillation of structures of representations specified into three types: intra-feature, local inter-feature, and global inter-feature structures. To transfer them, we introduce \textit{feature structure distillation} methods based on the Centered Kernel Alignment, which assigns a consistent value to similar feature structures and reveals more informative relations. In particular, a memory-augmented transfer method with clustering is implemented for the global structures. In the experiments on the nine tasks for language understanding of the GLUE dataset, the proposed methods effectively transfer the three types of structures and improve performance compared to state-of-the-art distillation methods. The code for the methods is available at https://github.com/maroo-sky/FSD  ( 2 min )
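    Linear CKA itself has a compact closed form; the snippet below is the standard formulation from the CKA literature, shown for orientation, and is independent of the paper's specific distillation losses.
```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between feature matrices
    X (n, d1) and Y (n, d2); rows index examples."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))
```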
    Learning Neural Set Functions Under the Optimal Subset Oracle. (arXiv:2203.01693v2 [cs.LG] UPDATED)
    Learning neural set functions is becoming increasingly important in many applications like product recommendation and compound selection in AI-aided drug discovery. The majority of existing works study methodologies of set function learning under the function value oracle, which, however, requires expensive supervision signals. This renders it impractical for applications with only weak supervision under the Optimal Subset (OS) oracle, the study of which is surprisingly overlooked. In this work, we present a principled yet practical maximum likelihood learning framework, termed EquiVSet, that simultaneously meets the following desiderata of learning set functions under the OS oracle: i) permutation invariance of the set mass function being modeled; ii) permission of varying ground sets; iii) minimum prior; and iv) scalability. The main components of our framework involve: an energy-based treatment of the set mass function, DeepSet-style architectures to handle permutation invariance, mean-field variational inference, and its amortized variants. Thanks to the elegant combination of these advanced architectures, empirical studies on three real-world applications (including Amazon product recommendation, set anomaly detection, and compound selection for virtual screening) demonstrate that EquiVSet outperforms the baselines by a large margin.  ( 2 min )
    Analysis of Digitalized ECG Signals Based on Artificial Intelligence and Spectral Analysis Methods Specialized in ARVC. (arXiv:2203.00504v2 [eess.SP] UPDATED)
    Arrhythmogenic right ventricular cardiomyopathy (ARVC) is an inherited heart muscle disease that appears between the second and fourth decade of a patient's life, being responsible for 20% of sudden cardiac deaths before the age of 35. The effective and punctual diagnosis of this disease based on Electrocardiograms (ECGs) could have a vital role in reducing premature cardiovascular mortality. In our analysis, we first outline the digitalization process of paper-based ECG signals, enhanced by a spatial filter aiming to eliminate dark regions in the dataset's images that do not correspond to the ECG waveform and produce undesirable noise. Next, we propose the utilization of a low-complexity convolutional neural network for the detection of an arrhythmogenic heart disease that has not been studied through deep learning methodology to date, achieving high classification accuracy, namely 99.98% training and 98.6% testing accuracy, on a disease whose major identification criteria are minute millivolt variations in the ECG's morphology, in contrast with other arrhythmogenic abnormalities. Finally, by performing spectral analysis, we investigate significant differences in the frequency domain between normal ECGs and ECGs corresponding to patients suffering from ARVC. In 16 out of the 18 frequencies where we encounter statistically significant differences, the normal ECGs are characterized by greater normalized amplitudes compared to the abnormal ones. The overall research carried out in this article highlights the importance of integrating mathematical methods into the examination and effective diagnosis of various diseases, aiming at a substantial contribution to their successful treatment.  ( 3 min )
    E2FL: Equal and Equitable Federated Learning. (arXiv:2205.10454v2 [cs.LG] UPDATED)
    Federated Learning (FL) enables data owners to train a shared global model without sharing their private data. Unfortunately, FL is susceptible to an intrinsic fairness issue: due to heterogeneity in clients' data distributions, the final trained model can give disproportionate advantages across the participating clients. In this work, we present Equal and Equitable Federated Learning (E2FL) to produce fair federated learning models by preserving two main fairness properties, equity and equality, concurrently. We validate the efficiency and fairness of E2FL in different real-world FL applications, and show that E2FL outperforms existing baselines in terms of the resulting efficiency, fairness of different groups, and fairness among all individual clients.  ( 2 min )
    FRL: Federated Rank Learning. (arXiv:2110.04350v3 [cs.LG] UPDATED)
    Federated learning (FL) allows mutually untrusted clients to collaboratively train a common machine learning model without sharing their private/proprietary training data among each other. FL is unfortunately susceptible to poisoning by malicious clients who aim to hamper the accuracy of the commonly trained model through sending malicious model updates during FL's training process. We argue that the key factor to the success of poisoning attacks against existing FL systems is the large space of model updates available to the clients, allowing malicious clients to search for the most poisonous model updates, e.g., by solving an optimization problem. To address this, we propose Federated Rank Learning (FRL). FRL reduces the space of client updates from model parameter updates (a continuous space of float numbers) in standard FL to the space of parameter rankings (a discrete space of integer values). To be able to train the global model using parameter ranks (instead of parameter weights), FRL leverages ideas from recent supermask training mechanisms. Specifically, FRL clients rank the parameters of a randomly initialized neural network (provided by the server) based on their local training data. The FRL server uses a voting mechanism to aggregate the parameter rankings submitted by clients in each training epoch to generate the global ranking of the next training epoch. Intuitively, our voting-based aggregation mechanism prevents poisoning clients from making significant adversarial modifications to the global model, as each client will have a single vote! We demonstrate the robustness of FRL to poisoning through analytical proofs and experimentation. We also show FRL's high communication efficiency. Our experiments demonstrate the superiority of FRL in real-world FL settings.  ( 3 min )
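    The voting step can be pictured with a simple sum-of-ranks aggregation, as in the minimal sketch below. This is only the aggregation idea in the spirit of FRL; the actual server logic also handles the supermask selection and per-epoch scheduling.
```python
import numpy as np

def aggregate_rankings(client_ranks):
    """Majority-vote style aggregation over parameter rankings.
    client_ranks: (n_clients, n_params) integer array, where
    client_ranks[c, i] is the rank client c assigns to parameter i."""
    scores = client_ranks.sum(axis=0)        # total votes per parameter
    return np.argsort(np.argsort(scores))    # convert scores back into a global ranking
```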
    Are Transformers Effective for Time Series Forecasting?. (arXiv:2205.13504v3 [cs.AI] UPDATED)
    Recently, there has been a surge of Transformer-based solutions for the long-term time series forecasting (LTSF) task. Despite the growing performance over the past few years, we question the validity of this line of research in this work. Specifically, Transformers are arguably the most successful solution for extracting the semantic correlations among the elements in a long sequence. However, in time series modeling, we aim to extract the temporal relations in an ordered set of continuous points. While employing positional encoding and using tokens to embed sub-series in Transformers facilitates preserving some ordering information, the nature of the \emph{permutation-invariant} self-attention mechanism inevitably results in temporal information loss. To validate our claim, we introduce a set of embarrassingly simple one-layer linear models named LTSF-Linear for comparison. Experimental results on nine real-life datasets show that LTSF-Linear surprisingly outperforms existing sophisticated Transformer-based LTSF models in all cases, and often by a large margin. Moreover, we conduct comprehensive empirical studies to explore the impacts of various design elements of LTSF models on their temporal relation extraction capability. We hope this surprising finding opens up new research directions for the LTSF task. We also advocate revisiting the validity of Transformer-based solutions for other time series analysis tasks (e.g., anomaly detection) in the future. Code is available at: \url{https://github.com/cure-lab/LTSF-Linear}.
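    The baseline family really is this simple. Below is a sketch in the spirit of the plainest Linear variant, mapping the lookback window directly to the forecast horizon per channel; the repository linked above contains the exact models.
```python
import torch

class LTSFLinear(torch.nn.Module):
    """One-layer linear forecaster: maps the last `lookback` observed
    steps directly to the next `horizon` steps, per channel."""
    def __init__(self, lookback, horizon):
        super().__init__()
        self.proj = torch.nn.Linear(lookback, horizon)

    def forward(self, x):             # x: (batch, lookback, channels)
        x = x.transpose(1, 2)         # (batch, channels, lookback)
        y = self.proj(x)              # (batch, channels, horizon)
        return y.transpose(1, 2)      # (batch, horizon, channels)
```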
    Automated Learning for Deformable Medical Image Registration by Jointly Optimizing Network Architectures and Objective Functions. (arXiv:2203.06810v2 [cs.CV] UPDATED)
    Deformable image registration plays a critical role in various tasks of medical image analysis. A successful registration algorithm, whether derived from conventional energy optimization or from deep networks, requires tremendous effort from computer experts to carefully design the registration energy or to tune network architectures for the specific type of medical data. To tackle the aforementioned problems, this paper proposes an automated learning registration algorithm (AutoReg) that cooperatively optimizes both architectures and their corresponding training objectives, enabling non-computer experts, e.g., medical/clinical users, to conveniently find off-the-shelf registration algorithms for diverse scenarios. Specifically, we establish a triple-level framework to deduce registration network architectures and objectives with an auto-searching mechanism and cooperating optimization. We conduct image registration experiments on multi-site volume datasets and various registration tasks. Extensive results demonstrate that our AutoReg can automatically learn an optimal deep registration network for given volumes and achieve state-of-the-art performance, while also significantly improving computational efficiency over the mainstream UNet architectures (from 0.558 to 0.270 seconds for a 3D image pair on the same configuration).  ( 2 min )
    Disentangled Modeling of Domain and Relevance for Adaptable Dense Retrieval. (arXiv:2208.05753v1 [cs.IR] CROSS LISTED)
    Recent advances in Dense Retrieval (DR) techniques have significantly improved the effectiveness of first-stage retrieval. Trained with large-scale supervised data, DR models can encode queries and documents into a low-dimensional dense space and conduct effective semantic matching. However, previous studies have shown that the effectiveness of DR models would drop by a large margin when the trained DR models are adopted in a target domain that is different from the domain of the labeled data. One of the possible reasons is that the DR model has never seen the target corpus and thus might be incapable of mitigating the difference between the training and target domains. In practice, unfortunately, training a DR model for each target domain to avoid domain shift is often a difficult task as it requires additional time, storage, and domain-specific data labeling, which are not always available. To address this problem, in this paper, we propose a novel DR framework named Disentangled Dense Retrieval (DDR) to support effective and flexible domain adaptation for DR models. DDR consists of a Relevance Estimation Module (REM) for modeling domain-invariant matching patterns and several Domain Adaptation Modules (DAMs) for modeling domain-specific features of multiple target corpora. By making the REM and DAMs disentangled, DDR enables a flexible training paradigm in which REM is trained with supervision once and DAMs are trained with unsupervised data. Comprehensive experiments in different domains and languages show that DDR significantly improves ranking performance compared to strong DR baselines and substantially outperforms traditional retrieval methods in most scenarios.  ( 3 min )
    Score-Based Generative Models Detect Manifolds. (arXiv:2206.01018v2 [stat.ML] UPDATED)
    Score-based generative models (SGMs) need to approximate the scores $\nabla \log p_t$ of the intermediate distributions as well as the final distribution $p_T$ of the forward process. The theoretical underpinnings of the effects of these approximations are still lacking. We find precise conditions under which SGMs are able to produce samples from an underlying (low-dimensional) data manifold $\mathcal{M}$. This assures us that SGMs are able to generate the "right kind of samples". For example, taking $\mathcal{M}$ to be the subset of images of faces, we find conditions under which the SGM robustly produces an image of a face, even though the relative frequencies of these images might not accurately represent the true data generating distribution. Moreover, this analysis is a first step towards understanding the generalization properties of SGMs: Taking $\mathcal{M}$ to be the set of all training samples, our results provide a precise description of when the SGM memorizes its training data.  ( 2 min )
    Label Flipping Data Poisoning Attack Against Wearable Human Activity Recognition System. (arXiv:2208.08433v1 [cs.CR])
    Human Activity Recognition (HAR) is the problem of interpreting sensor data to infer human movement using an efficient machine learning (ML) approach. HAR systems rely on data from untrusted users, making them susceptible to data poisoning attacks. In a poisoning attack, attackers manipulate the sensor readings to contaminate the training set, misleading the HAR system into producing erroneous outcomes. This paper presents the design of a label-flipping data poisoning attack for a HAR system, where the label of a sensor reading is maliciously changed in the data collection phase. Due to high noise and uncertainty in the sensing environment, such an attack poses a severe threat to the recognition system. Besides, vulnerability to label-flipping attacks is dangerous when activity recognition models are deployed in safety-critical applications. This paper sheds light on how to carry out the attack in practice through smartphone-based sensor data collection applications. To our knowledge, this is among the earliest research works to explore attacking HAR models via label-flipping poisoning. We implement the proposed attack and test it on activity recognition models based on the following machine learning algorithms: multi-layer perceptron, decision tree, random forest, and XGBoost. Finally, we evaluate the effectiveness of a K-nearest neighbors (KNN)-based defense mechanism against the proposed attack.  ( 3 min )
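    A generic label-flipping poisoner can be sketched in a few lines. This is illustrative only: the paper's attack targets smartphone HAR sensor data specifically, and the `flip_fraction` knob is an assumed parametrization.
```python
import numpy as np

def flip_labels(y, flip_fraction, n_classes, seed=0):
    """Randomly flip a fraction of integer labels to a different class
    (generic label-flipping sketch, not the paper's HAR-specific attack)."""
    rng = np.random.default_rng(seed)
    y = y.copy()
    idx = rng.choice(len(y), size=int(flip_fraction * len(y)), replace=False)
    offsets = rng.integers(1, n_classes, size=len(idx))
    y[idx] = (y[idx] + offsets) % n_classes  # guaranteed to land on a different class
    return y
```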
    DSFormer: A Dual-domain Self-supervised Transformer for Accelerated Multi-contrast MRI Reconstruction. (arXiv:2201.10776v2 [eess.IV] UPDATED)
    Multi-contrast MRI (MC-MRI) captures multiple complementary imaging modalities to aid in radiological decision-making. Given the need for lowering the time cost of multiple acquisitions, current deep accelerated MRI reconstruction networks focus on exploiting the redundancy between multiple contrasts. However, existing works are largely supervised with paired data and/or prohibitively expensive fully-sampled MRI sequences. Further, reconstruction networks typically rely on convolutional architectures which are limited in their capacity to model long-range interactions and may lead to suboptimal recovery of fine anatomical detail. To these ends, we present a dual-domain self-supervised transformer (DSFormer) for accelerated MC-MRI reconstruction. DSFormer develops a deep conditional cascade transformer (DCCT) consisting of several cascaded Swin transformer reconstruction networks (SwinRN) trained under two deep conditioning strategies to enable MC-MRI information sharing. We further present a dual-domain (image and k-space) self-supervised learning strategy for DCCT to alleviate the costs of acquiring fully sampled training data. DSFormer generates high-fidelity reconstructions which experimentally outperform current fully-supervised baselines. Moreover, we find that DSFormer achieves nearly the same performance when trained either with full supervision or with our proposed dual-domain self-supervision.
    Conformal Inference for Online Prediction with Arbitrary Distribution Shifts. (arXiv:2208.08401v1 [stat.ME])
    Conformal inference is a flexible methodology for transforming the predictions made by any black-box model (e.g. neural nets, random forests) into valid prediction sets. The only necessary assumption is that the training and test data be exchangeable (e.g. i.i.d.). Unfortunately, this assumption is usually unrealistic in online environments in which the process generating the data may vary in time and consecutive data-points are often temporally correlated. In this article, we develop an online algorithm for producing prediction intervals that are robust to these deviations. Our methods build upon conformal inference and thus can be combined with any black-box predictor. We show that the coverage error of our algorithm is controlled by the size of the underlying change in the environment, and thus directly connect the size of the distribution shift with the difficulty of the prediction problem. Finally, we apply our procedure in two real-world settings and find that our method produces robust prediction intervals under real-world dynamics.
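    For flavor, the sketch below shows the simple online miscoverage update that this line of adaptive conformal work builds on: after each step, the working level is nudged down (wider intervals) following a miss and up following a cover. It assumes a scalar conformity score per time step and is not the paper's full algorithm or guarantee.
```python
import numpy as np

def adaptive_thresholds(scores, alpha=0.1, gamma=0.01):
    """Online quantile thresholds in the spirit of adaptive conformal
    updates; scores[t] is the conformity score of the point at time t."""
    alpha_t = alpha
    thresholds = []
    for t in range(1, len(scores)):
        q_t = np.quantile(scores[:t], 1 - np.clip(alpha_t, 0.001, 0.999))
        thresholds.append(q_t)
        err = float(scores[t] > q_t)        # 1 if the new point falls outside
        alpha_t += gamma * (alpha - err)    # widen after misses, tighten after covers
    return thresholds
```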
    Motion Inbetweening via Deep $\Delta$-Interpolator. (arXiv:2201.06701v4 [cs.LG] UPDATED)
    We show that the task of synthesizing human motion conditioned on a set of key frames can be solved more accurately and effectively if a deep learning based interpolator operates in the delta mode using the spherical linear interpolator as a baseline. We empirically demonstrate the strength of our approach on publicly available datasets achieving state-of-the-art performance. We further generalize these results by showing that the $\Delta$-regime is viable with respect to the reference of the last known frame (also known as the zero-velocity model). This supports the more general conclusion that operating in the reference frame local to input frames is more accurate and robust than in the global (world) reference frame advocated in previous work. Our code is publicly available at https://github.com/boreshkinai/delta-interpolator.  ( 2 min )
    Physics-Guided Discovery of Highly Nonlinear Parametric Partial Differential Equations. (arXiv:2106.01078v2 [cs.LG] UPDATED)
    Partial differential equations (PDEs) fitted to scientific data can represent physical laws with explainable mechanisms for various mathematically-oriented subjects. The data-driven discovery of PDEs from scientific data is thriving as a new attempt to model complex phenomena in nature, but the effectiveness of current practice is typically limited by the scarcity of data and the complexity of the phenomena. In particular, the discovery of PDEs with highly nonlinear coefficients from low-quality data remains largely under-addressed. To deal with this challenge, we propose a novel physics-guided learning method, which can not only encode observation knowledge such as initial and boundary conditions but also incorporate basic physical principles and laws to guide the model optimization. We empirically demonstrate that the proposed method is more robust against data noise and sparsity, and can reduce the estimation error by a large margin; moreover, for the first time we are able to discover PDEs with highly nonlinear coefficients. With this promising performance, the proposed method pushes forward the boundary of the PDEs that can be found by machine learning models for scientific discovery.  ( 2 min )
    Frequency-Severity Experience Rating based on Latent Markovian Risk Profiles. (arXiv:2109.01413v2 [stat.AP] UPDATED)
    Bonus-Malus Systems traditionally consider a customer's number of claims irrespective of their sizes, even though these components are dependent in practice. We propose a novel joint experience rating approach based on latent Markovian risk profiles to allow for a positive or negative individual frequency-severity dependence. The latent profiles evolve over time in a Hidden Markov Model to capture updates in a customer's claims experience, making claim counts and sizes conditionally independent. We show that the resulting risk premia lead to a dynamic, claims experience-weighted mixture of standard credibility premia. The proposed approach is applied to a Dutch automobile insurance portfolio and identifies customer risk profiles with distinctive claiming behavior. These profiles, in turn, enable us to better distinguish between customer risks.  ( 2 min )
    A Hybrid SFANC-FxNLMS Algorithm for Active Noise Control based on Deep Learning. (arXiv:2208.08082v1 [eess.SY])
    The selective fixed-filter active noise control (SFANC) method, which selects the best pre-trained control filters for various types of noise, can achieve a fast response time. However, it may lead to large steady-state errors due to inaccurate filter selection and the lack of adaptability. In comparison, the filtered-X normalized least-mean-square (FxNLMS) algorithm can obtain lower steady-state errors through adaptive optimization. Nonetheless, its slow convergence has a detrimental effect on dynamic noise attenuation. Therefore, this paper proposes a hybrid SFANC-FxNLMS approach to overcome the adaptive algorithm's slow convergence and provide a better noise reduction level than the SFANC method. A lightweight one-dimensional convolutional neural network (1D CNN) is designed to automatically select the most suitable pre-trained control filter for each frame of the primary noise. Meanwhile, the FxNLMS algorithm continues to update the coefficients of the chosen pre-trained control filter at the sampling rate. Owing to the effective combination of the two algorithms, experimental results show that the hybrid SFANC-FxNLMS algorithm can achieve a rapid response time, a low noise reduction error, and a high degree of robustness.
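    The adaptive half of the hybrid is the standard FxNLMS recursion; a minimal per-sample sketch is below. Sign conventions depend on how the error signal is defined, and the secondary-path filtering of the reference is assumed to happen upstream.
```python
import numpy as np

def fxnlms_step(w, x_filt_buf, e, mu=0.1, delta=1e-6):
    """One FxNLMS coefficient update.
    w          : current control filter coefficients (length L)
    x_filt_buf : last L reference samples filtered by the secondary-path
                 estimate (most recent first)
    e          : current error-microphone sample
    """
    norm = x_filt_buf @ x_filt_buf + delta   # power normalization term
    return w + mu * e * x_filt_buf / norm
```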
    Generative Thermal Design Through Boundary Representation and Multi-Agent Cooperative Environment. (arXiv:2208.07952v1 [cs.LG])
    Generative design has been growing across the design community as a viable method for design space exploration. Thermal design is more complex than mechanical or aerodynamic design because of the additional convection-diffusion equation and its pertinent boundary interactions. We present a generative thermal design method using cooperative multi-agent deep reinforcement learning and a continuous geometric representation of the fluid and solid domains. The proposed framework consists of a pre-trained neural network surrogate model as an environment to predict the heat transfer and pressure drop of the generated geometries. The design space is parameterized by composite Bezier curves to solve multi-fin shape optimization. We show that our multi-agent framework can learn the policy for design strategy using a multi-objective reward, without the need for shape derivatives or a differentiable objective function.
    CoSimGNN: Towards Large-scale Graph Similarity Computation. (arXiv:2005.07115v7 [cs.LG] UPDATED)
    The ability to compute similarity scores between graphs based on metrics such as Graph Edit Distance (GED) is important in many real-world applications. Computing exact GED values is typically an NP-hard problem and traditional algorithms usually achieve an unsatisfactory trade-off between accuracy and efficiency. Recently, Graph Neural Networks (GNNs) provide a data-driven solution for this task, which is more efficient while maintaining prediction accuracy in small-graph (around 10 nodes per graph) similarity computation. Existing GNN-based methods, which either embed the two graphs separately (lacking low-level cross-graph interactions) or deploy cross-graph interactions over whole graph pairs (redundant and time-consuming), are still unable to achieve competitive results when the number of nodes in graphs increases. In this paper, we focus on similarity computation for large-scale graphs and propose the "embedding-coarsening-matching" framework CoSimGNN, which first embeds and coarsens large graphs with an adaptive pooling operation and then deploys fine-grained interactions on the coarsened graphs for final similarity scores. Furthermore, we create several synthetic datasets which provide new benchmarks for graph similarity computation. Detailed experiments on both synthetic and real-world datasets have been conducted, and CoSimGNN achieves the best performance while its inference time is at most 1/3 of that of the previous state of the art.  ( 3 min )
    Commander's Intent: A Dataset and Modeling Approach for Human-AI Task Specification in Strategic Play. (arXiv:2208.08374v1 [cs.AI])
    Effective Human-AI teaming requires the ability to communicate the goals of the team and the constraints under which the agent needs to operate. Providing the ability to specify the shared intent or operation criteria of the team can enable an AI agent to perform its primary function while still being able to cater to the specific desires of the current team. While significant work has been conducted to instruct an agent to perform a task, via language or demonstrations, prior work lacks a focus on building agents which can operate within the parameters specified by a team. Worse yet, there is a dearth of research pertaining to enabling humans to provide their specifications through unstructured, naturalistic language. In this paper, we propose the use of goals and constraints as a scaffold to modulate and evaluate autonomous agents. We contribute to this field by presenting a novel dataset, and an associated data collection protocol, which maps language descriptions to goals and constraints corresponding to specific strategies developed by human participants for the board game Risk. Leveraging state-of-the-art language models and augmentation procedures, we develop a machine learning framework which can be used to identify goals and constraints from unstructured strategy descriptions. To empirically validate our approach, we conduct a human-subjects study to establish a human baseline for our dataset. Our results show that our machine learning architecture is better able to interpret unstructured language descriptions into strategy specifications than human raters tasked with performing the same machine translation task (F(1,272.53) = 17.025, p < 0.001).  ( 3 min )
    Superior generalization of smaller models in the presence of significant label noise. (arXiv:2208.08003v1 [cs.LG])
    The benefits of over-parameterization in achieving superior generalization performance have been shown in several recent studies, justifying the trend of using larger models in practice. In the context of robust learning, however, the effect of neural network size has not been well studied. In this work, we find that in the presence of a substantial fraction of mislabeled examples, increasing the network size beyond some point can be harmful. In particular, the originally monotonic or 'double descent' test loss curve (w.r.t. network width) turns into a U-shaped or a double U-shaped curve when label noise increases, suggesting that the best generalization is achieved by some model with intermediate size. We observe that when network size is controlled by density through random pruning, similar test loss behaviour is observed. We also take a closer look into both phenomena through bias-variance decomposition and theoretically characterize how label noise shapes the variance term. Similar behavior of the test loss can be observed even when state-of-the-art robust methods are applied, indicating that limiting the network size could further boost existing methods. Finally, we empirically examine the effect of network size on the smoothness of learned functions, and find that the originally negative correlation between size and smoothness is flipped by label noise.
    An Adjoint-Free Algorithm for CNOP via Sampling. (arXiv:2208.00956v2 [math.OC] UPDATED)
    In this paper, we propose a sampling algorithm based on statistical machine learning to obtain the conditional nonlinear optimal perturbation (CNOP), which is essentially different from traditional deterministic optimization methods. The new approach not only replaces the extremely expensive gradient (first-order) information with objective value (zeroth-order) information, but also avoids the use of the adjoint technique, which gives rise to huge storage problems and instability from linearization. Meanwhile, we give an intuitive analysis and a rigorous concentration inequality for the approximate gradient obtained by sampling. Numerical experiments obtaining the CNOPs, evaluated via the performance of standard spatial structures for a theoretical model, the Burgers equation with small viscosity, demonstrate that, at the cost of some accuracy, sampling with fewer samples takes relatively less time than the adjoint-based method and the direct definition-based method. Finally, we reveal that the nonlinear time evolutions of the CNOPs obtained by all the algorithms are almost consistent, in terms of the squared norm of the perturbations, their difference, and their relative difference with respect to the definition-based method.
    Which Factors Drive Open Access Publishing? A Springer Nature Case Study. (arXiv:2208.08221v1 [cs.DL])
    Open Access (OA) facilitates access to articles, but authors or funders often must pay the publishing costs, which prevents authors who do not receive financial support from participating in OA publishing and from the citation advantage of OA articles. OA may therefore exacerbate existing inequalities in the publication system rather than overcome them. To investigate this, we studied 522,664 articles published by Springer Nature. Employing statistical methods, we describe the relationship between authors affiliated with countries from different income levels, their choice of publishing (OA or closed access), and the citation impact of their papers. A machine learning classification method helped us to explore the association between OA publishing and attributes of the author, especially eligibility for APC waivers or discounts, journal, country, and paper. The results indicate that authors eligible for APC waivers publish more in gold OA journals than other authors. In contrast, authors eligible for an APC discount have the lowest ratio of OA publications, leading to the assumption that this discount insufficiently motivates authors to publish in a gold OA journal. The rank of a journal is a significant driver for publishing in a gold OA journal, whereas the OA option is mostly avoided in hybrid journals. Seniority, experience with OA publications, and the scientific field are the most decisive factors in OA publishing.
    Transformer Vs. MLP-Mixer Exponential Expressive Gap For NLP Problems. (arXiv:2208.08191v1 [cs.CL])
    Vision Transformers are widely used in various vision tasks. Meanwhile, another line of work, starting with the MLP-Mixer, tries to achieve similar performance using MLP-based architectures. Interestingly, none of these has so far been reported for NLP tasks, and none of those MLP-based architectures has claimed to achieve state-of-the-art results in vision tasks. In this paper, we analyze the expressive power of MLP-based architectures in modeling dependencies between multiple different inputs simultaneously, and show an exponential gap between the attention and the MLP-based mechanisms. Our results suggest a theoretical explanation for the inability of MLPs to compete with attention-based mechanisms in NLP problems. They also suggest that the performance gap in vision tasks may be due to the relative weakness of MLPs in modeling dependencies between multiple different locations, and that combining smart input permutations with MLP architectures may not suffice alone to close the performance gap.
    GraPE: fast and scalable Graph Processing and Embedding. (arXiv:2110.06196v2 [cs.LG] UPDATED)
    Graph Representation Learning methods opened new possibilities for addressing complex, real-world problems represented by graphs. However, many graphs used in these applications comprise millions of nodes and billions of edges and are beyond the capabilities of current methods and software implementations. We present GRAPE, a software resource for graph processing and representation learning that is able to scale with big graphs by using specialized and smart data structures, algorithms, and a fast parallel implementation. When compared with state of the art software resources, GRAPE shows an improvement of orders of magnitude in empirical space and time complexity, as well as a substantial and statistically significant improvement in edge prediction and node label prediction performance. Furthermore, GRAPE provides over 80,000 graphs from the literature and other sources, standardized interfaces allowing a straightforward integration of third-party libraries, 61 node embedding methods, 25 inference models, and 3 modular pipelines to allow a FAIR and reproducible comparison of methods and libraries for graph processing and embedding.  ( 3 min )
    Adversarial Inverse Reinforcement Learning for Mean Field Games. (arXiv:2104.14654v3 [cs.LG] UPDATED)
    Mean field games (MFGs) provide a mathematically tractable framework for modelling large-scale multi-agent systems by leveraging mean field theory to simplify interactions among agents. It enables applying inverse reinforcement learning (IRL) to predict behaviours of large populations by recovering reward signals from demonstrated behaviours. However, existing IRL methods for MFGs are powerless to reason about uncertainties in demonstrated behaviours of individual agents. This paper proposes a novel framework, Mean-Field Adversarial IRL (MF-AIRL), which is capable of tackling uncertainties in demonstrations. We build MF-AIRL upon maximum entropy IRL and a new equilibrium concept. We evaluate our approach on simulated tasks with imperfect demonstrations. Experimental results demonstrate the superiority of MF-AIRL over existing methods in reward recovery.  ( 2 min )
    A Framework for Machine Learning of Model Error in Dynamical Systems. (arXiv:2107.06658v3 [math.DS] UPDATED)
    The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. We present a unifying framework for blending mechanistic and machine-learning approaches to identify dynamical systems from noisily and partially observed data. We compare pure data-driven learning with hybrid models which incorporate imperfect domain knowledge. Our formulation is agnostic to the chosen machine learning model, is presented in both continuous- and discrete-time settings, and is compatible both with model errors that exhibit substantial memory and errors that are memoryless. First, we study memoryless linear (w.r.t. parametric-dependence) model error from a learning theory perspective, defining excess risk and generalization error. For ergodic continuous-time systems, we prove that both excess risk and generalization error are bounded above by terms that diminish with the square-root of T, the time-interval over which training data is specified. Secondly, we study scenarios that benefit from modeling with memory, proving universal approximation theorems for two classes of continuous-time recurrent neural networks (RNNs): both can learn memory-dependent model error. In addition, we connect one class of RNNs to reservoir computing, thereby relating learning of memory-dependent error to recent work on supervised learning between Banach spaces using random features. Numerical results are presented (Lorenz '63, Lorenz '96 Multiscale systems) to compare purely data-driven and hybrid approaches, finding hybrid methods less data-hungry and more parametrically efficient. Finally, we demonstrate numerically how data assimilation can be leveraged to learn hidden dynamics from noisy, partially-observed data, and illustrate challenges in representing memory by this approach, and in the training of such models.  ( 3 min )
    A Low-Cost Neural ODE with Depthwise Separable Convolution for Edge Domain Adaptation on FPGAs. (arXiv:2107.12824v3 [cs.LG] UPDATED)
    High-performance deep neural network (DNN)-based systems are in high demand in edge environments. Due to their high computational complexity, it is challenging to deploy DNNs on edge devices with strict limitations on computational resources. In this paper, we derive a compact yet highly accurate DNN model, termed dsODENet, by combining recently-proposed parameter reduction techniques: Neural ODE (Ordinary Differential Equation) and DSC (Depthwise Separable Convolution). Neural ODE exploits a similarity between ResNet and ODE, and shares most of the weight parameters among multiple layers, which greatly reduces the memory consumption. We apply dsODENet to domain adaptation as a practical use case with image classification datasets. We also propose a resource-efficient FPGA-based design for dsODENet, where all the parameters and feature maps except for pre- and post-processing layers can be mapped onto on-chip memories. It is implemented on a Xilinx ZCU104 board and evaluated in terms of domain adaptation accuracy, inference speed, FPGA resource utilization, and speedup rate compared to a software counterpart. The results demonstrate that dsODENet achieves comparable or slightly better domain adaptation accuracy compared to our baseline Neural ODE implementation, while the total parameter size without pre- and post-processing layers is reduced by 54.2% to 79.8%. Our FPGA implementation accelerates the inference speed by 23.8 times.
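    The two parameter-reduction ingredients compose naturally, as the PyTorch sketch below illustrates. This is a generic combination, not the dsODENet architecture itself, and the Euler-step count and step size are assumptions.
```python
import torch

class DSConv(torch.nn.Module):
    """Depthwise separable convolution: a per-channel spatial conv followed
    by a 1x1 pointwise conv, cutting parameters vs. a standard k x k conv."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.depthwise = torch.nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)
        self.pointwise = torch.nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class ODEBlock(torch.nn.Module):
    """Reuses one DSConv across several Euler steps, mimicking how Neural
    ODE shares weights over the integration variable (simplified sketch)."""
    def __init__(self, ch, steps=4, dt=0.25):
        super().__init__()
        self.f = DSConv(ch, ch)
        self.steps, self.dt = steps, dt

    def forward(self, x):
        for _ in range(self.steps):
            x = x + self.dt * torch.relu(self.f(x))  # explicit Euler step
        return x
```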
    Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond. (arXiv:1912.12945v5 [stat.ML] UPDATED)
    We consider estimating a low-dimensional parameter in an estimating equation involving high-dimensional nuisances that depend on the parameter. A central example is the efficient estimating equation for the (local) quantile treatment effect ((L)QTE) in causal inference, which involves as a nuisance the covariate-conditional cumulative distribution function evaluated at the quantile to be estimated. Debiased machine learning (DML) is a data-splitting approach to estimating high-dimensional nuisances using flexible machine learning methods, but applying it to problems with parameter-dependent nuisances is impractical. For (L)QTE, DML requires we learn the whole covariate-conditional cumulative distribution function. We instead propose localized debiased machine learning (LDML), which avoids this burdensome step and needs only estimate nuisances at a single initial rough guess for the parameter. For (L)QTE, LDML involves learning just two regression functions, a standard task for machine learning methods. We prove that under lax rate conditions our estimator has the same favorable asymptotic behavior as the infeasible estimator that uses the unknown true nuisances. Thus, LDML notably enables practically-feasible and theoretically-grounded efficient estimation of important quantities in causal inference such as (L)QTEs when we must control for many covariates and/or flexible relationships, as we demonstrate in empirical studies.
    Learning low-rank latent mesoscale structures in networks. (arXiv:2102.06984v3 [cs.SI] UPDATED)
    It is common to use networks to encode the architecture of interactions between entities in complex systems in applications in the physical, biological, social, and information sciences. To study the large-scale behavior of complex systems, it is useful to study mesoscale structures in networks as building blocks that influence such behavior. We present a new approach for describing low-rank mesoscale structure in networks, and we illustrate our approach using several synthetic network models and empirical friendship, collaboration, and protein-protein interaction (PPI) networks. We find that these networks possess a relatively small number of 'latent motifs' that together can successfully approximate most subgraphs of a network at a fixed mesoscale. We use an algorithm that we call 'network dictionary learning' (NDL), which combines a network-sampling method and nonnegative matrix factorization, to learn the latent motifs of a given network. The ability to encode a network using a set of latent motifs has a wide variety of applications to network-analysis tasks, such as comparison, denoising, and edge inference. Additionally, using our new network denoising and reconstruction (NDR) algorithm, we demonstrate how to denoise a corrupted network by using only the latent motifs that one learns directly from the corrupted networks.  ( 3 min )
    Adversarial Image Color Transformations in Explicit Color Filter Space. (arXiv:2011.06690v2 [cs.CV] UPDATED)
    Deep Neural Networks have been shown to be vulnerable to adversarial images. Conventional attacks strive for indistinguishable adversarial images with strictly restricted perturbations. Recently, researchers have moved to explore distinguishable yet non-suspicious adversarial images and demonstrated that color transformation attacks are effective. In this work, we propose Adversarial Color Filter (AdvCF), a novel color transformation attack that is optimized with gradient information in the parameter space of a simple color filter. In particular, our color filter space is explicitly specified so that we are able to provide a systematic analysis of model robustness against adversarial color transformations, from both the attack and defense perspectives. In contrast, existing color transformation attacks do not offer the opportunity for systematic analysis due to the lack of such an explicit space. We further conduct extensive comparisons between different color transformation attacks on both the success rate and image acceptability, through a user study. Additional results provide interesting new insights into model robustness against AdvCF in another three visual tasks. We also highlight the human-interpretability of AdvCF, which is promising in practical use scenarios, and show its superiority over the state-of-the-art human-interpretable color transformation attack on both the image acceptability and efficiency.
    FCN-Transformer Feature Fusion for Polyp Segmentation. (arXiv:2208.08352v1 [eess.IV])
    Colonoscopy is widely recognised as the gold-standard procedure for the early detection of colorectal cancer (CRC). Segmentation is valuable for two significant clinical applications, namely lesion detection and classification, providing means to improve accuracy and robustness. The manual segmentation of polyps in colonoscopy images is time-consuming. As a result, the use of deep learning (DL) for automation of polyp segmentation has become important. However, DL-based solutions can be vulnerable to overfitting and the resulting inability to generalise to images captured by different colonoscopes. Recent transformer-based architectures for semantic segmentation both achieve higher performance and generalise better than alternatives; however, they typically predict a segmentation map of $\frac{h}{4}\times\frac{w}{4}$ spatial dimensions for a $h\times w$ input image. To address this, we propose a new architecture for full-size segmentation which leverages the strengths of a transformer in extracting the most important features for segmentation in a primary branch, while compensating for its limitations in full-size prediction with a secondary fully convolutional branch. The resulting features from both branches are then fused for final prediction of a $h\times w$ segmentation map. We demonstrate our method's state-of-the-art performance with respect to the mDice, mIoU, mPrecision, and mRecall metrics, on both the Kvasir-SEG and CVC-ClinicDB dataset benchmarks. Additionally, we train the model on each of these datasets and evaluate on the other to demonstrate its superior generalisation performance.
    Quadratic Multiform Separation: A New Classification Model in Machine Learning. (arXiv:2110.04925v2 [stat.ML] UPDATED)
    In this paper we present a new classification model in machine learning. Our result is threefold: 1) The model produces comparable predictive accuracy to that of most common classification models. 2) It runs significantly faster than most common classification models. 3) It has the ability to identify a portion of unseen samples for which class labels can be found with much higher predictive accuracy. Currently there are several patents pending on the proposed model.  ( 2 min )
    RegMix: Data Mixing Augmentation for Regression. (arXiv:2106.03374v4 [cs.LG] UPDATED)
    Data augmentation is becoming essential for improving regression performance in critical applications including manufacturing, climate prediction, and finance. Existing techniques for data augmentation largely focus on classification tasks and do not readily apply to regression tasks. In particular, the recent Mixup techniques for classification have succeeded in improving the model performance, which is reasonable due to the characteristics of the classification task, but has limitations in regression. We show that mixing examples that have large data distances using linear interpolations may have increasingly-negative effects on model performance. Our key idea is thus to limit the distances between examples that are mixed. We propose RegMix, a data augmentation framework for regression that learns for each example how many nearest neighbors it should be mixed with for the best model performance using a validation set. Our experiments conducted both on synthetic and real datasets show that RegMix outperforms state-of-the-art data augmentation baselines applicable to regression.  ( 2 min )
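    The core constraint can be mocked up with a fixed neighbourhood size, as below. Note that RegMix learns how many neighbours to mix per example using a validation set; fixing k here is purely illustrative.
```python
import numpy as np

def nn_restricted_mixup(X, y, k=5, alpha=2.0, seed=0):
    """Mix each example only with one of its k nearest neighbours, limiting
    the data distance between mixed pairs (illustrative fixed-k variant)."""
    rng = np.random.default_rng(seed)
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    np.fill_diagonal(d, np.inf)                          # never mix a point with itself
    nn_idx = np.argsort(d, axis=1)[:, :k]                # k nearest neighbours
    j = nn_idx[np.arange(len(X)), rng.integers(0, k, len(X))]
    lam = rng.beta(alpha, alpha, size=len(X))[:, None]
    X_mix = lam * X + (1 - lam) * X[j]
    y_mix = lam[:, 0] * y + (1 - lam[:, 0]) * y[j]
    return X_mix, y_mix
```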
    Novel Deep Learning Approach to Derive Cytokeratin Expression and Epithelium Segmentation from DAPI. (arXiv:2208.08284v1 [eess.IV])
    Generative Adversarial Networks (GANs) are state of the art for image synthesis. Here, we present dapi2ck, a novel GAN-based approach to synthesize cytokeratin (CK) staining from immunofluorescent (IF) DAPI staining of nuclei in non-small cell lung cancer (NSCLC) images. We use the synthetic CK to segment epithelial regions, which, compared to expert annotations, yield equally good results as segmentation on stained CK. Considering the limited number of markers in a multiplexed IF (mIF) panel, our approach allows to replace CK by another marker addressing the complexity of the tumor micro-environment (TME) to facilitate patient selection for immunotherapies. In contrast to stained CK, dapi2ck does not suffer from issues like unspecific CK staining or loss of tumoral CK expression.  ( 2 min )
    On the Privacy Effect of Data Enhancement via the Lens of Memorization. (arXiv:2208.08270v1 [cs.LG])
    Machine learning poses severe privacy concerns, as learned models have been shown to reveal sensitive information about their training data. Many works have investigated the effect of widely-adopted data augmentation (DA) and adversarial training (AT) techniques, termed data enhancement in this paper, on the privacy leakage of machine learning models. Such privacy effects are often measured by membership inference attacks (MIAs), which aim to identify whether a particular example belongs to the training set or not. We propose to investigate privacy from a new perspective called memorization. Through the lens of memorization, we find that previously deployed MIAs produce misleading results, as they are less likely to identify samples with higher privacy risks as members compared to samples with low privacy risks. To solve this problem, we deploy a recent attack that can capture the memorization degrees of individual samples for evaluation. Through extensive experiments, we unveil non-trivial findings about the connections between three important properties of machine learning models: privacy, generalization gap, and adversarial robustness. We demonstrate that, contrary to existing results, the generalization gap is not highly correlated with privacy leakage. Moreover, stronger adversarial robustness does not necessarily imply that the model is more susceptible to privacy attacks.  ( 3 min )
    Semantic Communications with Discrete-time Analog Transmission: A PAPR Perspective. (arXiv:2208.08342v1 [cs.IT])
    Recent progress in deep learning (DL)-based joint source-channel coding (DeepJSCC) has led to a new paradigm of semantic communications. Two salient features of DeepJSCC-based semantic communications are the exploitation of semantic-aware features directly from the source signal, and the discrete-time analog transmission (DTAT) of these features. Compared with traditional digital communications, semantic communications with DeepJSCC provide superior reconstruction performance at the receiver and graceful degradation with diminishing channel quality, but also exhibit a large peak-to-average power ratio (PAPR) in the transmitted signal. An open question has been whether the gains of DeepJSCC come from the additional freedom brought by the high-PAPR continuous-amplitude signal. In this paper, we address this question by exploring three PAPR reduction techniques in the application of image transmission. We confirm that the superior image reconstruction performance of DeepJSCC-based semantic communications can be retained while the transmitted PAPR is suppressed to an acceptable level. This observation is an important step towards the implementation of DeepJSCC in practical semantic communication systems.  ( 2 min )
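    For concreteness, the PAPR of a discrete-time complex baseband signal is just the peak instantaneous power over the mean power, usually reported in dB:
```python
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio of a complex baseband signal, in dB."""
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())
```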
    Deep Gaussian Process Emulation using Stochastic Imputation. (arXiv:2107.01590v2 [stat.ML] UPDATED)
    Deep Gaussian processes (DGPs) provide a rich class of models that can better represent functions with varying regimes or sharp changes, compared to conventional GPs. In this work, we propose a novel inference method for DGPs for computer model emulation. By stochastically imputing the latent layers, our approach transforms a DGP into a linked GP: a novel emulator developed for systems of linked computer models. This transformation permits an efficient DGP training procedure that only involves optimizations of conventional GPs. In addition, predictions from DGP emulators can be made in a fast and analytically tractable manner by naturally utilizing the closed form predictive means and variances of linked GP emulators. We demonstrate the method in a series of synthetic examples and empirical applications, and show that it is a competitive candidate for DGP surrogate inference, combining efficiency that is comparable to doubly stochastic variational inference and uncertainty quantification that is comparable to the fully-Bayesian approach. A $\texttt{Python}$ package $\texttt{dgpsi}$ implementing the method is also produced and available at https://github.com/mingdeyu/DGP.  ( 2 min )
    Arachne: Search Based Repair of Deep Neural Networks. (arXiv:1912.12463v2 [cs.LG] UPDATED)
    The rapid and widespread adoption of Deep Neural Networks (DNNs) has called for ways to test their behaviour, and many testing approaches have successfully revealed misbehaviour of DNNs. However, it is relatively unclear what one can do to correct such behaviour once revealed, as retraining involves costly data collection and does not guarantee a fix for the underlying issue. This paper introduces Arachne, a novel program repair technique for DNNs, which directly repairs DNNs using their input-output pairs as a specification. Arachne localises neural weights on which it can generate effective patches and uses Differential Evolution to optimise the localised weights and correct the misbehaviour. An empirical study using different benchmarks shows that Arachne can fix specific misclassifications of a DNN without significantly reducing general accuracy. On average, patches generated by Arachne generalise to 61.3% of unseen misbehaviour, whereas those by a state-of-the-art DNN repair technique generalise to only 10.2%, and sometimes to none, while taking tens of times longer than Arachne. We also show that Arachne can address fairness issues by debiasing a gender classification model. Finally, we successfully apply Arachne to a text sentiment model to show that it generalises beyond Convolutional Neural Networks.  ( 3 min )
    Sparse Nonnegative Tucker Decomposition and Completion under Noisy Observations. (arXiv:2208.08287v1 [cs.LG])
    Tensor decomposition is a powerful tool for extracting physically meaningful latent factors from multi-dimensional nonnegative data, and has attracted increasing interest in a variety of fields such as image processing, machine learning, and computer vision. In this paper, we propose a sparse nonnegative Tucker decomposition and completion method for the recovery of underlying nonnegative data under noisy observations. Here the underlying nonnegative data tensor is decomposed into a core tensor and several factor matrices with all entries being nonnegative and the factor matrices being sparse. The loss function is derived by the maximum likelihood estimation of the noisy observations, and the $\ell_0$ norm is employed to enhance the sparsity of the factor matrices. We establish the error bound of the estimator of the proposed model under generic noise scenarios, which is then specified to the observations with additive Gaussian noise, additive Laplace noise, and Poisson observations, respectively. Our theoretical results are better than those of existing tensor-based or matrix-based methods. Moreover, the minimax lower bounds are shown to be matched with the derived upper bounds up to logarithmic factors. Numerical examples on both synthetic and real-world data sets demonstrate the superiority of the proposed method for nonnegative tensor data completion.  ( 2 min )
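    The paper's $\ell_0$-sparse, noise-adapted estimator is not available off the shelf; as a plain baseline for the same recovery setting, a standard nonnegative Tucker decomposition can be run with the tensorly package (assuming it is installed; the ranks and noise level below are illustrative):

        import numpy as np
        import tensorly as tl
        from tensorly.decomposition import non_negative_tucker

        # Build a nonnegative low-Tucker-rank ground truth and add noise.
        rng = np.random.default_rng(0)
        core_true = rng.random((3, 3, 3))
        factors_true = [rng.random((s, 3)) for s in (20, 30, 10)]
        truth = tl.tucker_to_tensor((core_true, factors_true))
        noisy = tl.tensor(np.clip(truth + 0.05 * rng.standard_normal(truth.shape), 0, None))

        # Plain nonnegative Tucker baseline (no l0 sparsity on the factors).
        core, factors = non_negative_tucker(noisy, rank=[3, 3, 3], n_iter_max=200)
        recovered = tl.tucker_to_tensor((core, factors))
        print("relative error:", np.linalg.norm(recovered - truth) / np.linalg.norm(truth))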
    Deep Contrastive Multiview Network Embedding. (arXiv:2108.08296v2 [cs.LG] UPDATED)
    Multiview network embedding aims at projecting nodes in the network to low-dimensional vectors, while preserving their multiple relations and attribute information. Contrastive learning approaches have shown promising performance in this task. However, they neglect the semantic consistency between fused and view representations and have difficulty in modeling complementary information between different views. To deal with these deficiencies, this work presents a novel Contrastive leaRning framEwork for Multiview network Embedding (CREME). In our work, different views can be obtained based on the various relations among nodes. Then, we generate view embeddings via proper view encoders and utilize an attentive multiview aggregator to fuse these representations. Particularly, we design two collaborative contrastive objectives, view fusion InfoMax and inter-view InfoMin, to train the model in a self-supervised manner. The former objective distills information from embeddings generated from different views, while the latter captures complementary information among views to promote distinctive view embeddings. We also show that the two objectives can be unified into one objective for model training. Extensive experiments on three real-world datasets demonstrate that our proposed CREME is able to consistently outperform state-of-the-art methods.  ( 3 min )
    SYNTHESIS: A Semi-Asynchronous Path-Integrated Stochastic Gradient Method for Distributed Learning in Computing Clusters. (arXiv:2208.08425v1 [cs.LG])
    To increase the training speed of distributed learning, recent years have witnessed a significant amount of interest in developing both synchronous and asynchronous distributed stochastic variance-reduced optimization methods. However, all existing synchronous and asynchronous distributed training algorithms suffer from various limitations in either convergence speed or implementation complexity. This motivates us to propose an algorithm called SYNTHESIS (semi-asynchronous path-integrated stochastic gradient search), which leverages the special structure of the variance-reduction framework to overcome the limitations of both synchronous and asynchronous distributed learning algorithms, while retaining their salient features. We consider two implementations of SYNTHESIS under distributed and shared memory architectures. We show that our SYNTHESIS algorithms have $O(\sqrt{N}\epsilon^{-2}(\Delta+1)+N)$ and $O(\sqrt{N}\epsilon^{-2}(\Delta+1)d+N)$ computational complexities for achieving an $\epsilon$-stationary point in non-convex learning under distributed and shared memory architectures, respectively, where $N$ denotes the total number of training samples and $\Delta$ represents the maximum delay of the workers. Moreover, we investigate the generalization performance of SYNTHESIS by establishing algorithmic stability bounds for quadratic strongly convex and non-convex optimization. We further conduct extensive numerical experiments to verify our theoretical findings.  ( 2 min )
    Minimum Cost Adaptive Submodular Cover. (arXiv:2208.08351v1 [cs.DS])
    We consider the problem of minimum cost cover of adaptive-submodular functions, and provide a $4(\ln Q+1)$-approximation algorithm, where $Q$ is the goal value. This bound is nearly the best possible as the problem does not admit any approximation ratio better than $\ln Q$ (unless P=NP). Our result is the first $O(\ln Q)$-approximation algorithm for this problem. Previously, $O(\ln Q)$ approximation algorithms were only known assuming either independent items or unit-cost items. Furthermore, our result easily extends to the setting where one wants to simultaneously cover multiple adaptive-submodular functions: we obtain the first approximation algorithm for this generalization.  ( 2 min )
    LAMA-Net: Unsupervised Domain Adaptation via Latent Alignment and Manifold Learning for RUL Prediction. (arXiv:2208.08388v1 [cs.LG])
    Prognostics and Health Management (PHM) is an emerging field which has received much attention from the manufacturing industry because of the benefits and efficiencies it brings. Remaining Useful Life (RUL) prediction is at the heart of any PHM system. Most recent data-driven research demands substantial volumes of labelled training data before a performant model can be trained under the supervised learning paradigm. This is where Transfer Learning (TL) and Domain Adaptation (DA) methods step in, making it possible to generalize a supervised model to other domains with different data distributions without labelled data. In this paper, we propose \textit{LAMA-Net}, an encoder-decoder based model (Transformer) with an induced bottleneck that combines latent alignment using Maximum Mean Discrepancy (MMD) and manifold learning to tackle the problem of Unsupervised Homogeneous Domain Adaptation for RUL prediction. \textit{LAMA-Net} is validated using the C-MAPSS Turbofan Engine dataset by NASA and compared against other state-of-the-art techniques for DA. The results suggest that the proposed method offers a promising approach to perform domain adaptation in RUL prediction. Code will be made available once the paper comes out of review.  ( 2 min )
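    The abstract does not give the alignment loss in closed form; latent alignment with MMD is typically implemented as a kernel two-sample statistic between source and target latent batches, added to the task loss. A minimal sketch with an RBF kernel (the bandwidth and dimensions are illustrative assumptions):

        import torch

        def mmd_rbf(x, y, sigma=1.0):
            # Biased empirical estimate of squared MMD under a Gaussian RBF
            # kernel; adding this term to the training loss pulls the source
            # and target latent distributions together.
            def k(a, b):
                return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
            return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

        src = torch.randn(64, 32)        # labelled source-domain latents
        tgt = torch.randn(64, 32) + 0.5  # unlabelled target-domain latents
        print(mmd_rbf(src, tgt))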
    The Counterfactual-Shapley Value: Attributing Change in System Metrics. (arXiv:2208.08399v1 [cs.LG])
    Given an unexpected change in the output metric of a large-scale system, it is important to answer why the change occurred: which inputs caused the change in metric? A key component of such an attribution question is estimating the counterfactual: the (hypothetical) change in the system metric due to a specified change in a single input. However, due to inherent stochasticity and complex interactions between parts of the system, it is difficult to model an output metric directly. We utilize the computational structure of a system to break up the modelling task into sub-parts, such that each sub-part corresponds to a more stable mechanism that can be modelled accurately over time. Using the system's structure also helps to view the metric as a computation over a structural causal model (SCM), thus providing a principled way to estimate counterfactuals. Specifically, we propose a method to estimate counterfactuals using time-series predictive models and construct an attribution score, CF-Shapley, that is consistent with desirable axioms for attributing an observed change in the output metric. Unlike past work on causal Shapley values, our proposed method can attribute a single observed change in output (rather than a population-level effect) and thus provides more accurate attribution scores when evaluated on simulated datasets. As a real-world application, we analyze a query-ad matching system with the goal of attributing observed change in a metric for ad matching density. Attribution scores explain how query volume and ad demand from different query categories affect the ad matching density, leading to actionable insights and uncovering the role of external events (e.g., "Cheetah Day") in driving the matching density.  ( 3 min )
    Open Long-Tailed Recognition in a Dynamic World. (arXiv:2208.08349v1 [cs.CV])
    Real world data often exhibits a long-tailed and open-ended (with unseen classes) distribution. A practical recognition system must balance between majority (head) and minority (tail) classes, generalize across the distribution, and acknowledge the novelty of instances of unseen classes (open classes). We define Open Long-Tailed Recognition++ (OLTR++) as learning from such naturally distributed data and optimizing for the classification accuracy over a balanced test set which includes both known and open classes. OLTR++ handles imbalanced classification, few-shot learning, open-set recognition, and active learning in one integrated algorithm, whereas existing classification approaches often focus only on one or two aspects and perform poorly over the entire spectrum. The key challenges are: 1) how to share visual knowledge between head and tail classes, 2) how to reduce confusion between tail and open classes, and 3) how to actively explore open classes with learned knowledge. Our algorithm, OLTR++, maps images to a feature space such that visual concepts can relate to each other through a memory association mechanism and a learned metric (dynamic meta-embedding) that both respects the closed-world classification of seen classes and acknowledges the novelty of open classes. Additionally, we propose an active learning scheme based on visual memory, which learns to recognize open classes in a data-efficient manner for future expansions. On three large-scale open long-tailed datasets we curated from ImageNet (object-centric), Places (scene-centric), and MS1M (face-centric) data, as well as three standard benchmarks (CIFAR-10-LT, CIFAR-100-LT, and iNaturalist-18), our approach, as a unified framework, consistently demonstrates competitive performance. Notably, our approach also shows strong potential for the active exploration of open classes and the fairness analysis of minority groups.  ( 3 min )
    Ask Question First for Enhancing Lifelong Language Learning. (arXiv:2208.08367v1 [cs.CL])
    Lifelong language learning aims to learn NLP tasks from a stream while retaining knowledge of previous tasks. Previous works based on language models following the data-free constraint have explored formatting all data as "begin token (\textit{B}) + context (\textit{C}) + question (\textit{Q}) + answer (\textit{A})" for different tasks. However, they still suffer from catastrophic forgetting, which is exacerbated when the previous task's pseudo data is insufficient, for the following reasons: (1) the model has difficulty generating task-corresponding pseudo data, and (2) \textit{A} is prone to error when \textit{A} and \textit{C} are separated by \textit{Q} because the information of \textit{C} is diminished before \textit{A} is generated. Therefore, we propose Ask Question First and Replay Question (AQF-RQ), including a novel data format "\textit{BQCA}" and a new training task to train pseudo questions of previous tasks. Experimental results demonstrate that AQF-RQ makes it easier for the model to generate pseudo data that match the corresponding tasks, and is more robust to both sufficient and insufficient pseudo data, whether the task boundary is clear or unclear. AQF-RQ achieves performance only 0.36\% lower than multi-task learning.  ( 2 min )
    Extract fundamental frequency based on CNN combined with PYIN. (arXiv:2208.08354v1 [cs.SD])
    This paper addresses the extraction of multiple fundamental frequencies (multiple F0) based on PYIN, an algorithm for extracting the fundamental frequency (F0) of monophonic music, and a trained convolutional neural network (CNN) model, in which a pitch salience function of the input signal is produced to estimate the multiple F0. The implementation of these two algorithms and their corresponding advantages and disadvantages are discussed in this article. After analysing the different performance of these two methods, PYIN is applied to supplement the F0 extracted from the trained CNN model, combining the advantages of the two algorithms. For evaluation, four pieces played by two violins are used, and the performance of the models is evaluated according to the flatness of the extracted F0 curve. The results show the combined model outperforms the original algorithms when extracting F0 from both monophonic and polyphonic music.  ( 2 min )
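    For the monophonic PYIN component, the librosa implementation can serve as a reference point; a minimal sketch, assuming librosa is installed (the bundled example clip, fetched on first use, stands in for the violin recordings, and the CNN salience model for the polyphonic case is not shown):

        import librosa

        # Monophonic F0 track with pYIN over a plausible pitch range.
        y, sr = librosa.load(librosa.ex('trumpet'))
        f0, voiced_flag, voiced_prob = librosa.pyin(
            y, fmin=librosa.note_to_hz('C2'), fmax=librosa.note_to_hz('C7'), sr=sr)
        print(f0[voiced_flag][:10])  # Hz values on voiced frames only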
    Deep Generative Views to Mitigate Gender Classification Bias Across Gender-Race Groups. (arXiv:2208.08382v1 [cs.CV])
    Published studies have suggested the bias of automated face-based gender classification algorithms across gender-race groups. Specifically, unequal accuracy rates were obtained for women and dark-skinned people. To mitigate the bias of gender classifiers, the vision community has developed several strategies. However, the efficacy of these mitigation strategies is demonstrated for a limited number of races, mostly Caucasian and African-American. Further, these strategies often offer a trade-off between bias and classification accuracy. To further advance the state-of-the-art, we leverage the power of generative views, structured learning, and evidential learning towards mitigating gender classification bias. We demonstrate the superiority of our bias mitigation strategy in improving classification accuracy and reducing bias across gender-racial groups through extensive experimental validation, resulting in state-of-the-art performance in intra- and cross-dataset evaluations.  ( 2 min )
    Leukocyte Classification using Multimodal Architecture Enhanced by Knowledge Distillation. (arXiv:2208.08331v1 [eess.IV])
    Recently, many automated white blood cell (WBC) or leukocyte classification techniques have been developed. However, all of these methods only utilize a single-modality microscopic image, i.e., either blood smear or fluorescence based, thus missing the potential of better learning from multimodal images. In this work, we develop an efficient multimodal architecture based on a first-of-its-kind multimodal WBC dataset for the task of WBC classification. Specifically, our proposed idea is developed in two steps: 1) first, we learn modality-specific independent subnetworks inside a single network; 2) we further enhance the learning capability of the independent subnetworks by distilling knowledge from high-complexity independent teacher networks. With this, our proposed framework can achieve high performance while maintaining low complexity for a multimodal dataset. Our unique contribution is two-fold: 1) we present a first-of-its-kind multimodal WBC dataset for WBC classification; 2) we develop a high-performing multimodal architecture which is also efficient and low in complexity.  ( 2 min )
    Transformer-Based Deep Learning Model for Stock Price Prediction: A Case Study on Bangladesh Stock Market. (arXiv:2208.08300v1 [q-fin.ST])
    In the modern capital market, the price of a stock is often considered to be highly volatile and unpredictable because of various social, financial, political and other dynamic factors. With calculated and thoughtful investment, the stock market can ensure a handsome profit with minimal capital investment, while incorrect prediction can easily bring catastrophic financial loss to investors. This paper introduces the application of a recently introduced machine learning model, the Transformer, to predict the future price of stocks on the Dhaka Stock Exchange (DSE), the leading stock exchange in Bangladesh. The Transformer model has been widely leveraged for natural language processing and computer vision tasks but, to the best of our knowledge, has never been used for the stock price prediction task at DSE. The recent introduction of time2vec encoding to represent time series features has made it possible to employ the Transformer model for stock price prediction. This paper concentrates on the application of a transformer-based model to predict the price movement of eight specific stocks listed on DSE based on their historical daily and weekly data. Our experiments demonstrate promising results and acceptable root mean squared error on most of the stocks.  ( 3 min )
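    The time2vec encoding mentioned above has a published closed form (Kazemi et al., 2019): one linear component plus learned sinusoidal components of a scalar time input. A minimal PyTorch sketch, with dimensions chosen for illustration rather than taken from the paper:

        import torch
        import torch.nn as nn

        class Time2Vec(nn.Module):
            # t2v(t)[0] = w_0 * t + b_0 (linear trend);
            # t2v(t)[i] = sin(w_i * t + b_i) for i >= 1 (periodic parts).
            def __init__(self, k: int):
                super().__init__()
                self.w = nn.Parameter(torch.randn(k))
                self.b = nn.Parameter(torch.randn(k))

            def forward(self, t):                # t: (batch, 1) scalar times
                v = t * self.w + self.b          # broadcast to (batch, k)
                return torch.cat([v[:, :1], torch.sin(v[:, 1:])], dim=-1)

        enc = Time2Vec(8)
        print(enc(torch.arange(4.).unsqueeze(-1)).shape)  # torch.Size([4, 8])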
    Error Parity Fairness: Testing for Group Fairness in Regression Tasks. (arXiv:2208.08279v1 [cs.LG])
    The applications of Artificial Intelligence (AI) surround decisions on increasingly many aspects of human lives. Society responds by imposing legal and social expectations for the accountability of such automated decision systems (ADSs). Fairness, a fundamental constituent of AI accountability, is concerned with just treatment of individuals and sensitive groups (e.g., based on sex, race). While many studies focus on fair learning and fairness testing for the classification tasks, the literature is rather limited on how to examine fairness in regression tasks. This work presents error parity as a regression fairness notion and introduces a testing methodology to assess group fairness based on a statistical hypothesis testing procedure. The error parity test checks whether prediction errors are distributed similarly across sensitive groups to determine if an ADS is fair. It is followed by a suitable permutation test to compare groups on several statistics to explore disparities and identify impacted groups. The usefulness and applicability of the proposed methodology are demonstrated via a case study on COVID-19 projections in the US at the county level, which revealed race-based differences in forecast errors. Overall, the proposed regression fairness testing methodology fills a gap in the fair machine learning literature and may serve as a part of larger accountability assessments and algorithm audits.  ( 3 min )
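    The abstract describes the test mechanics at a high level; a permutation test for one error-parity statistic (difference in mean absolute error between two groups) could look like the following sketch, which is an illustration rather than the paper's exact procedure:

        import numpy as np

        def error_parity_pvalue(errors, groups, n_perm=10_000, seed=0):
            # Is the mean absolute prediction error of group 0 different
            # from group 1 beyond chance? Shuffle group labels to build the
            # null distribution of the gap.
            rng = np.random.default_rng(seed)
            a = errors[groups == 0]
            b = errors[groups == 1]
            observed = abs(a.mean() - b.mean())
            pooled = np.concatenate([a, b])
            count = 0
            for _ in range(n_perm):
                rng.shuffle(pooled)
                count += abs(pooled[:len(a)].mean() - pooled[len(a):].mean()) >= observed
            return (count + 1) / (n_perm + 1)

        rng = np.random.default_rng(1)
        errors = np.abs(rng.standard_normal(200))   # absolute prediction errors
        groups = rng.integers(0, 2, size=200)       # binary sensitive attribute
        print(error_parity_pvalue(errors, groups))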
    Position-aware Structure Learning for Graph Topology-imbalance by Relieving Under-reaching and Over-squashing. (arXiv:2208.08302v1 [cs.LG])
    Topology-imbalance is a graph-specific imbalance problem caused by the uneven topology positions of labeled nodes, which significantly damages the performance of GNNs. What topology-imbalance means and how to measure its impact on graph learning remain under-explored. In this paper, we provide a new understanding of topology-imbalance from a global view of the supervision information distribution in terms of under-reaching and over-squashing, which motivates two quantitative metrics as measurements. In light of our analysis, we propose a novel position-aware graph structure learning framework named PASTEL, which directly optimizes the information propagation path and solves the topology-imbalance issue in essence. Our key insight is to enhance the connectivity of nodes within the same class for more supervision information, thereby relieving the under-reaching and over-squashing phenomena. Specifically, we design an anchor-based position encoding mechanism, which better incorporates relative topology position and enhances the intra-class inductive bias by maximizing the label influence. We further propose a class-wise conflict measure as the edge weights, which benefits the separation of different node classes. Extensive experiments demonstrate the superior potential and adaptability of PASTEL in enhancing GNNs' power in different data annotation scenarios.  ( 3 min )
    Quantum Machine Learning for Material Synthesis and Hardware Security. (arXiv:2208.08273v1 [quant-ph])
    Using quantum computing, this paper addresses two scientifically pressing and practically relevant problems: chemical retrosynthesis, an important step in drug/material discovery, and security of the semiconductor supply chain. We show that Quantum Long Short-Term Memory (QLSTM) is a viable tool for retrosynthesis. We achieve 65% training accuracy with QLSTM, whereas classical LSTM can achieve 100%. However, in testing, we achieve 80% accuracy with the QLSTM, while classical LSTM peaks at only 70% accuracy. We also demonstrate an application of Quantum Neural Networks (QNN) in the hardware security domain, specifically in Hardware Trojan (HT) detection, using a set of power and area Trojan features. The QNN model achieves detection accuracy as high as 97.27%.  ( 2 min )
    Domain Knowledge in A*-Based Causal Discovery. (arXiv:2208.08247v1 [stat.ML])
    Causal discovery has become a vital tool for scientists and practitioners wanting to discover causal relationships from observational data. While most previous approaches to causal discovery have implicitly assumed that no expert domain knowledge is available, practitioners can often provide such domain knowledge from prior experience. Recent work has incorporated domain knowledge into constraint-based causal discovery. The majority of such constraint-based methods, however, assume causal faithfulness, which has been shown to be frequently violated in practice. Consequently, there has been renewed attention towards exact-search score-based causal discovery methods, which do not assume causal faithfulness, such as A*-based methods. However, there has been no consideration of these methods in the context of domain knowledge. In this work, we focus on efficiently integrating several types of domain knowledge into A*-based causal discovery. In doing so, we discuss and explain how domain knowledge can reduce the graph search space and then provide an analysis of the potential computational gains. We support these findings with experiments on synthetic and real data, showing that even small amounts of domain knowledge can dramatically speed up A*-based causal discovery and improve its performance and practicality.  ( 2 min )
    SMPL-IK: Learned Morphology-Aware Inverse Kinematics for AI Driven Artistic Workflows. (arXiv:2208.08274v1 [cs.GR])
    Inverse Kinematics (IK) systems are often rigid with respect to their input character, thus requiring user intervention to adapt them to new skeletons. In this paper we aim to create a flexible, learned IK solver applicable to a wide variety of human morphologies. We extend a state-of-the-art machine learning IK solver to operate on the well-known Skinned Multi-Person Linear model (SMPL). We call our model SMPL-IK, and show that when integrated into real-time 3D software, this extended system opens up opportunities for defining novel AI-assisted animation workflows. For example, pose authoring can be made more flexible with SMPL-IK by allowing users to modify gender and body shape while posing a character. Additionally, when chained with existing pose estimation algorithms, SMPL-IK accelerates posing by allowing users to bootstrap 3D scenes from 2D images while allowing for further editing. Finally, we propose a novel SMPL Shape Inversion mechanism (SMPL-SI) to map arbitrary humanoid characters to the SMPL space, allowing artists to leverage SMPL-IK on custom characters. In addition to qualitative demos showing the proposed tools, we present quantitative SMPL-IK baselines on the H36M and AMASS datasets.  ( 2 min )
    DICE: Data-Efficient Clinical Event Extraction with Generative Models. (arXiv:2208.07989v1 [cs.CL])
    Event extraction in the clinical domain is an under-explored research area. The lack of training data, in addition to the high volume of domain-specific jargon that includes long entities with vague boundaries, makes the task especially challenging. In this paper, we introduce DICE, a robust and data-efficient generative model for clinical event extraction. DICE frames event extraction as a conditional generation problem and utilizes descriptions provided by domain experts to boost performance under low-resource settings. Furthermore, DICE learns to locate and bound biomedical mentions with an auxiliary mention identification task trained jointly with event extraction tasks to leverage inter-task dependencies, and further incorporates the identified mentions as trigger and argument candidates for their respective tasks. We also introduce MACCROBAT-EE, the first clinical event extraction dataset with event argument annotation. Our experiments demonstrate the robustness of DICE under low data settings for the clinical domain and the benefits of incorporating flexible joint training and mention markers into generative approaches.  ( 2 min )
    Sampling Through the Lens of Sequential Decision Making. (arXiv:2208.08056v1 [cs.LG])
    Sampling is ubiquitous in machine learning methodologies. Due to the growth of large datasets and model complexity, we want to learn and adapt the sampling process while training a representation. Towards achieving this grand goal, a variety of sampling techniques have been proposed. However, most of them either use a fixed sampling scheme or adjust the sampling scheme based on simple heuristics. They cannot choose the best sample for model training in different stages. Inspired by "Thinking, Fast and Slow" (System 1 and System 2) in cognitive science, we propose a reward-guided sampling strategy called Adaptive Sample with Reward (ASR) to tackle this challenge. To the best of our knowledge, this is the first work utilizing reinforcement learning (RL) to address the sampling problem in representation learning. Our approach adaptively adjusts the sampling process to achieve optimal performance. We explore geographical relationships among samples by distance-based sampling to maximize overall cumulative reward. We apply ASR to the long-standing sampling problems in similarity-based loss functions. Empirical results in information retrieval and clustering demonstrate ASR's superb performance across different datasets. We also discuss an engrossing phenomenon which we name the "ASR gravity well" in experiments.  ( 3 min )
    Maximising the Utility of Validation Sets for Imbalanced Noisy-label Meta-learning. (arXiv:2208.08132v1 [cs.LG])
    Meta-learning is an effective method to handle imbalanced and noisy-label learning, but it depends on a validation set containing randomly selected, manually labelled samples with a balanced class distribution. The random selection, manual labelling and balancing of this validation set are not only sub-optimal for meta-learning, but also scale poorly with the number of classes. Hence, recent meta-learning papers have proposed ad-hoc heuristics to automatically build and label this validation set, but these heuristics are still sub-optimal for meta-learning. In this paper, we analyse the meta-learning algorithm and propose new criteria to characterise the utility of the validation set, based on: 1) the informativeness of the validation set; 2) the class distribution balance of the set; and 3) the correctness of the labels of the set. Furthermore, we propose a new imbalanced noisy-label meta-learning (INOLML) algorithm that automatically builds a validation set by maximising its utility using the criteria above. Our method shows significant improvements over previous meta-learning approaches and sets the new state-of-the-art on several benchmarks.  ( 2 min )
    HELP ME THINK: A Simple Prompting Strategy for Non-experts to Create Customized Content with Models. (arXiv:2208.08232v1 [cs.CL])
    Controlling the text generated by language models and customizing the content has been a long-standing challenge. Existing prompting techniques proposed in pursuit of providing control are task-specific and lack generality; this leaves non-expert users with an overwhelming number of choices when looking for a method suited to their task. The effort associated with those techniques, such as writing examples, explanations, instructions, etc., further limits their adoption among non-expert users. In this paper, we propose a simple prompting strategy, HELP ME THINK, where we encourage GPT3 to help non-expert users by asking a set of relevant questions and leveraging user answers to execute the task. We demonstrate the efficacy of our technique HELP ME THINK on a variety of tasks. Specifically, we focus on tasks that are hard for average humans and require significant thinking to perform. We hope our work will encourage the development of unconventional ways to harness the power of large language models.  ( 2 min )
    Prediction of Oral Food Challenges via Machine Learning. (arXiv:2208.08268v1 [cs.LG])
    Oral Food Challenges (OFCs) are essential to accurately diagnosing food allergy in patients. However, patients are hesitant to undergo OFCs, and for those that do, there is limited access to allergists in rural/community healthcare settings. The prediction of OFC outcomes through machine learning methods can facilitate the de-labeling of food allergens at home, improve patient and physician comfort during OFCs, and economize medical resources by minimizing the number of OFCs performed. Clinical data was gathered from 1,112 patients who collectively underwent a total of 1,284 OFCs, and consisted of clinical factors including serum specific IgE, total IgE, skin prick tests (SPTs), symptoms, sex, and age. Using these clinical features, machine learning models were constructed to predict outcomes for peanut, egg, and milk challenge. The best performing model for each allergen was created using the Learning Using Concave and Convex Kernels (LUCCK) method, which achieved an Area under the Curve (AUC) for peanut, egg, and milk OFC prediction of 0.76, 0.68, and 0.70, respectively. Model interpretation via SHapley Additive exPlanations (SHAP) indicate that specific IgE, along with wheal and flare values from SPTs, are highly predictive of OFC outcomes. The results of this analysis suggest that machine learning has the potential to predict OFC outcomes and reveal relevant clinical factors for further study.  ( 3 min )
    Towards an Error-free Deep Occupancy Detector for Smart Camera Parking System. (arXiv:2208.08220v1 [cs.CV])
    Although the smart camera parking system concept has existed for decades, few approaches have fully addressed the system's scalability and reliability. As the cornerstone of a smart parking system is the ability to detect occupancy, traditional methods use a classification backbone to predict spots from a manually labeled grid. This is time-consuming and loses the system's scalability. Additionally, most of the approaches use deep learning models, making them not error-free and not reliable at scale. Thus, we propose an end-to-end smart camera parking system in which occupancy is detected autonomously by an object detector called OcpDet. Our detector also provides meaningful information from contrastive modules: training and spatial knowledge, which avert false detections during inference. We benchmark OcpDet on the existing PKLot dataset and reach competitive results compared to traditional classification solutions. We also introduce an additional SNU-SPS dataset, in which we estimate the system performance from various views and conduct system evaluation in parking assignment tasks. The results from our dataset show that our system is promising for real-world applications.  ( 2 min )
    Deep Autoencoder Model Construction Based on Pytorch. (arXiv:2208.08231v1 [cs.LG])
    This paper proposes a deep autoencoder model based on PyTorch. The algorithm introduces a dropout-style mechanism into the auto-encoder, randomly clearing the input weights connected to the hidden-layer neurons with a certain probability, so as to achieve the effect of a sparse network, similar in spirit to the sparse auto-encoder. The new algorithm effectively mitigates possible overfitting of the model and improves the accuracy of image classification. Finally, experiments are carried out, and the results are compared with ELM, RELM, AE, SAE, and DAE.  ( 2 min )
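    The description of randomly clearing input weights is closest to the dropout/DropConnect family; a minimal PyTorch sketch using standard nn.Dropout on the hidden activations (an approximation of the weight-level clearing described, with illustrative dimensions):

        import torch
        import torch.nn as nn

        class DropoutAutoencoder(nn.Module):
            # nn.Dropout zeroes hidden activations rather than individual
            # weights (the abstract's wording is closer to DropConnect), but
            # produces a similar sparse-network effect during training.
            def __init__(self, d_in=784, d_hidden=128, p=0.3):
                super().__init__()
                self.encoder = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                                             nn.Dropout(p))
                self.decoder = nn.Sequential(nn.Linear(d_hidden, d_in), nn.Sigmoid())

            def forward(self, x):
                return self.decoder(self.encoder(x))

        model = DropoutAutoencoder()
        x = torch.rand(16, 784)                      # e.g. flattened images
        loss = nn.functional.mse_loss(model(x), x)   # reconstruction objective
        loss.backward()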
    Dynamical softassign and adaptive parameter tuning for graph matching. (arXiv:2208.08233v1 [math.CO])
    This paper studies a framework, the projected fixed-point method, for graph matching. The framework contains a class of popular graph matching algorithms, including graduated assignment (GA), the integer projected fixed-point method (IPFP) and the doubly stochastic projected fixed-point method (DSPFP). We propose an adaptive strategy to tune the step size parameter in this framework. Such a strategy improves these algorithms in efficiency and accuracy, and guarantees the convergence of the underlying algorithms. Some preliminary analysis based on distance geometry suggests that the optimal step size parameter has a high probability of being 1 when graphs are fully connected. Secondly, we observe that a popular projection method, softassign, is sensitive to graphs' cardinality (size). We propose a dynamical softassign algorithm that is robust to graphs' cardinality. Combining the adaptive step size and the dynamical softassign, we propose a novel graph matching algorithm: the adaptive projected fixed-point method with dynamical softassign. Various experiments demonstrate that the proposed algorithm is significantly faster than several other state-of-the-art algorithms with no loss of accuracy.  ( 2 min )
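    The classic softassign projection that the dynamical variant builds on is short enough to sketch: exponentiate the affinity matrix and alternate row/column (Sinkhorn) normalisations. The fixed beta below is an illustrative choice; the paper's contribution is precisely to adapt this scaling to graph cardinality:

        import numpy as np

        def softassign(M, beta=5.0, n_iter=50):
            # Exponentiate the score matrix (max subtracted for numerical
            # stability), then Sinkhorn-normalise rows and columns alternately
            # to approach a doubly stochastic matching matrix.
            S = np.exp(beta * (M - M.max()))
            for _ in range(n_iter):
                S /= S.sum(axis=1, keepdims=True)   # row normalisation
                S /= S.sum(axis=0, keepdims=True)   # column normalisation
            return S

        M = np.random.default_rng(0).random((5, 5))  # node-affinity scores
        print(softassign(M).round(2))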
    Semi-Supervised Anomaly Detection Based on Quadratic Multiform Separation. (arXiv:2208.08265v1 [stat.ML])
    In this paper we propose a novel method for semi-supervised anomaly detection (SSAD). Our classifier is named QMS22, as it was conceived in 2022 upon the framework of quadratic multiform separation (QMS), a recently introduced classification model. QMS22 tackles SSAD by solving a multi-class classification problem involving both the training set and the test set of the original problem. The classification problem intentionally includes classes with overlapping samples. One of the classes contains a mixture of normal samples and outliers, and all other classes contain only normal samples. An outlier score is then calculated for every sample in the test set using the outcome of the classification problem. We also include a performance evaluation of QMS22 against top performing classifiers using ninety-five benchmark imbalanced datasets from the KEEL repository. These classifiers are BRM (Bagging-Random Miner), OCKRA (One-Class K-means with Randomly-projected features Algorithm), ISOF (Isolation Forest), and ocSVM (One-Class Support Vector Machine). Using the area under the receiver operating characteristic curve as the performance measure, QMS22 significantly outperforms ISOF and ocSVM. Moreover, Wilcoxon signed-rank tests reveal no statistically significant difference when testing QMS22 against BRM or QMS22 against OCKRA.  ( 2 min )
    Deep Learning-Based Discrete Calibrated Survival Prediction. (arXiv:2208.08182v1 [cs.LG])
    Deep neural networks for survival prediction outperform classical approaches in discrimination, which is the ordering of patients according to their time-of-event. Conversely, classical approaches like the Cox Proportional Hazards model display much better calibration, the correct temporal prediction of events of the underlying distribution. Especially in the medical domain, where it is critical to predict the survival of a single patient, both discrimination and calibration are important performance metrics. Here we present Discrete Calibrated Survival (DCS), a novel deep neural network for discriminated and calibrated survival prediction that outperforms competing survival models in discrimination on three medical datasets, while achieving best calibration among all discrete time models. The enhanced performance of DCS can be attributed to two novel features, the variable temporal output node spacing and the novel loss term that optimizes the use of uncensored and censored patient data. We believe that DCS is an important step towards clinical application of deep-learning-based survival prediction with state-of-the-art discrimination and good calibration.  ( 2 min )
    Constrained Few-Shot Learning: Human-Like Low Sample Complexity Learning and Non-Episodic Text Classification. (arXiv:2208.08089v1 [cs.LG])
    Few-shot learning (FSL) is an emergent paradigm of learning that attempts to learn with low sample complexity to mimic the way humans can learn, generalise and extrapolate based on only a few examples. While FSL attempts to mimic these human characteristics, fundamentally, the task of FSL as conventionally described and modelled using meta-learning with episodic-based training does not fully align with how humans acquire and reason with knowledge. FSL with episodic training, while only using $K$ instances of each test class, still requires a large number of labelled instances from disjoint training classes. In this paper, we introduce the novel task of constrained few-shot learning (CFSL), a special case of FSL where the number of training instances of each class is constrained to be less than some value $M$, thus applying a similar restriction during training and test. We propose a method for CFSL leveraging Cat2Vec using a novel categorical contrastive loss inspired by cognitive theories such as fuzzy trace theory and prototype theory.  ( 2 min )
    Shallow neural network representation of polynomials. (arXiv:2208.08138v1 [stat.ML])
    We show that $d$-variate polynomials of degree $R$ can be represented on $[0,1]^d$ as shallow neural networks of width $d+1+\sum_{r=2}^R\binom{r+d-1}{d-1}[\binom{r+d-1}{d-1}+1]$. Also, via shallow neural network (SNN) representation of localized Taylor polynomials of univariate $C^\beta$-smooth functions, we derive for shallow networks the minimax optimal rate of convergence, up to a logarithmic factor, to the unknown univariate regression function.  ( 2 min )
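    The width bound is concrete enough to evaluate directly; a small sketch computing it for illustrative choices of $d$ and $R$:

        from math import comb

        def snn_width(d, R):
            # Width bound from the abstract:
            # d + 1 + sum_{r=2}^{R} C(r+d-1, d-1) * (C(r+d-1, d-1) + 1)
            return d + 1 + sum(comb(r + d - 1, d - 1) * (comb(r + d - 1, d - 1) + 1)
                               for r in range(2, R + 1))

        print(snn_width(2, 3))  # width sufficient for bivariate cubics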
    DPA-1: Pretraining of Attention-based Deep Potential Model for Molecular Simulation. (arXiv:2208.08236v1 [physics.chem-ph])
    Machine learning assisted modeling of the inter-atomic potential energy surface (PES) is revolutionizing the field of molecular simulation. With the accumulation of high-quality electronic structure data, a model that can be pretrained on all available data and finetuned on downstream tasks with a small additional effort would bring the field to a new stage. Here we propose DPA-1, a Deep Potential model with a novel attention mechanism, which is highly effective for representing the conformation and chemical spaces of atomic systems and learning the PES. We tested DPA-1 on a number of systems and observed superior performance compared with existing benchmarks. When pretrained on large-scale datasets containing 56 elements, DPA-1 can be successfully applied to various downstream tasks with a great improvement of sample efficiency. Surprisingly, for different elements, the learned type embedding parameters form a $spiral$ in the latent space and have a natural correspondence with their positions on the periodic table, showing interesting interpretability of the pretrained DPA-1 model.  ( 2 min )
    A Scalable and Extensible Approach to Benchmarking NL2Code for 18 Programming Languages. (arXiv:2208.08227v1 [cs.LG])
    Large language models have demonstrated the ability to condition on and generate both natural language and programming language text. Such models open up the possibility of multi-language code generation: could code generation models generalize knowledge from one language to another? Although contemporary code generation models can generate semantically correct Python code, little is known about their abilities with other languages. We facilitate the exploration of this topic by proposing MultiPL-E, the first multi-language parallel benchmark for natural-language-to-code-generation. MultiPL-E extends the HumanEval benchmark (Chen et al., 2021) to support 18 more programming languages, encompassing a range of programming paradigms and popularity. We evaluate two state-of-the-art code generation models on MultiPL-E: Codex and InCoder. We find that on several languages, Codex matches and even exceeds its performance on Python. The range of programming languages represented in MultiPL-E allows us to explore the impact of language frequency and language features on model performance. Finally, the MultiPL-E approach of compiling code generation benchmarks to new programming languages is both scalable and extensible. We describe a general approach for easily adding support for new benchmarks and languages to MultiPL-E.  ( 2 min )
    Two-Stage Robust and Sparse Distributed Statistical Inference for Large-Scale Data. (arXiv:2208.08230v1 [stat.ML])
    In this paper, we address the problem of conducting statistical inference in settings involving large-scale data that may be high-dimensional and contaminated by outliers. The high volume and dimensionality of the data require distributed processing and storage solutions. We propose a two-stage distributed and robust statistical inference procedure that copes with high-dimensional models by promoting sparsity. In the first stage, known as model selection, relevant predictors are locally selected by applying robust Lasso estimators to the distinct subsets of data. The variable selections from each computation node are then fused by a voting scheme to find the sparse basis for the complete data set, identifying the relevant variables in a robust manner. In the second stage, statistically robust and computationally efficient bootstrap methods are employed. The actual inference constructs confidence intervals, finds parameter estimates and quantifies standard deviations. As in stage one, the results of local inference are communicated to the fusion center and combined there. Using analytical methods, we establish the favorable statistical properties of the robust and computationally efficient bootstrap methods, including consistency for a fixed number of predictors, and robustness. The proposed two-stage robust and distributed inference procedure demonstrates reliable performance and robustness in variable selection, finding confidence intervals and bootstrap approximations of standard deviations even when data is high-dimensional and contaminated by outliers.  ( 3 min )
    Multimodal Lecture Presentations Dataset: Understanding Multimodality in Educational Slides. (arXiv:2208.08080v1 [cs.AI])
    Lecture slide presentations, a sequence of pages that contain text and figures accompanied by speech, are constructed and presented carefully in order to optimally transfer knowledge to students. Previous studies in multimedia and psychology attribute the effectiveness of lecture presentations to their multimodal nature. As a step toward developing AI to aid in student learning as intelligent teacher assistants, we introduce the Multimodal Lecture Presentations dataset as a large-scale benchmark testing the capabilities of machine learning models in multimodal understanding of educational content. Our dataset contains aligned slides and spoken language, for 180+ hours of video and 9000+ slides, with 10 lecturers from various subjects (e.g., computer science, dentistry, biology). We introduce two research tasks which are designed as stepping stones towards AI agents that can explain (automatically captioning a lecture presentation) and illustrate (synthesizing visual figures to accompany spoken explanations) educational content. We provide manual annotations to help implement these two research tasks and evaluate state-of-the-art models on them. Comparing baselines and human student performances, we find that current models struggle in (1) weak crossmodal alignment between slides and spoken text, (2) learning novel visual mediums, (3) technical language, and (4) long-range sequences. Towards addressing this issue, we also introduce PolyViLT, a multimodal transformer trained with a multi-instance learning loss that is more effective than current approaches. We conclude by shedding light on the challenges and opportunities in multimodal understanding of educational presentations.  ( 3 min )
    FedPerm: Private and Robust Federated Learning by Parameter Permutation. (arXiv:2208.07922v1 [cs.LG])
    Federated Learning (FL) is a distributed learning paradigm that enables mutually untrusting clients to collaboratively train a common machine learning model. Client data privacy is paramount in FL. At the same time, the model must be protected from poisoning attacks from adversarial clients. Existing solutions address these two problems in isolation. We present FedPerm, a new FL algorithm that addresses both these problems by combining a novel intra-model parameter shuffling technique that amplifies data privacy, with Private Information Retrieval (PIR) based techniques that permit cryptographic aggregation of clients' model updates. The combination of these techniques further helps the federation server constrain parameter updates from clients so as to curtail effects of model poisoning attacks by adversarial clients. We further present FedPerm's unique hyperparameters that can be used effectively to trade off computation overheads with model utility. Our empirical evaluation on the MNIST dataset demonstrates FedPerm's effectiveness over existing Differential Privacy (DP) enforcement solutions in FL.  ( 2 min )
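    The core shuffling idea can be sketched independently of the PIR machinery: a client permutes its flattened update with a private seed so the server cannot read individual parameters, and the permutation is undone after aggregation. The sketch below is a toy illustration and omits FedPerm's cryptographic coordination entirely:

        import numpy as np

        def shuffle_update(flat_update, seed):
            # The client permutes its flattened model update with a private
            # seed before upload, hiding which value belongs to which
            # parameter; FedPerm coordinates this across clients via PIR,
            # which this toy does not model.
            perm = np.random.default_rng(seed).permutation(flat_update.size)
            return flat_update[perm], perm

        def unshuffle(shuffled, perm):
            out = np.empty_like(shuffled)
            out[perm] = shuffled
            return out

        update = np.arange(8.0)
        shuffled, perm = shuffle_update(update, seed=42)
        assert np.allclose(unshuffle(shuffled, perm), update)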
    DLCFT: Deep Linear Continual Fine-Tuning for General Incremental Learning. (arXiv:2208.08112v1 [cs.LG])
    Pre-trained representations are one of the key elements in the success of modern deep learning. However, existing work on continual learning methods has mostly focused on learning models incrementally from scratch. In this paper, we explore an alternative framework to incremental learning where we continually fine-tune the model from a pre-trained representation. Our method takes advantage of the linearization technique of a pre-trained neural network for simple and effective continual learning. We show that this allows us to design a linear model where the quadratic parameter regularization method is the optimal continual learning policy, while at the same time enjoying the high performance of neural networks. We also show that the proposed algorithm enables parameter regularization methods to be applied to class-incremental problems. Additionally, we provide a theoretical reason why existing parameter-space regularization algorithms such as EWC underperform on neural networks trained with cross-entropy loss. We show that the proposed method can prevent forgetting while achieving high continual fine-tuning performance on image classification tasks. To show that our method can be applied to general continual learning settings, we evaluate it in data-incremental, task-incremental, and class-incremental learning problems.  ( 2 min )
    Metric Residual Networks for Sample Efficient Goal-conditioned Reinforcement Learning. (arXiv:2208.08133v1 [cs.LG])
    Goal-conditioned reinforcement learning (GCRL) has a wide range of potential real-world applications, including manipulation and navigation problems in robotics. Especially in such robotics tasks, sample efficiency is of the utmost importance for GCRL since, by default, the agent is only rewarded when it reaches its goal. While several methods have been proposed to improve the sample efficiency of GCRL, one relatively under-studied approach is the design of neural architectures to support sample efficiency. In this work, we introduce a novel neural architecture for GCRL that achieves significantly better sample efficiency than the commonly-used monolithic network architecture. The key insight is that the optimal action-value function $Q^*(s,a,g)$ must satisfy the triangle inequality in a specific sense. Furthermore, we introduce the metric residual network (MRN), which deliberately decomposes the action-value function $Q(s,a,g)$ into the negated summation of a metric plus a residual asymmetric component. MRN provably approximates any optimal action-value function $Q^*(s,a,g)$, thus making it a fitting neural architecture for GCRL. We conduct comprehensive experiments across 12 standard benchmark environments in GCRL. The empirical results demonstrate that MRN uniformly outperforms other state-of-the-art GCRL neural architectures in terms of sample efficiency.  ( 2 min )
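    The decomposition is concrete enough to sketch in PyTorch: embed $(s,a)$ and $g$, compute a symmetric metric between the embeddings plus a nonnegative asymmetric residual, and negate the sum. The embedding sizes and the particular metric below are illustrative assumptions, not the paper's exact choices:

        import torch
        import torch.nn as nn

        class MetricResidualQ(nn.Module):
            # Q(s, a, g) = -(metric + residual): a symmetric L-infinity
            # metric between the (s, a) and g embeddings plus a nonnegative
            # asymmetric residual term (Softplus keeps it >= 0).
            def __init__(self, ds, da, dg, d=64):
                super().__init__()
                self.sa = nn.Sequential(nn.Linear(ds + da, d), nn.ReLU(), nn.Linear(d, d))
                self.g = nn.Sequential(nn.Linear(dg, d), nn.ReLU(), nn.Linear(d, d))
                self.res = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(),
                                         nn.Linear(d, 1), nn.Softplus())

            def forward(self, s, a, g):
                x, y = self.sa(torch.cat([s, a], -1)), self.g(g)
                metric = (x - y).abs().max(dim=-1).values
                residual = self.res(torch.cat([x, y], -1)).squeeze(-1)
                return -(metric + residual)

        q = MetricResidualQ(ds=4, da=2, dg=4)
        print(q(torch.randn(8, 4), torch.randn(8, 2), torch.randn(8, 4)).shape)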
    ILLUME: Rationalizing Vision-Language Models by Interacting with their Jabber. (arXiv:2208.08241v1 [cs.LG])
    Bootstrapping from pre-trained language models has been proven to be an efficient approach for building foundation vision-language models (VLM) for tasks such as image captioning or visual question answering. However, it is difficult, if not impossible, to utilize it to make the model conform to a user's rationales for specific answers. To elicit and reinforce commonsense reasons, we propose an iterative sampling and tuning paradigm, called ILLUME, that executes the following loop: given an image-question-answer prompt, the VLM samples multiple candidate rationales, and a human critic provides minimal feedback via preference selection, which is used for fine-tuning. This loop increases the training data and gradually carves out the VLM's rationalization capabilities. Our exhaustive experiments demonstrate that ILLUME is competitive with standard supervised fine-tuning while using significantly less training data and only requiring minimal feedback.  ( 2 min )
    Assurance Cases as Foundation Stone for Auditing AI-enabled and Autonomous Systems: Workshop Results and Political Recommendations for Action from the ExamAI Project. (arXiv:2208.08198v1 [cs.SE])
    The European Machinery Directive and related harmonized standards do consider that software is used to generate safety-relevant behavior of machinery, but do not consider all kinds of software. In particular, software based on machine learning (ML) is not considered for the realization of safety-relevant behavior. This limits the introduction of suitable safety concepts for autonomous mobile robots and other autonomous machinery, which commonly depend on ML-based functions. We investigated this issue and the way safety standards define safety measures to be implemented against software faults. Functional safety standards use Safety Integrity Levels (SILs) to define which safety measures shall be implemented. They provide rules for determining the SIL and rules for selecting safety measures depending on the SIL. In this paper, we argue that this approach can hardly be adopted with respect to ML and other kinds of Artificial Intelligence (AI). Instead of simple rules for determining an SIL and applying related measures against faults, we propose the use of assurance cases to argue that the individually selected and applied measures are sufficient in the given case. To get a first rating regarding the feasibility and usefulness of our proposal, we presented and discussed it in a workshop with experts from industry, German statutory accident insurance companies, work safety and standardization commissions, and representatives from various national, European, and international working groups dealing with safety and AI. In this paper, we summarize the proposal and the workshop discussion. Moreover, we check to which extent our proposal is in line with the European AI Act proposal and current safety standardization initiatives addressing AI and Autonomous Systems.  ( 3 min )
    Efficient Detection and Filtering Systems for Distributed Training. (arXiv:2208.08085v1 [cs.LG])
    A plethora of modern machine learning tasks require the utilization of large-scale distributed clusters as a critical component of the training pipeline. However, abnormal Byzantine behavior of the worker nodes can derail the training and compromise the quality of the inference. Such behavior can be attributed to unintentional system malfunctions or orchestrated attacks; as a result, some nodes may return arbitrary results to the parameter server (PS) that coordinates the training. Recent work considers a wide range of attack models and has explored robust aggregation and/or computational redundancy to correct the distorted gradients. In this work, we consider attack models ranging from strong ones: $q$ omniscient adversaries with full knowledge of the defense protocol that can change from iteration to iteration to weak ones: $q$ randomly chosen adversaries with limited collusion abilities which only change every few iterations at a time. Our algorithms rely on redundant task assignments coupled with detection of adversarial behavior. For strong attacks, we demonstrate a reduction in the fraction of distorted gradients ranging from 16\%-99\% as compared to the prior state-of-the-art. Our top-1 classification accuracy results on the CIFAR-10 data set demonstrate 25\% advantage in accuracy (averaged over strong and weak scenarios) under the most sophisticated attacks compared to state-of-the-art methods.  ( 3 min )
    Ex-Ante Assessment of Discrimination in Dataset. (arXiv:2208.07918v1 [cs.LG])
    Data owners face increasing liability for how the use of their data could harm under-privileged communities. Stakeholders would like to identify the characteristics of data that lead to algorithms being biased against particular demographic groups, for example, defined by their race, gender, age, and/or religion. Specifically, we are interested in identifying subsets of the feature space where the ground truth response function from features to observed outcomes differs across demographic groups. To this end, we propose FORESEE, a FORESt of decision trEEs algorithm, which generates a score that captures how likely an individual's response is to vary with sensitive attributes. Empirically, we find that our approach allows us to identify the individuals who are most likely to be misclassified by several classifiers, including Random Forest, Logistic Regression, Support Vector Machine, and k-Nearest Neighbors. The advantage of our approach is that it allows stakeholders to characterize risky samples that may contribute to discrimination, as well as use FORESEE to estimate the risk of upcoming samples.  ( 2 min )
    Tiny-HR: Towards an interpretable machine learning pipeline for heart rate estimation on edge devices. (arXiv:2208.07981v1 [cs.LG])
    The focus of this paper is a proof-of-concept, machine learning (ML) pipeline that extracts heart rate from pressure sensor data acquired on low-power edge devices. The ML pipeline consists of an upsampler neural network, a signal quality classifier, and a 1D-convolutional neural network optimized for efficient and accurate heart rate estimation. The models were designed so that the pipeline was less than 40 kB. Further, a hybrid pipeline consisting of the upsampler and classifier, followed by a peak detection algorithm, was developed. The pipelines were deployed on an ESP32 edge device and benchmarked against a signal processing baseline to determine energy usage and inference times. The results indicate that the proposed ML and hybrid pipelines reduce energy and time per inference by 82% and 28% compared to traditional algorithms. The main trade-off for the ML pipeline was accuracy, with a mean absolute error (MAE) of 3.28, compared to 2.39 and 1.17 for the hybrid and signal processing pipelines. The ML models thus show promise for deployment in energy- and computationally-constrained devices. Further, the lower sampling rate and computational requirements of the ML pipeline could enable custom hardware solutions to reduce the cost and energy needs of wearable devices.  ( 3 min )
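    A minimal sketch of the hybrid pipeline idea (upsample, quality gate, peak-based heart rate estimation), in which a signal resampler and a simple SNR threshold stand in for the learned upsampler and quality classifier, and all rates and thresholds are hypothetical:

        import numpy as np
        from scipy.signal import find_peaks, resample

        def estimate_hr(sig, fs_in=25, fs_out=100, snr_gate_db=5.0):
            """Hybrid-pipeline sketch: upsample -> quality gate -> peak-based HR (bpm)."""
            up = resample(sig, len(sig) * fs_out // fs_in)    # stand-in upsampler
            snr = 10 * np.log10(np.var(up) / (np.var(np.diff(up)) + 1e-12))
            if snr < snr_gate_db:                             # stand-in quality classifier
                return None                                   # reject low-quality segment
            peaks, _ = find_peaks(up, distance=int(fs_out * 0.4))  # >= 0.4 s between beats
            if len(peaks) < 2:
                return None
            return 60.0 * fs_out / np.mean(np.diff(peaks))

        t = np.arange(0, 10, 1 / 25)                          # 10 s at 25 Hz
        print(estimate_hr(np.sin(2 * np.pi * 1.2 * t)))       # approx. 72 bpm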
    Gradient-Based Meta-Learning Using Uncertainty to Weigh Loss for Few-Shot Learning. (arXiv:2208.08135v1 [cs.LG])
    Model-Agnostic Meta-Learning (MAML) is one of the most successful meta-learning techniques for few-shot learning. It uses gradient descent to learn commonalities between various tasks, enabling the model to learn the meta-initialization of its own parameters to quickly adapt to new tasks using a small amount of labeled training data. A key challenge in few-shot learning is task uncertainty. Although a strong prior can be obtained from meta-learning with a large number of tasks, a precise model of a new task cannot be guaranteed because the volume of the training dataset is normally too small. In this study, we first propose a new method in which, when choosing initialization parameters, the task-specific learner adaptively learns to select initialization parameters that minimize the loss of new tasks. We then propose two improved methods for the meta-loss part: Method 1 generates weights by comparing meta-loss differences to improve the accuracy when there are few classes, and Method 2 introduces the homoscedastic uncertainty of each task to weigh multiple losses based on the original gradient descent, as a way to enhance the generalization ability to novel classes while ensuring accuracy improvement. Compared with previous gradient-based meta-learning methods, our model achieves better performance in regression tasks and few-shot classification and improves the robustness of the model to the learning rate and query sets in the meta-test set.  ( 3 min )
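    Method 2's homoscedastic-uncertainty weighting is commonly implemented in the style of Kendall et al., with one learnable log-variance per task; a PyTorch sketch of that standard formulation (not necessarily the authors' exact variant) is:

        import torch

        class UncertaintyWeightedLoss(torch.nn.Module):
            """Weigh task losses by learned homoscedastic uncertainty:
            total = sum_i exp(-s_i) * L_i + s_i, with s_i = log(sigma_i^2)."""
            def __init__(self, num_tasks):
                super().__init__()
                self.log_vars = torch.nn.Parameter(torch.zeros(num_tasks))

            def forward(self, task_losses):
                losses = torch.stack(list(task_losses))
                return (torch.exp(-self.log_vars) * losses + self.log_vars).sum()

        weigher = UncertaintyWeightedLoss(num_tasks=2)
        total = weigher([torch.tensor(0.8), torch.tensor(2.3)])
        total.backward()               # gradients also flow into the log-variances
        print(total.item())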
    Mixed Quantum-Classical Method For Fraud Detection with Quantum Feature Selection. (arXiv:2208.07963v1 [quant-ph])
    This paper presents a first end-to-end application of a Quantum Support Vector Machine (QSVM) algorithm for a classification problem in the financial payment industry using IBM Safer Payments and IBM Quantum Computers via the Qiskit software stack. Based on real card payment data, a thorough comparison is performed to assess the complementary impact brought in by current state-of-the-art Quantum Machine Learning algorithms with respect to the classical approach. A new method to search for the best features is explored using the Quantum Support Vector Machine's feature map characteristics. The results are compared using fraud-specific key performance indicators -- Accuracy, Recall, and False Positive Rate -- extracted from analyses based on human expertise (rule decisions), classical machine learning algorithms (Random Forest, XGBoost), and quantum-based machine learning algorithms using QSVM. In addition, a hybrid classical-quantum approach is explored by using an ensemble model that combines classical and quantum algorithms to further improve the fraud prevention decision. We found, as expected, that the results depend highly on the feature selections and the algorithms used to select them. The QSVM provides a complementary exploration of the feature space, which led to improved accuracy of the mixed quantum-classical method for fraud detection on a drastically reduced data set that fits the current state of quantum hardware.  ( 3 min )
    AHEAD: A Triple Attention Based Heterogeneous Graph Anomaly Detection Approach. (arXiv:2208.08200v1 [cs.SI])
    Graph anomaly detection on attributed networks has become a prevalent research topic due to its broad applications in many influential domains. In real-world scenarios, nodes and edges in attributed networks usually display distinct heterogeneity, i.e., attributes of different types of nodes show great variety, and different types of relations represent diverse meanings. Anomalies usually perform differently from the majority in various perspectives of heterogeneity in these networks. However, existing graph anomaly detection approaches do not leverage the heterogeneity in attributed networks, which is highly relevant to anomaly detection. In light of this problem, we propose AHEAD: a heterogeneity-aware unsupervised graph anomaly detection approach based on the encoder-decoder framework. Specifically, for the encoder, we design three levels of attention, i.e., attribute-level, node-type-level, and edge-level attentions, to capture the heterogeneity of the network structure, node properties, and information of a single node, respectively. In the decoder, we exploit structure, attribute, and node type reconstruction terms to obtain an anomaly score for each node. Extensive experiments show the superiority of AHEAD on several real-world heterogeneous information networks compared with state-of-the-art methods in the unsupervised setting. Further experiments verify the effectiveness and robustness of our triple attention, model backbone, and decoder in general.  ( 3 min )
    A Monotonicity Constrained Attention Module for Emotion Classification with Limited EEG Data. (arXiv:2208.08155v1 [eess.SP])
    In this work, a parameter-efficient attention module is presented for emotion classification using a limited, or relatively small, number of electroencephalogram (EEG) signals. This module is called the Monotonicity Constrained Attention Module (MCAM) due to its capability of incorporating priors on monotonicity when converting features' Gram matrices into attention matrices for better feature refinement. Our experiments have shown that MCAM's effectiveness is comparable to state-of-the-art attention modules in boosting the backbone network's prediction performance while requiring fewer parameters. Several accompanying sensitivity analyses of the trained models' predictions under different attacks are also performed. These attacks include various frequency-domain filtering levels and gradual morphing between samples associated with multiple labels. Our results can help better understand different modules' behaviour in prediction and can provide guidance in applications where data is limited and noisy.  ( 2 min )
    Random Search Hyper-Parameter Tuning: Expected Improvement Estimation and the Corresponding Lower Bound. (arXiv:2208.08170v1 [cs.LG])
    Hyperparameter tuning is a common technique for improving the performance of neural networks. Most techniques for hyperparameter search involve an iterated process where the model is retrained at every iteration. However, the expected accuracy improvement from every additional search iteration is still unknown. Calculating the expected improvement can help create stopping rules for hyperparameter tuning and allow for a wiser allocation of a project's computational budget. In this paper, we establish an empirical estimate for the expected accuracy improvement from an additional iteration of hyperparameter search. Our results hold for any hyperparameter tuning method that is based on random search \cite{bergstra2012random} and samples hyperparameters from a fixed distribution. We bound our estimate with an error of $O\left(\sqrt{\frac{\log k}{k}}\right)$ w.h.p., where $k$ is the current number of iterations. To the best of our knowledge, this is the first bound on the expected gain from an additional iteration of hyperparameter search. Finally, we demonstrate that the optimal estimate for the expected accuracy will still have an error of $\frac{1}{k}$.  ( 2 min )
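    To make the quantity concrete: the expected best score after $n$ random-search draws can be estimated by plugging the empirical CDF of the $k$ observed scores into the order-statistics identity $\mathbb{E}[M_n] = \sum_i x_{(i)} \left(F(x_{(i)})^n - F(x_{(i-1)})^n\right)$. The sketch below (a plug-in illustration, not the paper's exact estimator) uses this to estimate the gain from one more iteration:

        import numpy as np

        def expected_best(scores, n):
            """Plug-in estimate of E[max of n draws] using the empirical CDF
            of the observed scores (order-statistics formula)."""
            x = np.sort(scores)
            k = len(x)
            i = np.arange(1, k + 1)
            weights = (i / k) ** n - ((i - 1) / k) ** n
            return float((weights * x).sum())

        rng = np.random.default_rng(1)
        scores = rng.beta(8, 2, size=50)        # 50 observed random-search accuracies
        k = len(scores)
        gain = expected_best(scores, k + 1) - expected_best(scores, k)
        print(f"estimated gain from iteration {k + 1}: {gain:.5f}")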
    Autonomous Resource Management in Construction Companies Using Deep Reinforcement Learning Based on IoT. (arXiv:2208.08087v1 [cs.LG])
    Resource allocation is one of the most critical issues in planning construction projects, due to its direct impact on cost, time, and quality. There are usually specific allocation methods for autonomous resource management according to the project's objectives. However, integrated planning and optimization of resource utilization across an entire construction organization are scarce. The purpose of this study is to present an automatic resource allocation structure for construction companies based on Deep Reinforcement Learning (DRL), which can be used in various situations. In this structure, Data Harvesting (DH) gathers resource information from the distributed Internet of Things (IoT) sensor devices all over the company's projects to be employed in the autonomous resource management approach. Then, Coverage Resources Allocation (CRA) is compared to the information obtained from DH, from which the Autonomous Resource Management (ARM) determines the project of interest. Likewise, Double Deep Q-Networks (DDQNs) with similar models are trained on two distinct assignment situations based on the structured resource information of the company to balance objectives with resource constraints. The suggested technique can efficiently adjust to large resource management systems by combining portfolio information with adopted individual project information. Also, the effects of important information processing parameters on resource allocation performance are analyzed in detail. Moreover, results on the generalizability of the management approaches are presented, indicating no need for additional training when the variables of the situations change.  ( 3 min )
    DeepSportradar-v1: Computer Vision Dataset for Sports Understanding with High Quality Annotations. (arXiv:2208.08190v1 [cs.CV])
    With the recent development of Deep Learning applied to Computer Vision, sport video understanding has gained a lot of attention, providing much richer information for both sport consumers and leagues. This paper introduces DeepSportradar-v1, a suite of computer vision tasks, datasets and benchmarks for automated sport understanding. The main purpose of this framework is to close the gap between academic research and real-world settings. To this end, the datasets provide high-resolution raw images, camera parameters and high-quality annotations. DeepSportradar currently supports four challenging tasks related to basketball: ball 3D localization, camera calibration, player instance segmentation and player re-identification. For each of the four tasks, a detailed description of the dataset, objective, performance metrics, and the proposed baseline method are provided. To encourage further research on advanced methods for sport understanding, a competition is organized as part of the MMSports workshop at the ACM Multimedia 2022 conference, where participants have to develop state-of-the-art methods to solve the above tasks. The four datasets, development kits and baselines are publicly available.  ( 3 min )
    Enhancing Audio Perception of Music By AI Picked Room Acoustics. (arXiv:2208.07994v1 [cs.SD])
    Every sound that we hear is the result of successive convolutional operations (e.g. room acoustics, microphone characteristics, resonant properties of the instrument itself, not to mention characteristics and limitations of the sound reproduction system). In this work we seek to determine the best room in which to perform a particular piece using AI. Additionally, we use room acoustics as a way to enhance the perceptual qualities of a given sound. Historically, rooms (particularly churches and concert halls) were designed to host and serve specific musical functions. In some cases the architectural acoustical qualities enhanced the music performed there. As a first step, we try to mimic this by designating room impulse responses that correlate with enhanced sound quality for particular music. A convolutional architecture is first trained to take in an audio sample and mimic the ratings of experts with about 78% accuracy for various instrument families and notes for perceptual qualities. This gives us a scoring function for any audio sample which can rate the perceptual pleasantness of a note automatically. Then, via a library of about 60,000 synthetic impulse responses mimicking all kinds of rooms, materials, etc., we use a simple convolution operation to transform the sound as if it were played in a particular room. The perceptual evaluator is used to rank the musical sounds and yield the "best room or concert hall" in which to play a sound. As a byproduct, it can also use room acoustics to turn a poor-quality sound into a "good" sound.  ( 3 min )
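    The room-transformation step is a single convolution of dry audio with a room impulse response; a minimal scipy sketch, with a toy exponentially decaying noise burst standing in for one of the 60,000 synthetic impulse responses:

        import numpy as np
        from scipy.signal import fftconvolve

        def apply_room(dry, impulse_response):
            """Render dry audio as if played in the room described by the
            impulse response, then peak-normalize to avoid clipping."""
            wet = fftconvolve(dry, impulse_response, mode="full")
            return wet / (np.max(np.abs(wet)) + 1e-12)

        fs = 16000
        rng = np.random.default_rng(0)
        dry = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)        # 1 s of A4
        ir = np.exp(-np.linspace(0, 8, fs // 2)) * rng.standard_normal(fs // 2)
        print(apply_room(dry, ir).shape)                          # (fs + fs//2 - 1,)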
    Riemannian Diffusion Models. (arXiv:2208.07949v1 [cs.LG])
    Diffusion models are recent state-of-the-art methods for image generation and likelihood estimation. In this work, we generalize continuous-time diffusion models to arbitrary Riemannian manifolds and derive a variational framework for likelihood estimation. Computationally, we propose new methods for computing the Riemannian divergence which is needed in the likelihood estimation. Moreover, in generalizing the Euclidean case, we prove that maximizing this variational lower-bound is equivalent to Riemannian score matching. Empirically, we demonstrate the expressive power of Riemannian diffusion models on a wide spectrum of smooth manifolds, such as spheres, tori, hyperboloids, and orthogonal groups. Our proposed method achieves new state-of-the-art likelihoods on all benchmarks.  ( 2 min )
    On the generalization of learning algorithms that do not converge. (arXiv:2208.07951v1 [cs.LG])
    Generalization analyses of deep learning typically assume that the training converges to a fixed point. However, recent results indicate that, in practice, the weights of deep neural networks optimized with stochastic gradient descent often oscillate indefinitely. To reduce this discrepancy between theory and practice, this paper focuses on the generalization of neural networks whose training dynamics do not necessarily converge to fixed points. Our main contribution is to propose a notion of statistical algorithmic stability (SAS) that extends classical algorithmic stability to non-convergent algorithms and to study its connection to generalization. This ergodic-theoretic approach leads to new insights when compared to the traditional optimization and learning theory perspectives. We prove that the stability of the time-asymptotic behavior of a learning algorithm relates to its generalization and empirically demonstrate how loss dynamics can provide clues to generalization performance. Our findings provide evidence that networks that "train stably generalize better" even when the training continues indefinitely and the weights do not converge.  ( 2 min )
    Private Estimation with Public Data. (arXiv:2208.07984v1 [cs.LG])
    We initiate the study of differentially private (DP) estimation with access to a small amount of public data. For private estimation of d-dimensional Gaussians, we assume that the public data comes from a Gaussian that may have vanishing similarity in total variation distance with the underlying Gaussian of the private data. We show that under the constraints of pure or concentrated DP, d+1 public data samples are sufficient to remove any dependence on the range parameters of the private data distribution from the private sample complexity, which is known to be otherwise necessary without public data. For separated Gaussian mixtures, we assume that the underlying public and private distributions are the same, and we consider two settings: (1) when given a dimension-independent amount of public data, the private sample complexity can be improved polynomially in terms of the number of mixture components, and any dependence on the range parameters of the distribution can be removed in the approximate DP case; (2) when given an amount of public data linear in the dimension, the private sample complexity can be made independent of range parameters even under concentrated DP, and additional improvements can be made to the overall sample complexity.  ( 2 min )
    ShortcutLens: A Visual Analytics Approach for Exploring Shortcuts in Natural Language Understanding Dataset. (arXiv:2208.08010v1 [cs.HC])
    Benchmark datasets play an important role in evaluating Natural Language Understanding (NLU) models. However, shortcuts -- unwanted biases in the benchmark datasets -- can damage the effectiveness of benchmark datasets in revealing models' real capabilities. Since shortcuts vary in coverage, productivity, and semantic meaning, it is challenging for NLU experts to systematically understand and avoid them when creating benchmark datasets. In this paper, we develop a visual analytics system, ShortcutLens, to help NLU experts explore shortcuts in NLU benchmark datasets. The system allows users to conduct multi-level exploration of shortcuts. Specifically, Statistics View helps users grasp the statistics such as coverage and productivity of shortcuts in the benchmark dataset. Template View employs hierarchical and interpretable templates to summarize different types of shortcuts. Instance View allows users to check the corresponding instances covered by the shortcuts. We conduct case studies and expert interviews to evaluate the effectiveness and usability of the system. The results demonstrate that ShortcutLens supports users in gaining a better understanding of benchmark dataset issues through shortcuts, inspiring them to create challenging and pertinent benchmark datasets.  ( 2 min )
    Resource-aware Federated Learning using Knowledge Extraction and Multi-model Fusion. (arXiv:2208.07978v1 [cs.DC])
    With increasing concern about user data privacy, federated learning (FL) has been developed as a unique training paradigm for training machine learning models on edge devices without access to sensitive data. Traditional FL and existing methods directly employ aggregation over all edges with the same model and training devices for a cloud server. Although these methods protect data privacy, they cannot accommodate model heterogeneity, ignore heterogeneous computing power, and incur steep communication costs. In this paper, we propose a resource-aware FL method that aggregates an ensemble of local knowledge extracted from edge models, instead of aggregating the weights of each local model, and distills it into robust global knowledge as the server model through knowledge distillation. The local model and the global knowledge are extracted into a tiny-size knowledge network by deep mutual learning. Such knowledge extraction allows the edge client to deploy a resource-aware model and perform multi-model knowledge fusion while maintaining communication efficiency and model heterogeneity. Empirical results show that our approach significantly improves over existing FL algorithms in terms of communication cost and generalization performance with heterogeneous data and models. Our approach reduces the communication cost of VGG-11 by up to 102$\times$ and ResNet-32 by up to 30$\times$ when training ResNet-20 as the knowledge network.  ( 3 min )
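    A minimal sketch of aggregating by distillation rather than by weight averaging, under the simplifying assumption that the server can query client models on a shared proxy batch (the actual method additionally uses deep mutual learning and a tiny knowledge network):

        import torch
        import torch.nn.functional as F

        def distill_round(server_model, client_models, proxy_batch, lr=1e-3, T=2.0):
            """One aggregation round: distill the averaged client predictions
            on a proxy batch into the server model (no weight averaging)."""
            with torch.no_grad():
                teacher = torch.stack([m(proxy_batch) for m in client_models]).mean(dim=0)
            opt = torch.optim.SGD(server_model.parameters(), lr=lr)
            student = server_model(proxy_batch)
            loss = F.kl_div(F.log_softmax(student / T, dim=-1),
                            F.softmax(teacher / T, dim=-1),
                            reduction="batchmean") * T * T
            opt.zero_grad()
            loss.backward()
            opt.step()
            return loss.item()

        clients = [torch.nn.Linear(8, 4) for _ in range(3)]  # heterogeneous in practice
        server = torch.nn.Linear(8, 4)
        print(distill_round(server, clients, torch.randn(16, 8)))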
    Collaborative causal inference on distributed data. (arXiv:2208.07898v1 [stat.ME])
    The development of technologies for causal inference with privacy preservation of distributed data has attracted considerable attention in recent years. To address this issue, we propose a quasi-experiment based on data collaboration (DC-QE) that enables causal inference from distributed data with privacy preservation. Our method preserves the privacy of private data by sharing only dimensionality-reduced intermediate representations, which are individually constructed by each party. Moreover, our method can reduce both random errors and biases, whereas existing methods can only reduce random errors in the estimation of treatment effects. Through numerical experiments on both artificial and real-world data, we confirmed that our method can lead to better estimation results than individual analyses. With the spread of our method, intermediate representations can be published as open data to help researchers find causalities and be accumulated as a knowledge base.  ( 2 min )
    Streaming Adaptive Submodular Maximization. (arXiv:2208.08021v1 [cs.AI])
    Many sequential decision making problems can be formulated as an adaptive submodular maximization problem. However, most existing studies in this field focus on the pool-based setting, where one can pick items in any order, and there have been few studies of the stream-based setting, where items arrive in an arbitrary order and one must immediately decide whether to select an item or not upon its arrival. In this paper, we introduce a new class of utility functions, semi-policywise submodular functions. We develop a series of effective algorithms to maximize a semi-policywise submodular function under the stream-based setting.  ( 2 min )
    Quantum Bayes AI. (arXiv:2208.08068v1 [stat.ML])
    Quantum Bayesian AI (Q-B) is an emerging field that leverages the computational gains available in quantum computing. The promise is an exponential speed-up in many Bayesian algorithms. Our goal is to apply these methods directly to statistical and machine learning problems. We provide a duality between classical and quantum probability for the calculation of posterior quantities of interest. Our framework unifies MCMC, Deep Learning and Quantum Learning calculations from the viewpoint of von Neumann's principle of quantum measurement. Quantum embeddings and neural gates are also an important part of data encoding and feature selection. There is a natural duality with well-known kernel methods in statistical learning. We illustrate the behaviour of quantum algorithms on two simple classification algorithms. Finally, we conclude with directions for future research.  ( 2 min )
    Interference Cancellation GAN Framework for Dynamic Channels. (arXiv:2208.08019v1 [cs.LG])
    Symbol detection is a fundamental and challenging problem in modern communication systems, e.g., the multiuser multiple-input multiple-output (MIMO) setting. Iterative Soft Interference Cancellation (SIC) is a state-of-the-art method for this task and has recently motivated data-driven neural network models, e.g. DeepSIC, that can deal with unknown non-linear channels. However, these neural network models require thorough, time-consuming training before deployment, and are thus not readily suitable for highly dynamic channels in practice. We introduce an online training framework that can swiftly adapt to any changes in the channel. Our proposed framework unifies recent deep unfolding approaches with emerging generative adversarial networks (GANs) to capture any changes in the channel and quickly adjust the networks to maintain the top performance of the model. We demonstrate that our framework significantly outperforms recent neural network models on highly dynamic channels and even surpasses them on the static channel in our experiments.  ( 2 min )
    Artificial Intelligence Empowered Multiple Access for Ultra Reliable and Low Latency THz Wireless Networks. (arXiv:2208.08039v1 [eess.SP])
    Terahertz (THz) wireless networks are expected to catalyze the beyond fifth generation (B5G) era. However, due to the directional nature and the line-of-sight demand of THz links, as well as the ultra-dense deployment of THz networks, a number of challenges arise that the medium access control (MAC) layer needs to face. In more detail, the need to rethink user association and resource allocation strategies by incorporating artificial intelligence (AI) capable of providing "real-time" solutions in complex and frequently changing environments becomes evident. Moreover, to satisfy the ultra-reliability and low-latency demands of several B5G applications, novel mobility management approaches are required. Motivated by this, this article presents a holistic MAC layer approach that enables intelligent user association and resource allocation, as well as flexible and adaptive mobility management, while maximizing system reliability through blockage minimization. In more detail, a fast and centralized joint user association, radio resource allocation, and blockage avoidance scheme, by means of a novel metaheuristic-machine learning framework, is documented that maximizes THz network performance, while minimizing the association latency by approximately three orders of magnitude. To support mobility management and blockage avoidance within the access point (AP) coverage area, a deep reinforcement learning (DRL) approach for beam selection is discussed. Finally, to support user mobility between coverage areas of neighboring APs, a proactive hand-over mechanism based on AI-assisted fast channel prediction is reported.  ( 3 min )
    Paint2Pix: Interactive Painting based Progressive Image Synthesis and Editing. (arXiv:2208.08092v1 [cs.CV])
    Controllable image synthesis with user scribbles is a topic of keen interest in the computer vision community. In this paper, for the first time, we study the problem of photorealistic image synthesis from incomplete and primitive human paintings. In particular, we propose a novel approach, paint2pix, which learns to predict (and adapt) "what a user wants to draw" from rudimentary brushstroke inputs, by learning a mapping from the manifold of incomplete human paintings to their realistic renderings. When used in conjunction with recent works on autonomous painting agents, we show that paint2pix can be used for progressive image synthesis from scratch. During this process, paint2pix allows a novice user to progressively synthesize the desired image output, while requiring just a few coarse user scribbles to accurately steer the trajectory of the synthesis process. Furthermore, we find that our approach also forms a surprisingly convenient approach for real image editing, and allows the user to perform a diverse range of custom fine-grained edits through the addition of only a few well-placed brushstrokes. Supplemental video and demo are available at https://1jsingh.github.io/paint2pix  ( 2 min )
    A Survey on Incomplete Multi-view Clustering. (arXiv:2208.08040v1 [cs.LG])
    Conventional multi-view clustering seeks to partition data into respective groups based on the assumption that all views are fully observed. However, in practical applications, such as disease diagnosis, multimedia analysis, and recommendation systems, it is common that not all views of the samples are available, which leads to the failure of conventional multi-view clustering methods. Clustering on such incomplete multi-view data is referred to as incomplete multi-view clustering. In view of the promising application prospects, research on incomplete multi-view clustering has made noticeable advances in recent years. However, there is no survey that summarizes the current progress and points out future research directions. To this end, we review the recent studies of incomplete multi-view clustering. Importantly, we provide some frameworks to unify the corresponding incomplete multi-view clustering methods, and make an in-depth comparative analysis of some representative methods from theoretical and experimental perspectives. Finally, some open problems in the incomplete multi-view clustering field are offered for researchers.  ( 2 min )
    Online Learning for Mixture of Multivariate Hawkes Processes. (arXiv:2208.07961v1 [stat.ML])
    Online learning of Hawkes processes has received increasing attention in the last couple of years, especially for modeling a network of actors. However, these works typically model either the rich interaction between the events, the latent cluster of the actors, or the network structure between the actors. We propose to model the latent structure of the network of actors as well as their rich interaction across events for real-world settings of medical and financial applications. Experimental results on both synthetic and real-world data showcase the efficacy of our approach.  ( 2 min )
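    For reference, the object fitted in this line of work is the conditional intensity of a multivariate Hawkes process; under the common exponential-kernel convention (one of several in the literature), $\lambda_i(t) = \mu_i + \sum_j \sum_{s \in \text{events}_j,\, s < t} \alpha_{ij}\, \beta\, e^{-\beta (t - s)}$, sketched below:

        import numpy as np

        def hawkes_intensity(t, mu, alpha, beta, events):
            """lambda_i(t) = mu_i + sum_j sum_{s in events[j], s < t}
            alpha[i, j] * beta * exp(-beta * (t - s))."""
            lam = mu.copy()
            for j, times in enumerate(events):
                past = times[times < t]
                lam += alpha[:, j] * beta * np.exp(-beta * (t - past)).sum()
            return lam

        mu = np.array([0.2, 0.1])                          # base rates
        alpha = np.array([[0.3, 0.1], [0.2, 0.4]])         # cross-excitation matrix
        events = [np.array([0.5, 1.1]), np.array([0.9])]   # event times per dimension
        print(hawkes_intensity(t=1.5, mu=mu, alpha=alpha, beta=1.0, events=events))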
    PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm. (arXiv:2208.07914v1 [cs.LG])
    Many real-world problems involve multiple, possibly conflicting, objectives. Multi-objective reinforcement learning (MORL) approaches have emerged to tackle these problems by maximizing a joint objective function weighted by a preference vector. These approaches find fixed customized policies corresponding to preference vectors specified during training. However, the design constraints and objectives typically change dynamically in real-life scenarios. Furthermore, storing a policy for each potential preference is not scalable. Hence, obtaining a set of Pareto front solutions for the entire preference space in a given domain with a single training is critical. To this end, we propose a novel MORL algorithm that trains a single universal network to cover the entire preference space. The proposed approach, Preference-Driven MORL (PD-MORL), utilizes the preferences as guidance to update the network parameters. After demonstrating PD-MORL using classical Deep Sea Treasure and Fruit Tree Navigation benchmarks, we evaluate its performance on challenging multi-objective continuous control tasks.  ( 2 min )
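    The central mechanism, conditioning a single value network on the preference vector $w$ and acting greedily on the scalarized value $w \cdot Q$, can be sketched as follows (a toy PyTorch module with hypothetical layer sizes, not the PD-MORL training loop):

        import torch

        class PreferenceConditionedQ(torch.nn.Module):
            """Q-network taking (state, preference) and outputting one Q-value
            per objective; scalarizing by w . Q covers any preference."""
            def __init__(self, state_dim, num_actions, num_objectives):
                super().__init__()
                self.net = torch.nn.Sequential(
                    torch.nn.Linear(state_dim + num_objectives, 64), torch.nn.ReLU(),
                    torch.nn.Linear(64, num_actions * num_objectives))
                self.shape = (num_actions, num_objectives)

            def forward(self, state, w):
                q = self.net(torch.cat([state, w], dim=-1)).view(-1, *self.shape)
                return q, (q * w[:, None, :]).sum(-1)  # vector Q, scalarized Q

        qnet = PreferenceConditionedQ(state_dim=4, num_actions=3, num_objectives=2)
        state, w = torch.randn(1, 4), torch.tensor([[0.7, 0.3]])
        q, scalar_q = qnet(state, w)
        print(scalar_q.argmax(dim=-1))   # greedy action under this preference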
    FOLD-SE: Scalable Explainable AI. (arXiv:2208.07912v1 [cs.LG])
    FOLD-R++ is a highly efficient and explainable rule-based machine learning algorithm for binary classification tasks. It generates a stratified normal logic program as an (explainable) trained model. We present an improvement over the FOLD-R++ algorithm, termed FOLD-SE, that provides scalable explainability (SE) while inheriting all the merits of FOLD-R++. Scalable explainability means that regardless of the size of the dataset, the number of learned rules and learned literals stays small and, hence, understandable by human beings, while good classification performance is maintained. FOLD-SE is competitive in performance with state-of-the-art algorithms such as XGBoost and Multi-Layer Perceptrons (MLP). However, unlike XGBoost and MLP, the FOLD-SE algorithm generates a model with scalable explainability. The FOLD-SE algorithm outperforms the FOLD-R++ and RIPPER algorithms in efficiency, performance, and explainability, especially for large datasets. The FOLD-RM algorithm is an extension of FOLD-R++ for multi-class classification tasks. An improved FOLD-RM algorithm built upon FOLD-SE is also presented.  ( 2 min )
    Measuring Statistical Dependencies via Maximum Norm and Characteristic Functions. (arXiv:2208.07934v1 [cs.LG])
    In this paper, we focus on the problem of statistical dependence estimation using characteristic functions. We propose a statistical dependence measure based on the maximum-norm of the difference between the joint and the product-marginal characteristic functions. The proposed measure can detect arbitrary statistical dependence between two random vectors of possibly different dimensions, is differentiable, and is easily integrable into modern machine learning and deep learning pipelines. We also conduct experiments both with simulated and real data. Our simulations show that the proposed method can measure statistical dependencies in high-dimensional, non-linear data, and is less affected by the curse of dimensionality, compared to previous work in this line of research. The experiments with real data demonstrate the potential applicability of our statistical measure for two different empirical inference scenarios, showing statistically significant improvement in the performance characteristics when applied to supervised feature extraction and deep neural network regularization. In addition, we provide a link to the accompanying open-source repository https://bit.ly/3d4ch5I.  ( 2 min )
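    The measure is straightforward to approximate empirically: evaluate the joint and product-marginal empirical characteristic functions at a set of frequencies and take the maximum absolute difference. The sketch below samples random frequencies rather than maximizing over them, so it only approximates (from below) the max-norm measure:

        import numpy as np

        def cf_dependence(x, y, num_freqs=256, scale=1.0, seed=0):
            """Approximate max_{t,s} |phi_XY(t,s) - phi_X(t) phi_Y(s)| over
            random frequency pairs; x: (n, dx), y: (n, dy)."""
            rng = np.random.default_rng(seed)
            t = rng.normal(scale=scale, size=(num_freqs, x.shape[1]))
            s = rng.normal(scale=scale, size=(num_freqs, y.shape[1]))
            ex = np.exp(1j * x @ t.T)                 # (n, num_freqs)
            ey = np.exp(1j * y @ s.T)
            joint = (ex * ey).mean(axis=0)            # empirical phi_XY(t_k, s_k)
            prod = ex.mean(axis=0) * ey.mean(axis=0)  # phi_X(t_k) * phi_Y(s_k)
            return float(np.max(np.abs(joint - prod)))

        rng = np.random.default_rng(1)
        x = rng.normal(size=(2000, 1))
        print(cf_dependence(x, x ** 2))                        # dependent: larger
        print(cf_dependence(x, rng.normal(size=(2000, 1))))    # independent: near 0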
  • Open

    Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond. (arXiv:1912.12945v5 [stat.ML] UPDATED)
    We consider estimating a low-dimensional parameter in an estimating equation involving high-dimensional nuisances that depend on the parameter. A central example is the efficient estimating equation for the (local) quantile treatment effect ((L)QTE) in causal inference, which involves as a nuisance the covariate-conditional cumulative distribution function evaluated at the quantile to be estimated. Debiased machine learning (DML) is a data-splitting approach to estimating high-dimensional nuisances using flexible machine learning methods, but applying it to problems with parameter-dependent nuisances is impractical. For (L)QTE, DML requires we learn the whole covariate-conditional cumulative distribution function. We instead propose localized debiased machine learning (LDML), which avoids this burdensome step and needs only estimate nuisances at a single initial rough guess for the parameter. For (L)QTE, LDML involves learning just two regression functions, a standard task for machine learning methods. We prove that under lax rate conditions our estimator has the same favorable asymptotic behavior as the infeasible estimator that uses the unknown true nuisances. Thus, LDML notably enables practically-feasible and theoretically-grounded efficient estimation of important quantities in causal inference such as (L)QTEs when we must control for many covariates and/or flexible relationships, as we demonstrate in empirical studies.  ( 3 min )
    Semi-Supervised Anomaly Detection Based on Quadratic Multiform Separation. (arXiv:2208.08265v1 [stat.ML])
    In this paper we propose a novel method for semi-supervised anomaly detection (SSAD). Our classifier is named QMS22, as its inception dates to 2022 and it builds upon the framework of quadratic multiform separation (QMS), a recently introduced classification model. QMS22 tackles SSAD by solving a multi-class classification problem involving both the training set and the test set of the original problem. The classification problem intentionally includes classes with overlapping samples. One of the classes contains a mixture of normal samples and outliers, and all other classes contain only normal samples. An outlier score is then calculated for every sample in the test set using the outcome of the classification problem. We also include a performance evaluation of QMS22 against top performing classifiers using ninety-five benchmark imbalanced datasets from the KEEL repository. These classifiers are BRM (Bagging-Random Miner), OCKRA (One-Class K-means with Randomly-projected features Algorithm), ISOF (Isolation Forest), and ocSVM (One-Class Support Vector Machine). Using the area under the receiver operating characteristic curve as the performance measure, it is shown that QMS22 significantly outperforms ISOF and ocSVM. Moreover, Wilcoxon signed-rank tests reveal no statistically significant difference when testing QMS22 against BRM or QMS22 against OCKRA.  ( 2 min )
    Domain Knowledge in A*-Based Causal Discovery. (arXiv:2208.08247v1 [stat.ML])
    Causal discovery has become a vital tool for scientists and practitioners wanting to discover causal relationships from observational data. While most previous approaches to causal discovery have implicitly assumed that no expert domain knowledge is available, practitioners can often provide such domain knowledge from prior experience. Recent work has incorporated domain knowledge into constraint-based causal discovery. The majority of such constraint-based methods, however, assume causal faithfulness, which has been shown to be frequently violated in practice. Consequently, there has been renewed attention towards exact-search score-based causal discovery methods, which do not assume causal faithfulness, such as A*-based methods. However, there has been no consideration of these methods in the context of domain knowledge. In this work, we focus on efficiently integrating several types of domain knowledge into A*-based causal discovery. In doing so, we discuss and explain how domain knowledge can reduce the graph search space and then provide an analysis of the potential computational gains. We support these findings with experiments on synthetic and real data, showing that even small amounts of domain knowledge can dramatically speed up A*-based causal discovery and improve its performance and practicality.  ( 2 min )
    CoSimGNN: Towards Large-scale Graph Similarity Computation. (arXiv:2005.07115v7 [cs.LG] UPDATED)
    The ability to compute similarity scores between graphs based on metrics such as Graph Edit Distance (GED) is important in many real-world applications. Computing exact GED values is typically an NP-hard problem and traditional algorithms usually achieve an unsatisfactory trade-off between accuracy and efficiency. Recently, Graph Neural Networks (GNNs) have provided a data-driven solution for this task, which is more efficient while maintaining prediction accuracy in small-graph (around 10 nodes per graph) similarity computation. Existing GNN-based methods, which either embed the two graphs separately (lacking low-level cross-graph interactions) or deploy cross-graph interactions for whole graph pairs (redundant and time-consuming), are still not able to achieve competitive results when the number of nodes in the graphs increases. In this paper, we focus on similarity computation for large-scale graphs and propose the "embedding-coarsening-matching" framework CoSimGNN, which first embeds and coarsens large graphs with an adaptive pooling operation and then deploys fine-grained interactions on the coarsened graphs for the final similarity scores. Furthermore, we create several synthetic datasets which provide new benchmarks for graph similarity computation. Detailed experiments on both synthetic and real-world datasets have been conducted, and CoSimGNN achieves the best performance while its inference time is at most 1/3 of that of the previous state-of-the-art.  ( 3 min )
    Supervised PCA: A Multiobjective Approach. (arXiv:2011.05309v4 [stat.ML] UPDATED)
    Methods for supervised principal component analysis (SPCA) aim to incorporate label information into principal component analysis (PCA), so that the extracted features are more useful for a prediction task of interest. Prior work on SPCA has focused primarily on optimizing prediction error, and has neglected the value of maximizing variance explained by the extracted features. We propose a new method for SPCA that addresses both of these objectives jointly, and demonstrate empirically that our approach dominates existing approaches, i.e., outperforms them with respect to both prediction error and variation explained. Our approach accommodates arbitrary supervised learning losses and, through a statistical reformulation, provides a novel low-rank extension of generalized linear models.  ( 2 min )
    Deep Gaussian Process Emulation using Stochastic Imputation. (arXiv:2107.01590v2 [stat.ML] UPDATED)
    Deep Gaussian processes (DGPs) provide a rich class of models that can better represent functions with varying regimes or sharp changes, compared to conventional GPs. In this work, we propose a novel inference method for DGPs for computer model emulation. By stochastically imputing the latent layers, our approach transforms a DGP into a linked GP: a novel emulator developed for systems of linked computer models. This transformation permits an efficient DGP training procedure that only involves optimizations of conventional GPs. In addition, predictions from DGP emulators can be made in a fast and analytically tractable manner by naturally utilizing the closed form predictive means and variances of linked GP emulators. We demonstrate the method in a series of synthetic examples and empirical applications, and show that it is a competitive candidate for DGP surrogate inference, combining efficiency that is comparable to doubly stochastic variational inference and uncertainty quantification that is comparable to the fully-Bayesian approach. A $\texttt{Python}$ package $\texttt{dgpsi}$ implementing the method is also produced and available at https://github.com/mingdeyu/DGP.  ( 2 min )
    Using Machine Learning to Test Causal Hypotheses in Conjoint Analysis. (arXiv:2201.08343v2 [stat.ME] UPDATED)
    Conjoint analysis is a popular experimental design used to measure multidimensional preferences. Researchers examine how varying a factor of interest, while controlling for other relevant factors, influences decision-making. Currently, there exist two methodological approaches to analyzing data from a conjoint experiment. The first focuses on estimating the average marginal effects of each factor while averaging over the other factors. Although this allows for straightforward design-based estimation, the results critically depend on the distribution of other factors and how interaction effects are aggregated. An alternative model-based approach can compute various quantities of interest, but requires researchers to correctly specify the model, a challenging task for conjoint analysis with many factors and possible interactions. In addition, a commonly used logistic regression has poor statistical properties even with a moderate number of factors when incorporating interactions. We propose a new hypothesis testing approach based on the conditional randomization test to answer the most fundamental question of conjoint analysis: Does a factor of interest matter in any way given the other factors? Our methodology is solely based on the randomization of factors, and hence is free from assumptions. Yet, it allows researchers to use any test statistic, including those based on complex machine learning algorithms. As a result, we are able to combine the strengths of the existing design-based and model-based approaches. We illustrate the proposed methodology through conjoint analysis of immigration preferences and political candidate evaluation. We also extend the proposed approach to test for regularity assumptions commonly used in conjoint analysis. An open-source software package is available for implementing the proposed methodology.  ( 3 min )
    Quadratic Multiform Separation: A New Classification Model in Machine Learning. (arXiv:2110.04925v2 [stat.ML] UPDATED)
    In this paper we present a new classification model in machine learning. Our result is threefold: 1) The model produces comparable predictive accuracy to that of most common classification models. 2) It runs significantly faster than most common classification models. 3) It has the ability to identify a portion of unseen samples for which class labels can be found with much higher predictive accuracy. Currently there are several patents pending on the proposed model.  ( 2 min )
    A Framework for Machine Learning of Model Error in Dynamical Systems. (arXiv:2107.06658v3 [math.DS] UPDATED)
    The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. We present a unifying framework for blending mechanistic and machine-learning approaches to identify dynamical systems from noisily and partially observed data. We compare pure data-driven learning with hybrid models which incorporate imperfect domain knowledge. Our formulation is agnostic to the chosen machine learning model, is presented in both continuous- and discrete-time settings, and is compatible both with model errors that exhibit substantial memory and errors that are memoryless. First, we study memoryless linear (w.r.t. parametric-dependence) model error from a learning theory perspective, defining excess risk and generalization error. For ergodic continuous-time systems, we prove that both excess risk and generalization error are bounded above by terms that diminish with the square-root of T, the time-interval over which training data is specified. Secondly, we study scenarios that benefit from modeling with memory, proving universal approximation theorems for two classes of continuous-time recurrent neural networks (RNNs): both can learn memory-dependent model error. In addition, we connect one class of RNNs to reservoir computing, thereby relating learning of memory-dependent error to recent work on supervised learning between Banach spaces using random features. Numerical results are presented (Lorenz '63, Lorenz '96 Multiscale systems) to compare purely data-driven and hybrid approaches, finding hybrid methods less data-hungry and more parametrically efficient. Finally, we demonstrate numerically how data assimilation can be leveraged to learn hidden dynamics from noisy, partially-observed data, and illustrate challenges in representing memory by this approach, and in the training of such models.  ( 3 min )
    Debiased Inference on Identified Linear Functionals of Underidentified Nuisances via Penalized Minimax Estimation. (arXiv:2208.08291v1 [stat.ME])
    We study generic inference on identified linear functionals of nonunique nuisances defined as solutions to underidentified conditional moment restrictions. This problem appears in a variety of applications, including nonparametric instrumental variable models, proximal causal inference under unmeasured confounding, and missing-not-at-random data with shadow variables. Although the linear functionals of interest, such as average treatment effects, are identifiable under suitable conditions, the nonuniqueness of nuisances poses serious challenges to statistical inference, since in this setting common nuisance estimators can be unstable and lack fixed limits. In this paper, we propose penalized minimax estimators for the nuisance functions and show they enable valid inference in this challenging setting. The proposed nuisance estimators can accommodate flexible function classes, and importantly, they can converge to fixed limits determined by the penalization, regardless of whether the nuisances are unique or not. We use the penalized nuisance estimators to form a debiased estimator for the linear functional of interest and prove its asymptotic normality under generic high-level conditions, which provide for asymptotically valid confidence intervals.  ( 2 min )
    Score-Based Generative Models Detect Manifolds. (arXiv:2206.01018v2 [stat.ML] UPDATED)
    Score-based generative models (SGMs) need to approximate the scores $\nabla \log p_t$ of the intermediate distributions as well as the final distribution $p_T$ of the forward process. The theoretical underpinnings of the effects of these approximations are still lacking. We find precise conditions under which SGMs are able to produce samples from an underlying (low-dimensional) data manifold $\mathcal{M}$. This assures us that SGMs are able to generate the "right kind of samples". For example, taking $\mathcal{M}$ to be the subset of images of faces, we find conditions under which the SGM robustly produces an image of a face, even though the relative frequencies of these images might not accurately represent the true data generating distribution. Moreover, this analysis is a first step towards understanding the generalization properties of SGMs: Taking $\mathcal{M}$ to be the set of all training samples, our results provide a precise description of when the SGM memorizes its training data.  ( 2 min )
    Sparse Nonnegative Tucker Decomposition and Completion under Noisy Observations. (arXiv:2208.08287v1 [cs.LG])
    Tensor decomposition is a powerful tool for extracting physically meaningful latent factors from multi-dimensional nonnegative data, and has attracted increasing interest in a variety of fields such as image processing, machine learning, and computer vision. In this paper, we propose a sparse nonnegative Tucker decomposition and completion method for the recovery of underlying nonnegative data under noisy observations. Here the underlying nonnegative data tensor is decomposed into a core tensor and several factor matrices, with all entries being nonnegative and the factor matrices being sparse. The loss function is derived from the maximum likelihood estimation of the noisy observations, and the $\ell_0$ norm is employed to enhance the sparsity of the factor matrices. We establish the error bound of the estimator of the proposed model under generic noise scenarios, which is then specified for observations with additive Gaussian noise, additive Laplace noise, and Poisson observations, respectively. Our theoretical results are better than those by existing tensor-based or matrix-based methods. Moreover, the minimax lower bounds are shown to match the derived upper bounds up to logarithmic factors. Numerical examples on both synthetic and real-world data sets demonstrate the superiority of the proposed method for nonnegative tensor data completion.  ( 2 min )
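    For readers who want to experiment, a standard (unpenalized) nonnegative Tucker decomposition, without this paper's $\ell_0$ sparsity term or error-bound analysis, is available in the TensorLy package; the rank and iteration settings below are hypothetical:

        import numpy as np
        import tensorly as tl
        from tensorly.decomposition import non_negative_tucker

        rng = np.random.default_rng(0)
        X = tl.tensor(rng.random((10, 12, 8)))           # nonnegative data tensor
        core, factors = non_negative_tucker(X, rank=[3, 4, 2], n_iter_max=200)
        X_hat = tl.tucker_to_tensor((core, factors))     # reconstruct from factors
        print(tl.norm(X - X_hat) / tl.norm(X))           # relative fit error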
    Learning low-rank latent mesoscale structures in networks. (arXiv:2102.06984v3 [cs.SI] UPDATED)
    It is common to use networks to encode the architecture of interactions between entities in complex systems in applications in the physical, biological, social, and information sciences. To study the large-scale behavior of complex systems, it is useful to study mesoscale structures in networks as building blocks that influence such behavior. We present a new approach for describing low-rank mesoscale structure in networks, and we illustrate our approach using several synthetic network models and empirical friendship, collaboration, and protein--protein interaction (PPI) networks. We find that these networks possess a relatively small number of `latent motifs' that together can successfully approximate most subgraphs of a network at a fixed mesoscale. We use an algorithm that we call `network dictionary learning' (NDL), which combines a network-sampling method and nonnegative matrix factorization, to learn the latent motifs of a given network. The ability to encode a network using a set of latent motifs has a wide variety of applications to network-analysis tasks, such as comparison, denoising, and edge inference. Additionally, using our new network denoising and reconstruction (NDR) algorithm, we demonstrate how to denoise a corrupted network by using only the latent motifs that one learns directly from the corrupted networks.  ( 3 min )
    Two-Stage Robust and Sparse Distributed Statistical Inference for Large-Scale Data. (arXiv:2208.08230v1 [stat.ML])
    In this paper, we address the problem of conducting statistical inference in settings involving large-scale data that may be high-dimensional and contaminated by outliers. The high volume and dimensionality of the data require distributed processing and storage solutions. We propose a two-stage distributed and robust statistical inference procedure that copes with high-dimensional models by promoting sparsity. In the first stage, known as model selection, relevant predictors are locally selected by applying robust Lasso estimators to the distinct subsets of data. The variable selections from each computation node are then fused by a voting scheme to find the sparse basis for the complete data set, identifying the relevant variables in a robust manner. In the second stage, the developed statistically robust and computationally efficient bootstrap methods are employed. The actual inference constructs confidence intervals, finds parameter estimates, and quantifies standard deviations. Similarly to stage 1, the results of local inference are communicated to the fusion center and combined there. Using analytical methods, we establish the favorable statistical properties of the robust and computationally efficient bootstrap methods, including consistency for a fixed number of predictors, and robustness. The proposed two-stage robust and distributed inference procedure demonstrates reliable performance and robustness in variable selection, finding confidence intervals, and bootstrap approximations of standard deviations, even when the data are high-dimensional and contaminated by outliers.  ( 3 min )
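    Stage one (local selection followed by voting fusion) can be sketched as below; a plain scikit-learn Lasso stands in for the robust Lasso estimators of the paper, and the voting threshold is a hypothetical choice:

        import numpy as np
        from sklearn.linear_model import Lasso

        def voted_support(X, y, num_nodes=5, alpha=0.1, vote_frac=0.5):
            """Stage-1 sketch: fit a (here plain, not robust) Lasso on each
            node's shard, then keep features selected on >= vote_frac of nodes."""
            votes = np.zeros(X.shape[1])
            for Xs, ys in zip(np.array_split(X, num_nodes),
                              np.array_split(y, num_nodes)):
                votes += np.abs(Lasso(alpha=alpha).fit(Xs, ys).coef_) > 1e-8
            return np.flatnonzero(votes >= vote_frac * num_nodes)

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 30))
        y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=500)
        print(voted_support(X, y))   # ideally recovers features {0, 3}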
    Expressivity of Hidden Markov Chains vs. Recurrent Neural Networks from a system theoretic viewpoint. (arXiv:2208.08175v1 [eess.SY])
    Hidden Markov Chains (HMC) and Recurrent Neural Networks (RNN) are two well-known tools for predicting time series. Even though these solutions were developed independently in distinct communities, they share some similarities when considered as probabilistic structures. In this paper, we therefore first consider HMC and RNN as generative models, and embed both structures in a common generative unified model (GUM). We next address a comparative study of the expressivity of these models. To that end, we assume that the models are furthermore linear and Gaussian. The probability distributions produced by these models are characterized by structured covariance series, and as a consequence expressivity reduces to comparing sets of structured covariance series, which enables us to call on stochastic realization theory (SRT). We finally provide conditions under which a given covariance series can be realized by a GUM, an HMC, or an RNN.  ( 2 min )
    Shallow neural network representation of polynomials. (arXiv:2208.08138v1 [stat.ML])
    We show that $d$-variate polynomials of degree $R$ can be represented on $[0,1]^d$ as shallow neural networks of width $d+1+\sum_{r=2}^R\binom{r+d-1}{d-1}[\binom{r+d-1}{d-1}+1]$. Also, via the SNN representation of localized Taylor polynomials of univariate $C^\beta$-smooth functions, we derive for shallow networks the minimax optimal rate of convergence, up to a logarithmic factor, to an unknown univariate regression function.  ( 2 min )
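    The width formula transcribes directly into code for a quick sanity check (our transcription of the stated expression, not code from the paper):

    ```python
    # width = d + 1 + sum_{r=2}^{R} C(r+d-1, d-1) * (C(r+d-1, d-1) + 1)
    from math import comb

    def snn_width(d, R):
        return d + 1 + sum(comb(r + d - 1, d - 1) * (comb(r + d - 1, d - 1) + 1)
                           for r in range(2, R + 1))

    print(snn_width(2, 3))  # width sufficient for bivariate cubics
    ```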
    Superior generalization of smaller models in the presence of significant label noise. (arXiv:2208.08003v1 [cs.LG])
    The benefits of over-parameterization in achieving superior generalization performance have been shown in several recent studies, justifying the trend of using larger models in practice. In the context of robust learning however, the effect of neural network size has not been well studied. In this work, we find that in the presence of a substantial fraction of mislabeled examples, increasing the network size beyond some point can be harmful. In particular, the originally monotonic or `double descent' test loss curve (w.r.t. network width) turns into a U-shaped or a double U-shaped curve when label noise increases, suggesting that the best generalization is achieved by some model with intermediate size. When network size is instead controlled by density through random pruning, we observe similar test loss behaviour. We also take a closer look into both phenomena through bias-variance decomposition and theoretically characterize how label noise shapes the variance term. Similar behavior of the test loss can be observed even when state-of-the-art robust methods are applied, indicating that limiting the network size could further boost existing methods. Finally, we empirically examine the effect of network size on the smoothness of learned functions, and find that the originally negative correlation between size and smoothness is flipped by label noise.  ( 3 min )
    Private Estimation with Public Data. (arXiv:2208.07984v1 [cs.LG])
    We initiate the study of differentially private (DP) estimation with access to a small amount of public data. For private estimation of d-dimensional Gaussians, we assume that the public data comes from a Gaussian that may have vanishing similarity in total variation distance with the underlying Gaussian of the private data. We show that under the constraints of pure or concentrated DP, d+1 public data samples are sufficient to remove any dependence on the range parameters of the private data distribution from the private sample complexity, which is known to be otherwise necessary without public data. For separated Gaussian mixtures, we assume that the underlying public and private distributions are the same, and we consider two settings: (1) when given a dimension-independent amount of public data, the private sample complexity can be improved polynomially in terms of the number of mixture components, and any dependence on the range parameters of the distribution can be removed in the approximate DP case; (2) when given an amount of public data linear in the dimension, the private sample complexity can be made independent of range parameters even under concentrated DP, and additional improvements can be made to the overall sample complexity.  ( 2 min )

  • Open

    [N] NeurIPS 2022 Temporal Graph Learning Workshop
    We are pleased to announce the NeurIPS 2022 Temporal Graph Learning Workshop. The workshop aims to share understanding and techniques to facilitate the development of novel temporal graph learning methods. For more details, please see the workshop website: https://sites.google.com/view/tglworkshop2022/home. We will also give updates through Twitter @tgl_workshop. Key dates: submission deadline Sep. 19th, 2022; accept/reject notification Oct. 12th, 2022; camera-ready deadline Nov. 3rd, 2022; workshop date Dec. 3rd, 2022, in person in New Orleans, US. Call for Papers: We encourage researchers to submit their papers broadly related to temporal graph learning. We also welcome papers that present benchmark datasets, evaluation protocols, and challenges …  ( 89 min )
    Best Budget GPU for AI training [D]
    Hi, I have a budget of 200€-400€. What would be your recommendation? submitted by /u/Mo_187_ [link] [comments]  ( 88 min )
    [P] - VkFFT now supports Rader's algorithm - A100 and MI250 benchmarks
    Hello, I am the creator of VkFFT - a GPU Fast Fourier Transform library for Vulkan/CUDA/HIP/OpenCL and Level Zero. In the latest update, I have implemented my take on Rader's FFT algorithm, which allows VkFFT to do FFTs of sequences representable as a product of primes up to 83, just like you would with powers of two. Rader's FFT algorithm represents an FFT of a prime-length sequence as a convolution of length N-1. Inlining these convolutions as a step in the Stockham algorithm makes it possible to have radix kernels of extremely high prime lengths - VkFFT currently uses primes up to 83. Previously, VkFFT had to switch to Bluestein's algorithm if a sequence had primes bigger than 13. Bluestein's algorithm does an FFT of arbitrary length as a zero-padded convolution of a length at le…  ( 92 min )
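    For readers unfamiliar with the trick, here is a small NumPy sketch of Rader's algorithm for an odd prime length N (a CPU illustration only, nothing like VkFFT's inlined GPU kernels): the nonzero DFT outputs come from a single cyclic convolution of length N-1.

    ```python
    import numpy as np

    def primitive_root(p):
        # brute-force generator of the multiplicative group mod an odd prime p
        for g in range(2, p):
            if len({pow(g, k, p) for k in range(1, p)}) == p - 1:
                return g

    def rader_dft(x):
        """DFT of an odd-prime-length sequence via one cyclic convolution."""
        x = np.asarray(x, dtype=complex)
        N = len(x)
        g = primitive_root(N)
        ginv = pow(g, N - 2, N)                          # g^{-1} mod N
        a = x[[pow(ginv, q, N) for q in range(N - 1)]]   # permuted input
        b = np.exp(-2j * np.pi *
                   np.array([pow(g, r, N) for r in range(N - 1)]) / N)
        conv = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))  # cyclic convolution
        X = np.empty(N, dtype=complex)
        X[0] = x.sum()
        for p in range(N - 1):
            X[pow(g, p, N)] = x[0] + conv[p]
        return X

    x = np.random.randn(83)               # 83: the largest prime VkFFT inlines
    assert np.allclose(rader_dft(x), np.fft.fft(x))
    ```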
    [Discussion] Advice on Toolkit/Framework?
    Hey guys, Kind of a noob question here. I am working on a pilot project that involves a combination of circumstances that make it rather peculiar (at least to me). I would certainly appreciate some fresh eyes and pointers on what kind of tools/libraries you would choose to approach the problem. Relevant details: Fairly large dataset (at least a couple hundred million records) Will require a significant amount of stateful transformations (i.e., transformations like standardization where you have to learn the mean and stddev from your training data and persist the learned states during inference time, but potentially MUCH more involved; see the sketch below). Part of the transformation pipeline can run inference on bulk data with relatively high latency, part of the transformation pipeline requires inference on SI…  ( 91 min )
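    On the stateful-transformation point, the usual pattern (a small sketch with scikit-learn and joblib; data and file names are placeholders) is to fit the transformation on training data, persist its learned state, and reload exactly that state at inference time:

    ```python
    import joblib
    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    X_train, X_new = np.random.randn(1000, 5), np.random.randn(10, 5)  # toy data

    pipeline = Pipeline([("standardize", StandardScaler())])
    pipeline.fit(X_train)                        # learns mean and stddev
    joblib.dump(pipeline, "transforms.joblib")   # persist the learned state

    # ... later, in the inference service ...
    pipeline = joblib.load("transforms.joblib")
    X_scaled = pipeline.transform(X_new)         # same state, no re-fitting
    ```

    For hundreds of millions of records the fitting step would run on a distributed engine instead, but the fit/persist/reload contract stays the same.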
    [D] Looking for resources on feedback control systems with a DNN Plant
    We have a deep neural network model and some feedback that modifies the input to that model over time. This system currently has some undesirable properties. We’d like to retrain it to not have those properties, but may not be able to for a variety of reasons. So I’m looking for any resources on developing feedback controllers where the plant is / includes a neural network. Note: I’m not looking to use a NN as my controller, just looking for any techniques that can be applied when the plant is a NN. submitted by /u/nlman0 [link] [comments]  ( 108 min )
    [D] Does anyone have Deep Learning benchmarks for the 6800XT with ROCm?
    I suppose currently the 6800XT is about $600, which is the price I am willing to pay. That money can net me a 3070 Ti. However, the Nvidia choice has like half the amount of VRAM, and I am kind of getting bored with the CUDA lock-in anyway. On top of that, my 1080 Ti for ML training is getting older. I can feel it... Anyway, does anyone have any data about how AMD offerings (I know ROCm right now only supports the RX 6800 and above) can compete with Nvidia cards? Any comparable data between a 6800XT and 3070 Ti in deep learning would be nice. Thank you. submitted by /u/ffleader1 [link] [comments]  ( 88 min )
    [P] Solving Boggle in the Browser Using TensorFlow.js and WebAssembly!
    Hi everyone! Here's a project I've been working on for the last few months. It's a Boggle solver that can take a photo of a Boggle board and attempt to extract the letters of the board using TensorFlow.js and OpenCV.js. The actual solving code is written in Rust and compiled to WebAssembly. Link to Blog Post: https://prowe.ca/blog/wasm-rust-boggle-solver Link to project: https://roggle.prowe.ca/ Any advice on how I could improve the accuracy of my model would be greatly appreciated. I spent some time trying to increase the accuracy but it still struggles to figure out each letter a lot of the time. submitted by /u/Parkuman [link] [comments]  ( 89 min )
    [D] What framework are you using?
    I realized recently that PyTorch overtook Tensorflow on Google Trends: https://trends.google.com/trends/explore?date=all&geo=US&q=tensorflow,pytorch What are you using? View Poll submitted by /u/wnorrisii [link] [comments]  ( 89 min )
    [D] If there’s one practical tip you wish had been drilled deeply into you when you first started out learning about deep learning, what would it be?
    For example, things that you found are really important when you're in the industry but not well-covered / explained in a typical DL course (undergrad level). Best if that thing changed how you approached DL projects, or something that made you much more productive. I'll start the ball rolling: one of the biggest pains / most time-consuming aspects of ML/DL projects so far has been poorly documented code - not just from others but your own. Having no documentation is bad, but documenting / tracking every single thing you've tried is equally bad (revisiting such a repo is horrifying...). Writing just enough documentation so that someone else knows how to prep the data correctly and get the model running on it + reproduce the best results reported + be aware of what has been tried before (and do all these 3 things in the shortest time possible) - that'd do the trick, but it's not always easy to get the right balance, especially when writing it for someone else to read. I guess the lesson learnt is to (dedicate some time to) clean up the repo before ending any project, because you never know when you / someone else has to revisit it. Perhaps some kind of standardisation / template for documentation writing will become a thing as the field matures... there is probably such material from software engineering, but something specific to data + models + ML/DL code is needed. submitted by /u/manzaikid [link] [comments]  ( 114 min )
    [D] Bias-variance tradeoff in human perception?
    Wanted to start a discussion about the bias-variance tradeoff in human perception. Are human perceptual systems (an example is object recognition) extremely biased, with little to no variance? For example - I never fail to recognize a water bottle (little variance) under any environmental conditions submitted by /u/liqui_date_me [link] [comments]  ( 90 min )
    [D] Fool me once, shame on you; fool me twice, shame on me: Exponential Smoothing vs. Facebook's Neural-Prophet.
    History tends to repeat itself. But FB-Prophet's tainted memory is too recent and should act as a warning not to repeat the same mistakes. This post compares Neural-Prophet's performance with Exponential Smoothing (ETS), a half-century-old forecasting method that is part of every practitioner's toolkit. Our comparison covers the Tourism, M3, M4, ERCOT, and ETTm2 datasets, following the authors' recommended hyperparameter and network configuration settings. Despite Neural-Prophet's outstanding success over its unreliable predecessor, its errors are still 30 percent larger than ETS' while doubling its computation time. We hope this exercise helps the community evaluate forecasting tools and avoid adopting yet another overpromising and unproven forecasting method. As always, if you find our work helpful, your starring support ⭐ is greatly appreciated: https://github.com/Nixtla/statsforecast. submitted by /u/fedegarzar [link] [comments]  ( 109 min )
    [P] MinImagen: A Minimal Imagen Implementation
    This year has been great for text-to-image models, but given that SoTA models like DALL-E 2 and Imagen depend on the recent advances in Diffusion Models, there seems to be a relative lack of good resources on building these models. I created MinImagen - a minimal Imagen implementation - to fill this gap. It includes a thoroughly commented repository and the above-linked associated article, and is intended to help elucidate how Imagen (and related models) work internally. Here are some links if you want to take a look: Build Your Own Imagen Text-to-Image Model MinImagen GitHub Repository MinImagen Documentation MinImagen PyPi Package The implementation strips off all of the bells and whistles to isolate the salient and essential components of Imagen for educational purposes, and so it does not use modern best practices for maximum efficiency. Looking forward to seeing what you think and answering any questions! submitted by /u/SleekEagle [link] [comments]  ( 89 min )
    [P] Yet another set of deep learning based natural language processing APIs, focused on Korean and English
    We released a collection of NLP APIs, dubbed TUNiBridge. Currently, it consists of 11 modules: safety check, de-identification, text analytics, image analytics, acrostic poem generation, etc. Check them here: https://tunibridge.ai/ submitted by /u/longinglove [link] [comments]  ( 111 min )
    [P] Cleanlab Vizzy — learn how to automatically find label errors and out-of-distribution data
    If you’ve seen the Cleanlab open-source package for automatically finding issues in datasets and training an ML model on bad labels as if you had error-free data — if you’re like me, you may have been curious — how does it work? It might seem surprising that it’s possible to automatically identify label errors and out-of-distribution data, using any model and for any modality of dataset (described as "black magic" by some). Cleanlab accomplishes this using "confident learning" algorithms backed by solid theory and peer-reviewed research. To help myself (and others!) build intuition for how they work, I built Vizzy, an interactive visualization playground that runs in the browser. Vizzy lets you experiment with an example dataset, tweak the labels, and run Cleanlab to automatically find issues like label errors and out-of-distribution data. Screenshot of Cleanlab Vizzy (https://playground.cleanlab.ai) Vizzy includes a JavaScript port of (a part of) cleanlab, which implements the algorithms described in https://arxiv.org/abs/1911.00068. There are other neat technical nuggets in the implementation of Vizzy as well, including ML model training in the browser (using features from a pretrained ResNet-18, performing truncated SVD, and using an SVM model for speed). If you’re interested in the details of how Vizzy works, check out the blog post. Happy to answer any questions related to Vizzy, cleanlab, or confident learning and data-centric AI in general! Vizzy: https://playground.cleanlab.ai/ Blog post: https://cleanlab.ai/blog/cleanlab-vizzy/ Source code: https://github.com/cleanlab/vizzy submitted by /u/Calebchiam [link] [comments]  ( 109 min )
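    For a taste of the workflow, here is a minimal sketch using the cleanlab 2.x API on toy data (the model and dataset are placeholders): out-of-sample predicted probabilities go in, indices of likely label errors come out.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from cleanlab.filter import find_label_issues

    X = np.random.randn(500, 10)
    labels = np.random.randint(0, 3, size=500)   # toy labels, some "noisy"

    # cleanlab expects out-of-sample probabilities, hence cross_val_predict
    pred_probs = cross_val_predict(LogisticRegression(max_iter=1000), X, labels,
                                   cv=5, method="predict_proba")
    issue_idx = find_label_issues(labels, pred_probs,
                                  return_indices_ranked_by="self_confidence")
    print(f"{len(issue_idx)} suspected label errors, worst first: {issue_idx[:10]}")
    ```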
    [P] CodeStamper - Ensuring traceability between ML experiments and Code
    In an ideal world, an ML engineer should be able to rerun a past experiment at any moment in time and re-obtain the same results. CodeStamper tries to help the user in this direction in a non-ideal world. There are full-fledged projects like DVC which aim to version everything (code, datasets & models) in order to ensure reproducibility. CodeStamper is a lightweight solution that integrates seamlessly into existing projects and aims to ensure code traceability (so it is not a magic bullet for all aspects of an ML experiment, such as datasets). When things can go wrong: an ML experiment is started but might not be reproducible in the future. The post lists each issue next to CodeStamper's approach; for example: the experiment itself does not contain any information related to the code with which it wa…  ( 89 min )
    [P] The table extraction tool: PP-Structure
    PP-Structure is an OCR toolkit that can be used for document analysis and processing with complex structures, designed to help developers better complete document understanding tasks:
    * Supports layout analysis of documents, dividing them into 5 types of areas: text, title, table, image and list (in conjunction with Layout-Parser)
    * Supports extracting the texts from the text, title, picture and list areas (used in conjunction with PP-OCR)
    * Supports extracting Excel files from the table areas
    * Supports a Python WHL package and command-line usage, easy to use
    * Supports custom training for layout analysis and table structure tasks
    * Supports Document Visual Question Answering (DOC-VQA) tasks: Semantic Entity Recognition (SER) and Relation Extraction (RE)
    submitted by /u/osicli [link] [comments]  ( 88 min )
    [P] The spelled-out intro to neural networks and backpropagation: building micrograd (Andrej Karpathy 2h25m lecture)
    A new lecture from Andrej Karpathy on his YouTube channel: https://www.youtube.com/watch?v=VMj-3S1tku0 This is the most step-by-step, spelled-out explanation of backpropagation and training of neural networks; it only assumes basic knowledge of Python and a vague recollection of calculus from high school. According to Karpathy, "this is the culmination of about 8 years of obsessing about the best way to explain neural nets and backprop." He also mentions, "If you know Python, have a vague recollection of taking some derivatives in your high school, watch this video and not understand backpropagation and the core of neural nets by the end then I will eat a shoe :D" submitted by /u/hardmaru [link] [comments]  ( 89 min )
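    If you want a feel for what the lecture builds before watching, here is a minimal scalar-autograd sketch in the spirit of micrograd (a simplified illustration, not Karpathy's actual code): each Value records its inputs and a closure that propagates gradients backward through the computation graph.

    ```python
    class Value:
        def __init__(self, data, _children=()):
            self.data = data
            self.grad = 0.0
            self._backward = lambda: None
            self._prev = set(_children)

        def __add__(self, other):
            other = other if isinstance(other, Value) else Value(other)
            out = Value(self.data + other.data, (self, other))
            def _backward():
                self.grad += out.grad            # d(a+b)/da = 1
                other.grad += out.grad
            out._backward = _backward
            return out

        def __mul__(self, other):
            other = other if isinstance(other, Value) else Value(other)
            out = Value(self.data * other.data, (self, other))
            def _backward():
                self.grad += other.data * out.grad   # d(a*b)/da = b
                other.grad += self.data * out.grad
            out._backward = _backward
            return out

        def backward(self):
            # topological order: each node's grad is complete before it is used
            topo, visited = [], set()
            def build(v):
                if v not in visited:
                    visited.add(v)
                    for child in v._prev:
                        build(child)
                    topo.append(v)
            build(self)
            self.grad = 1.0
            for v in reversed(topo):
                v._backward()

    a, b = Value(2.0), Value(-3.0)
    loss = a * b + a
    loss.backward()
    print(a.grad, b.grad)   # dloss/da = b + 1 = -2.0, dloss/db = a = 2.0
    ```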
  • Open

    Learn Deep Learning Concepts From Minions
    submitted by /u/BuilderPrior4707 [link] [comments]  ( 86 min )
    Whole process in layers. After that I decided to make the painting in water. I love using water elements in paintings. And you?
    submitted by /u/katomic_tattoo [link] [comments]  ( 89 min )
    Designing APIs for AI
    It’s estimated that anywhere from 50-90% of AI models developed never make it past the AI “valley of death” that exists between the lab and production deployment. This tech talk covers how an API-based approach to building and maintaining AI-enabled applications can bridge the divide between data scientists, software developers, and infrastructure managers building them. You'll learn best practices for API design, as well as pitfalls to avoid. https://youtu.be/tYStPrwRY4w submitted by /u/modzykirsten [link] [comments]  ( 87 min )
    i made a mnist spin off with geometric shapes
    It is basically a program to generate labeled geometric shapes; it generates as much data as you want... The first input sets the size of the image, the second sets how many images you want when clicking on download. (The post includes screenshots: an example of low resolution, which is recommended, and a high-resolution example.) The main goal of this project is to create my own object recognition (with boundary) AI, but for that I still need to figure out how to make a dynamic-output AI, and improve the noise / gradient handling of this program. link source code submitted by /u/Small-Ad-1694 [link] [comments]  ( 87 min )
    DALL-E 2 Art: Experiments with Prompts or How I Got My New Wallpaper
    submitted by /u/strikingLoo [link] [comments]  ( 86 min )
    Question about AI research back in the '80s
    When I was younger, in my early 20's, I was into programming and algorithms and very interested in AI and learning systems. I remember reading an article in a magazine (may have been Computing magazine or something like that) about a researcher who had created a program that he intended to learn as if it were a child. I am thinking this was at Berkeley but cannot be sure; I think it was a university in California at least. I have since tried to find references or an update on this but have never been able to find anything. Looking to you folks here for some type of info on this. Always been curious whether this was scrapped, moved onto an advanced computer system, etc. submitted by /u/AbruptGravy [link] [comments]  ( 89 min )
    look at this cool animation please
    submitted by /u/mech010001 [link] [comments]  ( 88 min )
    Real-world AI assistant: Google combines a large language model with an everyday robot
    submitted by /u/much_successes [link] [comments]  ( 86 min )
    Character generation using Midjourney AI
    submitted by /u/guilds-and-blades [link] [comments]  ( 86 min )
    If you are a data scientist, analyst or simply looking to make your analytics skills more efficient, Blinx is the tool for you!
    submitted by /u/SamuelSmith1416 [link] [comments]  ( 86 min )
    White house in the forest
    submitted by /u/widgia [link] [comments]  ( 86 min )
    Lion family on a scooter!
    submitted by /u/Remarkable_Owl_2058 [link] [comments]  ( 89 min )
    Computer Vision News of August 2022
    Dear all, Here is Computer Vision News of August 2022. Many great articles about Artificial Intelligence, Deep Learning, Computer Vision and more (with code!) HTML5 version (recommended) PDF version Dilbert on page 2. Free subscription on page 60. Enjoy! submitted by /u/Gletta [link] [comments]  ( 86 min )
    Workshops in the first day of AGI-2022 conference on August 19 - video streaming links
    submitted by /u/akolonin [link] [comments]  ( 87 min )
    [Repost]Survey on AI Ethics and Readiness (for high school and first-year bachelor students)
    https://forms.gle/rPKmuN611VeLmZaNA I'm conducting a survey to understand the awareness, attitudes and readiness of high school students towards Artificial Intelligence. The study will look at different aspects such as opportunities, risks, and ethics of AI, and also the education necessary for high schoolers to improve their understanding. The results will be published as part of a detailed report. Your inputs are valuable in understanding how students learn and think about AI. All responses will be kept confidential and data is analysed only at the aggregate level. The “best” five responses will each get a Rs 1000 Amazon gift card. The winners will be selected by, you guessed it, an algorithm. Thank you! submitted by /u/divijadurga [link] [comments]  ( 94 min )
    Andrew Ng Machine Learning New Specialization
    submitted by /u/ampankajsharma [link] [comments]  ( 86 min )
    My second attempt at creating wallpapers for my phone: Chaos Forest Nº 1 & 2 | Using MidJourney AI (Image Creator bot for Discord)
    submitted by /u/Potato_Player_BR [link] [comments]  ( 86 min )
  • Open

    Project or Research topic to get a job?
    Hi everyone, I am currently a graduate student learning RL and I intend to look for an RL engineer (not researcher) position in a game studio. I have learned some typical models, including PPO, SAC, etc., but am not familiar with state-of-the-art algorithms in specific areas. Now I work with my friend on an open-source project, a robot-learning environment. Most of the work is about data communication and data transformation. At the same time, my recent attempts to find a job all failed because of a lack of project experience, especially experience in algorithm design and implementation. So I want to find a project or research topic related to RL and games (or robots). Could anyone give me some pointers or suggestions? Thank you very much! submitted by /u/ZavierTi2021 [link] [comments]  ( 87 min )
    Why might an agent be unable to learn the training environment? Possible constraints?
    Hi everyone! I'm working on an RL problem using PPO. The environment's data/states/variables come from an actual, finite data table (time series). Therefore, I have split the data table into "training" and "testing", so that after training, I can see how the agent performs in states that it has never seen during training (just like a train/test split in supervised learning). The problem is that I am actually unable to get good performance on the training environment. Some details:
    * My training dataset consists of 1,400,000 rows/unique states, with around 140 state-space variables
    * I am using PPO with both (value and policy) networks having the following architecture: 192-128-64
    * I have tried gamma ranging from 0.9825 to 0.995
    * I am implementing a "dynamic" learning rate schedule, where the learning rate is halved if performance doesn't improve in 250K timesteps of training (see the sketch below)
    * I have trained for up to 4M timesteps. The agent DOES learn, but the performance isn't good enough on the training set (performance either plateaus or ends up crashing as training goes on)
    Does anyone have any ideas for what might be the problem? I would expect that the agent would be able to "overfit" on the training set, as I have seen before with smaller datasets. What might I be missing? Why wouldn't the agent be able to get insanely good scores on the finite training environment? Any suggestions would be appreciated! I know I'm coming at this from a "supervised learning" perspective, but given that it's not a simulated environment with endless possible states, I think it only makes sense. submitted by /u/VladimirB-98 [link] [comments]  ( 90 min )
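    Since the "dynamic" learning-rate schedule is the most implementation-specific detail above, here is a framework-agnostic sketch of such a halve-on-plateau rule (a hypothetical helper, not the poster's code); it works with any PyTorch-style optimizer that exposes param_groups.

    ```python
    class PlateauLRHalver:
        """Halve the LR when the best mean episode reward stalls for too long."""
        def __init__(self, optimizer, patience_steps=250_000, min_lr=1e-6):
            self.optimizer = optimizer
            self.patience_steps = patience_steps
            self.min_lr = min_lr
            self.best_reward = float("-inf")
            self.best_step = 0

        def update(self, timestep, mean_episode_reward):
            if mean_episode_reward > self.best_reward:
                self.best_reward = mean_episode_reward
                self.best_step = timestep
            elif timestep - self.best_step >= self.patience_steps:
                for group in self.optimizer.param_groups:
                    group["lr"] = max(group["lr"] * 0.5, self.min_lr)
                self.best_step = timestep   # restart the patience window

    # usage inside the training loop, after each rollout:
    #   halver.update(total_timesteps_so_far, rollout_mean_reward)
    ```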
    Reinforcement learning models are prone to membership inference attacks
    submitted by /u/bendee983 [link] [comments]  ( 86 min )
    For a Multi-Agent Swarm, would you have different RL models for each agent or one master RL model that takes in data from all the agents and outputs actions for all the agents, or are both the same thing?
    submitted by /u/FailedMesh [link] [comments]  ( 87 min )
    Reducing Exploitability with Population Based Training
    submitted by /u/Caffeinated-Scholar [link] [comments]  ( 87 min )
    How does off-policy monte-carlo explore and converge?
    Premises of the question: the behavior policy is e-greedy (stochastic), the target policy is greedy (deterministic), and importance sampling is used. In off-policy Monte-Carlo control, the behavior policy chooses actions to follow, and the target policy learns from those actions. However, because of importance sampling, if the behavior policy chooses an action that the target policy does not believe to be the "best" action, then the importance sampling ratio is 0 and the algorithm disregards any learning. My question, then, is how the target policy can ever change its preferred action if the action value is only updated when the behavior policy chooses the same action as the target policy. How is there any exploration if the target policy is greedy, given that the importance sampling ratio zeros out every behavior-policy action that the target policy would not choose? Sutton's RL book says that "learning will be slow... if non-greedy actions are common". What I don't understand is how the target policy can ever choose a different action if the only actions that count are those on which the target and behavior policies agree. I've been struggling with this for the past few days, please help. Thank you. submitted by /u/JonathanMonathan62 [link] [comments]  ( 100 min )
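    For concreteness, here is a compact sketch of off-policy every-visit MC control with weighted importance sampling in the style of Sutton & Barto (the env interface and names are hypothetical; states are assumed hashable). The backward loop breaks as soon as the taken action disagrees with the current greedy action, which is exactly the "ratio becomes zero" effect described above; the target policy can still change its preferred action because every update near the end of an episode can move Q enough to flip the argmax for a state.

    ```python
    import random
    from collections import defaultdict

    def off_policy_mc_control(env, n_actions, episodes=10_000,
                              gamma=0.99, epsilon=0.1):
        Q = defaultdict(lambda: [0.0] * n_actions)
        C = defaultdict(lambda: [0.0] * n_actions)   # cumulative IS weights

        for _ in range(episodes):
            # generate an episode with the e-greedy behavior policy
            trajectory, state, done = [], env.reset(), False
            while not done:
                greedy = max(range(n_actions), key=lambda a: Q[state][a])
                action = (random.randrange(n_actions)
                          if random.random() < epsilon else greedy)
                next_state, reward, done = env.step(action)
                trajectory.append((state, action, reward))
                state = next_state

            # backward pass with weighted importance sampling
            G, W = 0.0, 1.0
            for state, action, reward in reversed(trajectory):
                G = gamma * G + reward
                C[state][action] += W
                Q[state][action] += (W / C[state][action]) * (G - Q[state][action])
                if action != max(range(n_actions), key=lambda a: Q[state][a]):
                    break                # importance ratio is now zero
                W *= 1.0 / (1 - epsilon + epsilon / n_actions)
        return Q
    ```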
  • Open

    AWS Localization uses Amazon Translate to scale localization
    The AWS website is currently available in 16 languages (12 for the AWS Management Console and for technical documentation): Arabic, Chinese Simplified, Chinese Traditional, English, French, German, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Thai, Turkish, and Vietnamese. Customers all over the world gain hands-on experience with the AWS platform, products, and services in their […]  ( 6 min )
    Incrementally update a dataset with a bulk import mechanism in Amazon Personalize
    We are excited to announce that Amazon Personalize now supports incremental bulk dataset imports, a new option for updating your data and improving the quality of your recommendations. Keeping your datasets current is an important part of maintaining the relevance of your recommendations. Prior to this new feature launch, Amazon Personalize offered two mechanisms for […]  ( 5 min )
  • Open

    Is the Matrix Coming to Life?
    In June 2018, the metaverse was first announced and the world went crazy. This caused cryptocurrencies to go crazy and Elon Musk could not…  ( 9 min )
  • Open

    Code Released: Conformal Training
    The code for our ICLR'22 paper on learning optimal conformal classifiers is now available on GitHub. The repository not only includes our implementation of conformal training but also relevant baselines such as coverage training and several conformal predictors for evaluation. Furthermore, it makes it possible to reproduce the majority of experiments from the paper. The post Code Released: Conformal Training appeared first on David Stutz.  ( 3 min )
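    For readers unfamiliar with the conformal predictors mentioned here, split conformal classification reduces to a few lines (an assumed baseline setup, not the paper's implementation): calibrate a score threshold on held-out data so that predicted label sets cover the true label with probability about 1 - alpha.

    ```python
    import numpy as np

    def calibrate_threshold(cal_probs, cal_labels, alpha=0.1):
        # nonconformity score: 1 - softmax probability of the true class
        scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
        n = len(scores)
        k = min(int(np.ceil((n + 1) * (1 - alpha))) - 1, n - 1)
        return np.sort(scores)[k]         # conformal quantile of the scores

    def predict_sets(test_probs, q):
        # include every class whose nonconformity falls below the threshold
        return [np.where(1.0 - p <= q)[0] for p in test_probs]
    ```

    Prediction sets grow on uncertain inputs; conformal training differentiates through a step like this during training so that the resulting sets become more informative.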
  • Open

    Cross platform muscle memory
    I’ve mostly used Windows and Linux for the last several years. When I needed to get a new laptop recently I got a MacBook Pro. I’ve used a Mac before, but it’s been a while, and so I’m starting over. I can move between Windows and Linux and almost forget what OS I’m using because […] Cross platform muscle memory first appeared on John D. Cook.  ( 5 min )
    More on near-integer decibels
    In base 10, four decibel values are approximately integers. Yesterday I explored whether base 10 was unique in this regard. I defined the value of n decibels in base b to be g(n, b) = b^(n/b). Sticking in 10 for b gives the usual definition of decibel levels. There are two ways to quantify what […] More on near-integer decibels first appeared on John D. Cook.  ( 5 min )
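    A quick numerical check of the definition, taking "approximately an integer" to mean within 0.05:

    ```python
    # g(n, b) = b**(n/b); base 10 gives the usual decibel levels
    g = lambda n, b: b ** (n / b)
    for n in range(1, 11):
        val = g(n, 10)
        if abs(val - round(val)) < 0.05:
            print(n, round(val, 4))   # prints n = 3, 6, 7 and the exact n = 10
    ```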
  • Open

    Immunai Co-Founder Luis Voloch on Using Deep Learning to Develop New Drugs
    Mapping the immune system could lead to the creation of drugs that help our bodies win the fight against cancer and other diseases. That’s the big idea behind immunotherapy. The problem: the immune system is incredibly complex. Enter Immunai, a biotech company that’s using cutting-edge genomics & ML technology to map the human immune system. Read article > The post Immunai Co-Founder Luis Voloch on Using Deep Learning to Develop New Drugs appeared first on NVIDIA Blog.  ( 4 min )
  • Open

    Hyperparameter Optimization of Generative Adversarial Network Models for High-Energy Physics Simulations. (arXiv:2208.07715v1 [hep-ex])
    The Generative Adversarial Network (GAN) is a powerful and flexible tool that can generate high-fidelity synthesized data by learning. It has seen many applications in simulating events in High Energy Physics (HEP), including simulating detector responses and physics events. However, training GANs is notoriously hard and optimizing their hyperparameters even more so. It normally requires many trial-and-error training attempts to achieve stable training and reach a reasonable fidelity. Significant tuning work has to be done to achieve the accuracy required by physics analyses. This work uses the physics-agnostic and high-performance-computer-friendly hyperparameter optimization tool HYPPO to optimize and examine the sensitivities of the hyperparameters of a GAN for two independent HEP datasets. This work provides the first insights into efficiently tuning GANs for Large Hadron Collider data. We show that given proper hyperparameter tuning, we can find GANs that provide high-quality approximations of the desired quantities. We also provide guidelines for how to go about GAN architecture tuning using the analysis tools in HYPPO.
    Proceedings of the ICML 2022 Expressive Vocalizations Workshop and Competition: Recognizing, Generating, and Personalizing Vocal Bursts. (arXiv:2207.06958v2 [cs.SD] UPDATED)
    This is the Proceedings of the ICML Expressive Vocalization (ExVo) Competition. The ExVo competition focuses on understanding and generating vocal bursts: laughs, gasps, cries, and other non-verbal vocalizations that are central to emotional expression and communication. ExVo 2022 included three competition tracks using a large-scale dataset of 59,201 vocalizations from 1,702 speakers. The first, ExVo-MultiTask, requires participants to train a multi-task model to recognize expressed emotions and demographic traits from vocal bursts. The second, ExVo-Generate, requires participants to train a generative model that produces vocal bursts conveying ten different emotions. The third, ExVo-FewShot, requires participants to leverage few-shot learning incorporating speaker identity to train a model for the recognition of 10 emotions conveyed by vocal bursts.
    Counterfactual Supervision-based Information Bottleneck for Out-of-Distribution Generalization. (arXiv:2208.07798v1 [cs.LG])
    Learning invariant (causal) features for out-of-distribution (OOD) generalization has attracted extensive attention recently, and among the proposals invariant risk minimization (IRM) (Arjovsky et al., 2019) is a notable solution. In spite of its theoretical promise for linear regression, the challenges of using IRM in linear classification problems yet remain (Rosenfeld et al., 2020; Nagarajan et al., 2021). Along this line, a recent study (Ahuja et al., 2021) has made a first step and proposes a learning principle of information bottleneck based invariant risk minimization (IB-IRM). In this paper, we first show that the key assumption of support overlap of invariant features used in (Ahuja et al., 2021) is rather strong for the guarantee of OOD generalization and it is still possible to achieve the optimal solution without such assumption. To further answer the question of whether IB-IRM is sufficient for learning invariant features in linear classification problems, we show that IB-IRM would still fail in two cases, whether or not the invariant features capture all information about the label. To address such failures, we propose a Counterfactual Supervision-based Information Bottleneck (CSIB) learning algorithm that provably recovers the invariant features. The proposed algorithm works even when accessing data from a single environment, and has theoretically consistent results for both binary and multi-class problems. We present empirical experiments on three synthetic datasets that verify the efficacy of our proposed method.
    Learning Facial Liveness Representation for Domain Generalized Face Anti-spoofing. (arXiv:2208.07828v1 [cs.CV])
    Face anti-spoofing (FAS) aims at distinguishing face spoof attacks from the authentic ones, which is typically approached by learning proper models for performing the associated classification task. In practice, one would expect such models to be generalized to FAS in different image domains. Moreover, it is not practical to assume that the type of spoof attacks would be known in advance. In this paper, we propose a deep learning model for addressing the aforementioned domain-generalized face anti-spoofing task. In particular, our proposed network is able to disentangle facial liveness representation from the irrelevant ones (i.e., facial content and image domain features). The resulting liveness representation exhibits sufficient domain invariant properties, and thus it can be applied for performing domain-generalized FAS. In our experiments, we conduct experiments on five benchmark datasets with various settings, and we verify that our model performs favorably against state-of-the-art approaches in identifying novel types of spoof attacks in unseen image domains.
    Near Optimal Adversarial Attack on UCB Bandits. (arXiv:2008.09312v2 [cs.LG] UPDATED)
    We consider a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. We propose a novel attack strategy that manipulates a learner following the UCB principle into pulling some non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\sqrt{\log T}$, where $T$ is the number of rounds. We also prove the first lower bound on the cumulative attack cost. Our lower bound matches our upper bound up to $\log \log T$ factors, showing our attack to be near optimal.
    Pre-training Enhanced Spatial-temporal Graph Neural Network for Multivariate Time Series Forecasting. (arXiv:2206.09113v2 [cs.LG] UPDATED)
    Multivariate Time Series (MTS) forecasting plays a vital role in a wide range of applications. Recently, Spatial-Temporal Graph Neural Networks (STGNNs) have become increasingly popular MTS forecasting methods. STGNNs jointly model the spatial and temporal patterns of MTS through graph neural networks and sequential models, significantly improving the prediction accuracy. But limited by model complexity, most STGNNs only consider short-term historical MTS data, such as data over the past one hour. However, the patterns of time series and the dependencies between them (i.e., the temporal and spatial patterns) need to be analyzed based on long-term historical MTS data. To address this issue, we propose a novel framework, in which STGNN is Enhanced by a scalable time series Pre-training model (STEP). Specifically, we design a pre-training model to efficiently learn temporal patterns from very long-term history time series (e.g., the past two weeks) and generate segment-level representations. These representations provide contextual information for short-term time series input to STGNNs and facilitate modeling dependencies between time series. Experiments on three public real-world datasets demonstrate that our framework is capable of significantly enhancing downstream STGNNs, and our pre-training model aptly captures temporal patterns.
    Learning Representations with Contrastive Self-Supervised Learning for Histopathology Applications. (arXiv:2112.05760v2 [eess.IV] UPDATED)
    Unsupervised learning has made substantial progress over the last few years, especially by means of contrastive self-supervised learning. The dominating dataset for benchmarking self-supervised learning has been ImageNet, for which recent methods are approaching the performance achieved by fully supervised training. The ImageNet dataset is however largely object-centric, and it is not clear yet what potential those methods have on widely different datasets and tasks that are not object-centric, such as in digital pathology. While self-supervised learning has started to be explored within this area with encouraging results, there is reason to look closer at how this setting differs from natural images and ImageNet. In this paper we make an in-depth analysis of contrastive learning for histopathology, pin-pointing how the contrastive objective will behave differently due to the characteristics of histopathology data. We bring forward a number of considerations, such as view generation for the contrastive objective and hyper-parameter tuning. In a large battery of experiments, we analyze how the downstream performance in tissue classification will be affected by these considerations. The results point to how contrastive learning can reduce the annotation effort within digital pathology, but that the specific dataset characteristics need to be considered. To take full advantage of the contrastive learning objective, different calibrations of view generation and hyper-parameters are required. Our results pave the way for realizing the full potential of self-supervised learning for histopathology applications.
    Hierarchical Kickstarting for Skill Transfer in Reinforcement Learning. (arXiv:2207.11584v2 [cs.LG] UPDATED)
    Practising and honing skills forms a fundamental component of how humans learn, yet artificial agents are rarely specifically trained to perform them. Instead, they are usually trained end-to-end, with the hope being that useful skills will be implicitly learned in order to maximise discounted return of some extrinsic reward function. In this paper, we investigate how skills can be incorporated into the training of reinforcement learning (RL) agents in complex environments with large state-action spaces and sparse rewards. To this end, we created SkillHack, a benchmark of tasks and associated skills based on the game of NetHack. We evaluate a number of baselines on this benchmark, as well as our own novel skill-based method Hierarchical Kickstarting (HKS), which is shown to outperform all other evaluated methods. Our experiments show that learning with a prior knowledge of useful skills can significantly improve the performance of agents on complex problems. We ultimately argue that utilising predefined skills provides a useful inductive bias for RL problems, especially those with large state-action spaces and sparse rewards.
    Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models. (arXiv:2208.07852v1 [cs.CL])
    State-of-the-art neural language models can now be used to solve ad-hoc language tasks through zero-shot prompting without the need for supervised training. This approach has gained popularity in recent years, and researchers have demonstrated prompts that achieve strong accuracy on specific NLP tasks. However, finding a prompt for new tasks requires experimentation; different prompt templates with different wording choices lead to significant accuracy differences. To address this, we present PromptIDE, which allows users to experiment with prompt variations, visualize prompt performance, and iteratively optimize prompts. We developed a workflow that allows users to first focus on model feedback using small data before moving on to a large data regime that allows empirical grounding of promising prompts using quantitative measures of the task. The tool then allows easy deployment of the newly created ad-hoc models. We demonstrate the utility of PromptIDE (demo at this http URL) and our workflow using several real-world use cases.
    Do Invariances in Deep Neural Networks Align with Human Perception?. (arXiv:2111.14726v3 [cs.CV] UPDATED)
    An evaluation criterion for safe and trustworthy deep learning is how well the invariances captured by representations of deep neural networks (DNNs) are shared with humans. We identify challenges in measuring these invariances. Prior works used gradient-based methods to generate identically represented inputs (IRIs), i.e., inputs which have identical representations (on a given layer) of a neural network, and thus capture invariances of a given network. One necessary criterion for a network's invariances to align with human perception is for its IRIs to look 'similar' to humans. Prior works, however, have mixed takeaways; some argue that later layers of DNNs do not learn human-like invariances (Feather et al., 2019), yet others seem to indicate otherwise (Mahendran & Vedaldi, 2014). We argue that the loss function used to generate IRIs can heavily affect takeaways about invariances of the network and is the primary reason for these conflicting findings. We propose an adversarial regularizer on the IRI generation loss that finds IRIs that make any model appear to have very little shared invariance with humans. Based on this evidence, we argue that there is scope for improving models to have human-like invariances, and further, that for meaningful comparisons between models one should use IRIs generated using the regularizer-free loss. We then conduct an in-depth investigation of how different components (e.g. architectures, training losses, data augmentations) of the deep learning pipeline contribute to learning models that have good alignment with humans. We find that architectures with residual connections trained using a (self-supervised) contrastive loss with $\ell_p$ ball adversarial data augmentation tend to learn invariances that are most aligned with humans.
    QuickSkill: Novice Skill Estimation in Online Multiplayer Games. (arXiv:2208.07704v1 [cs.LG])
    Matchmaking systems are vital for creating fair matches in online multiplayer games, which directly affects players' satisfaction and game experience. Most matchmaking systems largely rely on precise estimation of players' game skills to construct equitable games. However, the skill rating of a novice is usually inaccurate, as current matchmaking rating algorithms require a considerable amount of games for learning the true skill of a new player. Using these unreliable skill scores at early stages for matchmaking usually leads to disparities in terms of team performance, which causes negative game experience. This is known as the ''cold-start'' problem for matchmaking rating algorithms. To overcome this conundrum, this paper proposes QuickSkill, a deep learning based novice skill estimation framework to quickly probe abilities of new players in online multiplayer games. QuickSkill extracts sequential performance features from the initial few games of a player to predict his/her future skill rating with a dedicated neural network, thus delivering accurate skill estimation at the player's early game stage. By employing QuickSkill for matchmaking, game fairness can be dramatically improved in the initial cold-start period. We conduct experiments in a popular mobile multiplayer game in both offline and online scenarios. Results obtained with two real-world anonymized gaming datasets demonstrate that the proposed QuickSkill delivers precise estimation of game skills for novices, leading to significantly lower team skill disparities and better player game experience. To the best of our knowledge, the proposed QuickSkill is the first framework that tackles the cold-start problem for traditional skill rating algorithms.
    A Latent Feature Analysis-based Approach for Spatio-Temporal Traffic Data Recovery. (arXiv:2208.07739v1 [eess.SP])
    Missing data is an inevitable and common problem in data-driven intelligent transportation systems (ITS). In the past decade, scholars have done much research on the recovery of missing traffic data, yet how to make full use of spatio-temporal traffic patterns to improve the recovery performance is still an open problem. Aiming at the spatio-temporal characteristics of traffic speed data, this paper regards the recovery of missing data as a matrix completion problem, and proposes a spatio-temporal traffic data completion method based on hidden feature analysis, which discovers spatio-temporal patterns and underlying structures from incomplete data to complete the recovery task. Therefore, we introduce spatial and temporal correlation to capture the main underlying features of each dimension. Finally, these latent features are applied to recover traffic data through latent feature analysis. The experimental evaluation shows that the model attains a small value of the evaluation criterion, indicating better performance, and that it can accurately estimate contiguous stretches of missing data.
    Active Bucketized Learning for ACOPF Optimization Proxies. (arXiv:2208.07497v1 [cs.LG])
    This paper considers optimization proxies for Optimal Power Flow (OPF), i.e., machine-learning models that approximate the input/output relationship of OPF. Recent work has focused on showing that such proxies can be of high fidelity. However, their training requires significant data, each instance necessitating the (offline) solving of an OPF for a sample of the input distribution. To meet the requirements of market-clearing applications, this paper proposes Active Bucketized Sampling (ABS), a novel active learning framework that aims at training the best possible OPF proxy within a time limit. ABS partitions the input distribution into buckets and uses an acquisition function to determine where to sample next. It relies on an adaptive learning rate that increases and decreases over time. Experimental results demonstrate the benefits of ABS.
    Subtype-Aware Dynamic Unsupervised Domain Adaptation. (arXiv:2208.07754v1 [cs.CV])
    Unsupervised domain adaptation (UDA) has been successfully applied to transfer knowledge from a labeled source domain to target domains without their labels. Recently introduced transferable prototypical networks (TPN) further addresses class-wise conditional alignment. In TPN, while the closeness of class centers between source and target domains is explicitly enforced in a latent space, the underlying fine-grained subtype structure and the cross-domain within-class compactness have not been fully investigated. To counter this, we propose a new approach to adaptively perform a fine-grained subtype-aware alignment to improve performance in the target domain without the subtype label in both domains. The insight of our approach is that the unlabeled subtypes in a class have the local proximity within a subtype, while exhibiting disparate characteristics, because of different conditional and label shifts. Specifically, we propose to simultaneously enforce subtype-wise compactness and class-wise separation, by utilizing intermediate pseudo-labels. In addition, we systematically investigate various scenarios with and without prior knowledge of subtype numbers, and propose to exploit the underlying subtype structure. Furthermore, a dynamic queue framework is developed to evolve the subtype cluster centroids steadily using an alternative processing scheme. Experimental results, carried out with multi-view congenital heart disease data and VisDA and DomainNet, show the effectiveness and validity of our subtype-aware UDA, compared with state-of-the-art UDA methods.
    Modeling Occasion Evolution in Frequency Domain for Promotion-Aware Click-Through Rate Prediction. (arXiv:2112.13747v3 [cs.LG] UPDATED)
    Promotions are becoming very important and frequent in e-commerce platforms to attract customers and boost sales, resulting in various occasions which drive users to behave differently. Due to the frequent changes of occasions, existing Click-Through Rate (CTR) prediction methods are not able to generalize well to online serving because the data distribution is uncertain. Besides, with training data collected from different occasions, the assumption of identical distribution does not hold, imposing extra difficulties on model learning. In this paper, we propose a novel CTR model named MOEF for recommendation under frequent changes of occasions. Firstly, we generate occasion signals from the online business scenario with a proper sampling interval. For each occasion signal, we obtain a sequence of frequency spectra via the Fast Fourier Transform applied to sliding time windows. Occasion signals are more discriminative in the frequency domain, so we can more easily model occasion evolution with sequences of frequency spectra via an LSTM, learning a better occasion representation and helping tackle the online distribution uncertainty. To ease the difficulties of model learning introduced by non-identically distributed training data, we adopt multiple experts to learn feature representations from multiple aspects, which are guided by the occasion representation via an attention mechanism. Accordingly, a mixture of feature representations is obtained adaptively for different occasions and used for the final CTR prediction. Experimental results on real-world datasets validate the superiority of our MOEF model. Online A/B tests also show MOEF achieves significant gains of 4.23% on CTR and 6.47% on IPV during promotion periods, as well as 4.61% and 6.96% in normal days, respectively. The code will be made publicly available.
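    A rough sketch of the frequency-domain occasion encoder described in the abstract might look as follows (shapes, window size, and names are our assumptions, not the authors' code): slide a window over the occasion signal, take the magnitude spectrum of each window with an FFT, and encode the resulting spectrum sequence with an LSTM.

    ```python
    import torch
    import torch.nn as nn

    class OccasionEncoder(nn.Module):
        def __init__(self, window=64, stride=16, hidden=128):
            super().__init__()
            self.window, self.stride = window, stride
            self.lstm = nn.LSTM(input_size=window // 2 + 1,
                                hidden_size=hidden, batch_first=True)

        def forward(self, signal):                                # signal: (B, T)
            windows = signal.unfold(1, self.window, self.stride)  # (B, n_win, window)
            spectra = torch.fft.rfft(windows, dim=-1).abs()       # magnitude spectra
            _, (h, _) = self.lstm(spectra)
            return h[-1]                     # occasion representation: (B, hidden)
    ```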
    $L^p$ sampling numbers for the Fourier-analytic Barron space. (arXiv:2208.07605v1 [math.FA])
    In this paper, we consider Barron functions $f : [0,1]^d \to \mathbb{R}$ of smoothness $\sigma > 0$, which are functions that can be written as \[ f(x) = \int_{\mathbb{R}^d} F(\xi) \, e^{2 \pi i \langle x, \xi \rangle} \, d \xi \quad \text{with} \quad \int_{\mathbb{R}^d} |F(\xi)| \cdot (1 + |\xi|)^{\sigma} \, d \xi < \infty. \] For $\sigma = 1$, these functions play a prominent role in machine learning, since they can be efficiently approximated by (shallow) neural networks without suffering from the curse of dimensionality. For these functions, we study the following question: Given $m$ point samples $f(x_1),\dots,f(x_m)$ of an unknown Barron function $f : [0,1]^d \to \mathbb{R}$ of smoothness $\sigma$, how well can $f$ be recovered from these samples, for an optimal choice of the sampling points and the reconstruction procedure? Denoting the optimal reconstruction error measured in $L^p$ by $s_m (\sigma; L^p)$, we show that \[ m^{- \frac{1}{\max \{ p,2 \}} - \frac{\sigma}{d}} \lesssim s_m(\sigma;L^p) \lesssim (\ln (e + m))^{\alpha(\sigma,d) / p} \cdot m^{- \frac{1}{\max \{ p,2 \}} - \frac{\sigma}{d}} , \] where the implied constants only depend on $\sigma$ and $d$ and where $\alpha(\sigma,d)$ stays bounded as $d \to \infty$.
    On Optimizing Back-Substitution Methods for Neural Network Verification. (arXiv:2208.07669v1 [cs.LG])
    With the increasing application of deep learning in mission-critical systems, there is a growing need to obtain formal guarantees about the behaviors of neural networks. Indeed, many approaches for verifying neural networks have been recently proposed, but these generally struggle with limited scalability or insufficient accuracy. A key component in many state-of-the-art verification schemes is computing lower and upper bounds on the values that neurons in the network can obtain for a specific input domain -- and the tighter these bounds, the more likely the verification is to succeed. Many common algorithms for computing these bounds are variations of the symbolic-bound propagation method; and among these, approaches that utilize a process called back-substitution are particularly successful. In this paper, we present an approach for making back-substitution produce tighter bounds. To achieve this, we formulate and then minimize the imprecision errors incurred during back-substitution. Our technique is general, in the sense that it can be integrated into numerous existing symbolic-bound propagation techniques, with only minor modifications. We implement our approach as a proof-of-concept tool, and present favorable results compared to state-of-the-art verifiers that perform back-substitution.
    On Efficient Real-Time Semantic Segmentation: A Survey. (arXiv:2206.08605v2 [cs.CV] UPDATED)
    Semantic segmentation is the problem of assigning a class label to every pixel in an image, and is an important component of an autonomous vehicle vision stack for facilitating scene understanding and object detection. However, many of the top performing semantic segmentation models are extremely complex and cumbersome, and as such are not suited to deployment onboard autonomous vehicle platforms where computational resources are limited and low-latency operation is a vital requirement. In this survey, we take a thorough look at the works that aim to address this misalignment with more compact and efficient models capable of deployment on low-memory embedded systems while meeting the constraint of real-time inference. We discuss several of the most prominent works in the field, placing them within a taxonomy based on their major contributions, and finally we evaluate the inference speed of the discussed models under consistent hardware and software setups that represent a typical research environment with high-end GPU and a realistic deployed scenario using low-memory embedded GPU hardware. Our experimental results demonstrate that many works are capable of real-time performance on resource-constrained hardware, while illustrating the consistent trade-off between latency and accuracy.
    Parametric Scattering Networks. (arXiv:2107.09539v4 [cs.LG] UPDATED)
    The wavelet scattering transform creates geometric invariants and deformation stability. In multiple signal domains, it has been shown to yield more discriminative representations compared to other non-learned representations and to outperform learned representations in certain tasks, particularly on limited labeled data and highly structured signals. The wavelet filters used in the scattering transform are typically selected to create a tight frame via a parameterized mother wavelet. In this work, we investigate whether this standard wavelet filterbank construction is optimal. Focusing on Morlet wavelets, we propose to learn the scales, orientations, and aspect ratios of the filters to produce problem-specific parameterizations of the scattering transform. We show that our learned versions of the scattering transform yield significant performance gains in small-sample classification settings over the standard scattering transform. Moreover, our empirical results suggest that traditional filterbank constructions may not always be necessary for scattering transforms to extract effective representations.
    Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis. (arXiv:2208.07589v1 [cs.LG])
    With the proliferation of user-generated online videos, Multimodal Sentiment Analysis (MSA) has attracted increasing attention recently. Despite significant progress, there are still two major challenges on the way towards robust MSA: 1) inefficiency when modeling cross-modal interactions in unaligned multimodal data; and 2) vulnerability to random modality feature missing which typically occurs in realistic settings. In this paper, we propose a generic and unified framework to address them, named Efficient Multimodal Transformer with Dual-Level Feature Restoration (EMT-DLFR). Concretely, EMT employs utterance-level representations from each modality as the global multimodal context to interact with local unimodal features and mutually promote each other. It not only avoids the quadratic scaling cost of previous local-local cross-modal interaction methods but also leads to better performance. To improve model robustness in the incomplete modality setting, on the one hand, DLFR performs low-level feature reconstruction to implicitly encourage the model to learn semantic information from incomplete data. On the other hand, it innovatively regards complete and incomplete data as two different views of one sample and utilizes siamese representation learning to explicitly attract their high-level representations. Comprehensive experiments on three popular datasets demonstrate that our method achieves superior performance in both complete and incomplete modality settings.
    A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave. (arXiv:2208.07810v1 [cs.SI])
    The COVID-19 Omicron variant, reported to be the most immune evasive variant of COVID-19, is resulting in a surge of COVID-19 cases globally. This has caused schools, colleges, and universities in different parts of the world to transition to online learning. As a result, social media platforms such as Twitter are seeing an increase in conversations related to online learning in the form of tweets. Mining such tweets to develop a dataset can serve as a data resource for different applications and use-cases related to the analysis of interest, views, opinions, perspectives, attitudes, and feedback towards online learning during the current surge of COVID-19 cases caused by the Omicron variant. Therefore, this work presents a large-scale open-access Twitter dataset of conversations about online learning from different parts of the world since the first detected case of the COVID-19 Omicron variant in November 2021. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for scientific data management. The paper also briefly outlines some potential applications in the fields of Big Data, Data Mining, Natural Language Processing, and their related disciplines, with a specific focus on online learning during this Omicron wave that may be studied, explored, and investigated by using this dataset.
    Predicting student performance using sequence classification with time-based windows. (arXiv:2208.07749v1 [cs.LG])
    A growing number of universities worldwide use various forms of online and blended learning as part of their academic curricula. Furthermore, the recent changes caused by the COVID-19 pandemic have led to a drastic increase in importance and ubiquity of online education. Among the major advantages of e-learning is not only improving students' learning experience and widening their educational prospects, but also an opportunity to gain insights into students' learning processes with learning analytics. This study contributes to the topic of improving and understanding e-learning processes in the following ways. First, we demonstrate that accurate predictive models can be built based on sequential patterns derived from students' behavioral data, which are able to identify underperforming students early in the course. Second, we investigate the specificity-generalizability trade-off in building such predictive models by investigating whether predictive models should be built for every course individually based on course-specific sequential patterns, or across several courses based on more general behavioral patterns. Finally, we present a methodology for capturing temporal aspects in behavioral data and analyze its influence on the predictive performance of the models. Our improved sequence classification technique is capable of predicting student performance with high levels of accuracy, reaching 90 percent for course-specific models.
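    As a toy illustration of the time-based windowing (the window length, action vocabulary, and event log below are hypothetical), per-window action counts for one student can be assembled as follows and fed to a downstream sequence classifier.

        import numpy as np

        # Event log for one student: (timestamp in days, action id) pairs.
        events = np.array([[0.5, 0], [1.2, 1], [6.9, 0], [8.0, 2], [20.1, 1]])
        n_actions, window_days, n_windows = 3, 7.0, 4

        features = np.zeros((n_windows, n_actions))
        for t, a in events:
            w = int(t // window_days)          # which time window the event falls into
            if w < n_windows:
                features[w, int(a)] += 1       # count of each action type per window
        print(features.flatten())              # one feature row for the classifier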
    Notes on Worst-case Inefficiency of Gradient Descent Even in R^2. (arXiv:2008.07513v2 [cs.LG] UPDATED)
    Gradient descent is a popular algorithm in optimization, and its performance in convex settings is mostly well understood. In non-convex settings, it has been shown that gradient descent is able to escape saddle points asymptotically and converge to local minimizers [Lee et al. 2016]. Recent studies also show a perturbed version of gradient descent is enough to escape saddle points efficiently [Jin et al. 2015, Ge et al. 2017]. In this paper we show a negative result: gradient descent may take exponential time to escape saddle points, even for non-pathological two-dimensional functions. While our focus is theoretical, we also conduct experiments verifying our theoretical result. Through our analysis we demonstrate that stochasticity is essential to escape saddle points efficiently.
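    A toy numpy sketch of the phenomenon (not the paper's exponential-time construction): gradient descent initialized on the stable manifold of a saddle stalls there, while a tiny random perturbation escapes to a minimizer.

        import numpy as np

        def grad(p):                         # f(x, y) = x^2 + y^4/4 - y^2/2: saddle at (0, 0)
            return np.array([2.0 * p[0], p[1] ** 3 - p[1]])

        def run(p, steps=500, lr=0.1, noise=0.0, seed=0):
            rng = np.random.default_rng(seed)
            for _ in range(steps):
                p = p - lr * grad(p) + noise * rng.normal(size=2)
            return p

        start = np.array([1.0, 0.0])         # exactly on the stable manifold of the saddle
        print("plain GD:    ", run(start))               # converges to the saddle (0, 0)
        print("perturbed GD:", run(start, noise=1e-3))   # escapes to a minimum near (0, +/-1)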
    Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex Logical Queries. (arXiv:2208.07638v1 [cs.LG])
    Knowledge graph (KG) embeddings have been a mainstream approach for reasoning over incomplete KGs. However, limited by their inherently shallow and static architectures, they can hardly deal with the rising focus on complex logical queries, which comprise logical operators, imputed edges, multiple source entities, and unknown intermediate entities. In this work, we present the Knowledge Graph Transformer (kgTransformer) with masked pre-training and fine-tuning strategies. We design a KG triple transformation method to enable Transformer to handle KGs, which is further strengthened by the Mixture-of-Experts (MoE) sparse activation. We then formulate the complex logical queries as masked prediction and introduce a two-stage masked pre-training strategy to improve transferability and generalizability. Extensive experiments on two benchmarks demonstrate that kgTransformer can consistently outperform both KG embedding-based baselines and advanced encoders on nine in-domain and out-of-domain reasoning tasks. Additionally, kgTransformer can reason with explainability via providing the full reasoning paths to interpret given answers.
    Deletion Robust Non-Monotone Submodular Maximization over Matroids. (arXiv:2208.07582v1 [cs.DS])
    Maximizing a submodular function is a fundamental task in machine learning and in this paper we study the deletion robust version of the problem under the classic matroids constraint. Here the goal is to extract a small size summary of the dataset that contains a high value independent set even after an adversary deleted some elements. We present constant-factor approximation algorithms, whose space complexity depends on the rank $k$ of the matroid and the number $d$ of deleted elements. In the centralized setting we present a $(4.597+O(\varepsilon))$-approximation algorithm with summary size $O( \frac{k+d}{\varepsilon^2}\log \frac{k}{\varepsilon})$ that is improved to a $(3.582+O(\varepsilon))$-approximation with $O(k + \frac{d}{\varepsilon^2}\log \frac{k}{\varepsilon})$ summary size when the objective is monotone. In the streaming setting we provide a $(9.435 + O(\varepsilon))$-approximation algorithm with summary size and memory $O(k + \frac{d}{\varepsilon^2}\log \frac{k}{\varepsilon})$; the approximation factor is then improved to $(5.582+O(\varepsilon))$ in the monotone case.
    High Fidelity Visualization of What Your Self-Supervised Representation Knows About. (arXiv:2112.09164v2 [cs.LG] UPDATED)
    Discovering what is learned by neural networks remains a challenge. In self-supervised learning, classification is the most common task used to evaluate how good a representation is. However, relying only on such a downstream task can limit our understanding of what information is retained in the representation of a given input. In this work, we showcase the use of a Representation Conditional Diffusion Model (RCDM) to visualize in data space the representations learned by self-supervised models. The use of RCDM is motivated by its ability to generate high-quality samples -- on par with state-of-the-art generative models -- while ensuring that the representations of those samples are faithful, i.e., close to the one used for conditioning. By using RCDM to analyze self-supervised models, we are able to clearly show visually that i) SSL (backbone) representations are not invariant to the data augmentations they were trained with -- thus debunking an often restated but mistaken belief; ii) SSL post-projector embeddings appear indeed invariant to these data augmentations, along with many other data symmetries; iii) SSL representations appear more robust to small adversarial perturbations of their inputs than representations trained in a supervised manner; and iv) SSL-trained representations exhibit an inherent structure that can be explored thanks to RCDM visualization and enables image manipulation.
    Representation Learning on Graphs to Identifying Circular Trading in Goods and Services Tax. (arXiv:2208.07660v1 [cs.LG])
    Circular trading is a form of tax evasion in Goods and Services Tax where a group of fraudulent taxpayers (traders) aims to mask illegal transactions by superimposing several fictitious transactions (where no value is added to the goods or service) among themselves in a short period. Due to the vast database of taxpayers, it is infeasible for authorities to manually identify groups of circular traders and the illegitimate transactions they are involved in. This work uses big data analytics and graph representation learning techniques to propose a framework to identify communities of circular traders and isolate the illegitimate transactions in the respective communities. Our approach is tested on real-life data provided by the Department of Commercial Taxes, Government of Telangana, India, where we uncovered several communities of circular traders.
    Towards Certified Robustness of Distance Metric Learning. (arXiv:2006.05945v2 [stat.ML] UPDATED)
    Metric learning aims to learn a distance metric such that semantically similar instances are pulled together while dissimilar instances are pushed away. Many existing methods consider maximizing or at least constraining a distance margin in the feature space that separates similar and dissimilar pairs of instances to guarantee their generalization ability. In this paper, we advocate imposing an adversarial margin in the input space so as to improve the generalization and robustness of metric learning algorithms. We first show that the adversarial margin, defined as the distance between training instances and their closest adversarial examples in the input space, takes account of both the distance margin in the feature space and the correlation between the metric and triplet constraints. Next, to enhance robustness to instance perturbation, we propose to enlarge the adversarial margin through minimizing a derived novel loss function termed the perturbation loss. The proposed loss can be viewed as a data-dependent regularizer and easily plugged into any existing metric learning methods. Finally, we show that the enlarged margin is beneficial to the generalization ability by using the theoretical technique of algorithmic robustness. Experimental results on 16 datasets demonstrate the superiority of the proposed method over existing state-of-the-art methods in both discrimination accuracy and robustness against possible noise.
    Decoupled Dynamic Spatial-Temporal Graph Neural Network for Traffic Forecasting. (arXiv:2206.09112v3 [cs.LG] UPDATED)
    We all depend on mobility, and vehicular transportation affects the daily lives of most of us. Thus, the ability to forecast the state of traffic in a road network is an important functionality and a challenging task. Traffic data is often obtained from sensors deployed in a road network. Recent proposals on spatial-temporal graph neural networks have achieved great progress at modeling complex spatial-temporal correlations in traffic data, by modeling traffic data as a diffusion process. However, intuitively, traffic data encompasses two different kinds of hidden time series signals, namely the diffusion signals and inherent signals. Unfortunately, nearly all previous works coarsely consider traffic signals entirely as the outcome of the diffusion, while neglecting the inherent signals, which impacts model performance negatively. To improve modeling performance, we propose a novel Decoupled Spatial-Temporal Framework (DSTF) that separates the diffusion and inherent traffic information in a data-driven manner, which encompasses a unique estimation gate and a residual decomposition mechanism. The separated signals can be handled subsequently by the diffusion and inherent modules separately. Further, we propose an instantiation of DSTF, Decoupled Dynamic Spatial-Temporal Graph Neural Network (D2STGNN), that captures spatial-temporal correlations and also features a dynamic graph learning module that targets the learning of the dynamic characteristics of traffic networks. Extensive experiments with four real-world traffic datasets demonstrate that the framework is capable of advancing the state-of-the-art.
    Training Latent Variable Models with Auto-encoding Variational Bayes: A Tutorial. (arXiv:2208.07818v1 [cs.LG])
    Auto-encoding Variational Bayes (AEVB) is a powerful and general algorithm for fitting latent variable models (a promising direction for unsupervised learning), and is well-known for training the Variational Auto-Encoder (VAE). In this tutorial, we focus on motivating AEVB from the classic Expectation Maximization (EM) algorithm, as opposed to from deterministic auto-encoders. Though natural and somewhat self-evident, the connection between EM and AEVB is not emphasized in the recent deep learning literature, and we believe that emphasizing this connection can improve the community's understanding of AEVB. In particular, we find it especially helpful to view (1) optimizing the evidence lower bound (ELBO) with respect to inference parameters as an approximate E-step and (2) optimizing the ELBO with respect to generative parameters as an approximate M-step; doing both simultaneously as in AEVB is then simply tightening and pushing up the ELBO at the same time. We discuss how the approximate E-step can be interpreted as performing variational inference. Important concepts such as amortization and the reparametrization trick are discussed in great detail. Finally, we derive from scratch the AEVB training procedures of a non-deep and several deep latent variable models, including VAE, Conditional VAE, Gaussian Mixture VAE and Variational RNN. It is our hope that readers would recognize AEVB as a general algorithm that can be used to fit a wide range of latent variable models (not just VAE), and apply AEVB to such models that arise in their own fields of research. PyTorch code for all included models is publicly available.
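    As a minimal sketch of the AEVB objective the tutorial builds up to (the architecture and data here are stand-ins), a tiny VAE can take a single gradient step that tightens the ELBO with respect to the inference and generative parameters simultaneously:

        import torch
        import torch.nn as nn

        class TinyVAE(nn.Module):
            def __init__(self, x_dim=10, z_dim=2):
                super().__init__()
                self.enc = nn.Linear(x_dim, 2 * z_dim)   # outputs mean and log-variance
                self.dec = nn.Linear(z_dim, x_dim)

            def elbo(self, x):
                mu, logvar = self.enc(x).chunk(2, dim=-1)
                z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()        # reparametrization trick
                recon = -((x - self.dec(z)) ** 2).sum(-1)                   # Gaussian log-likelihood, up to constants
                kl = 0.5 * (mu ** 2 + logvar.exp() - 1.0 - logvar).sum(-1)  # KL(q(z|x) || N(0, I))
                return (recon - kl).mean()

        vae = TinyVAE()
        opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
        x = torch.randn(64, 10)                      # stand-in data
        loss = -vae.elbo(x)    # maximizing the ELBO: approximate E- and M-steps at once
        loss.backward()
        opt.step()
        print("negative ELBO:", loss.item())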
    Combining Predictions under Uncertainty: The Case of Random Decision Trees. (arXiv:2208.07403v1 [cs.LG])
    A common approach to aggregate classification estimates in an ensemble of decision trees is to either use voting or to average the probabilities for each class. The latter takes uncertainty into account, but not the reliability of the uncertainty estimates (so to say, the "uncertainty about the uncertainty"). More generally, much remains unknown about how to best combine probabilistic estimates from multiple sources. In this paper, we investigate a number of alternative prediction methods. Our methods are inspired by the theories of probability, belief functions and reliable classification, as well as a principle that we call evidence accumulation. Our experiments on a variety of data sets are based on random decision trees, which guarantees a high diversity in the predictions to be combined. Somewhat unexpectedly, we found that taking the average over the probabilities is actually hard to beat. However, evidence accumulation showed consistently better results on all but very small leaves.
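    The two baseline combination rules under comparison can be stated in a few lines (the per-tree estimates below are made up); note that they can disagree on the same inputs:

        import numpy as np

        # Hypothetical class-probability estimates of three trees for one instance,
        # e.g. the class frequencies in the leaf each tree routes the instance to.
        probs = np.array([[0.90, 0.10],
                          [0.40, 0.60],
                          [0.45, 0.55]])

        votes = probs.argmax(axis=1)                            # hard prediction of each tree
        print("voting picks:   ", np.bincount(votes).argmax())  # class 1 (two of three votes)
        print("averaging picks:", probs.mean(axis=0).argmax())  # class 0 (higher mean probability)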
    Machine Learning-Based Test Smell Detection. (arXiv:2208.07574v1 [cs.SE])
    Context: Test smells are symptoms of sub-optimal design choices adopted when developing test cases. Previous studies have proved their harmfulness for test code maintainability and effectiveness. Therefore, researchers have been proposing automated, heuristic-based techniques to detect them. However, the performance of such detectors is still limited and dependent on thresholds to be tuned. Objective: We propose the design and experimentation of a novel test smell detection approach based on machine learning to detect four test smells. Method: We plan to develop the largest dataset of manually-validated test smells. This dataset will be leveraged to train six machine learners and assess their capabilities in within- and cross-project scenarios. Finally, we plan to compare our approach with state-of-the-art heuristic-based techniques.
    Learning Operators with Ignore Effects for Bilevel Planning in Continuous Domains. (arXiv:2208.07737v1 [cs.AI])
    Bilevel planning, in which a high-level search over an abstraction of an environment is used to guide low-level decision making, is an effective approach to solving long-horizon tasks in continuous state and action spaces. Recent work has shown that action abstractions that enable such bilevel planning can be learned in the form of symbolic operators and neural samplers given symbolic predicates and demonstrations that achieve known goals. In this work, we show that existing approaches fall short in environments where actions tend to cause a large number of predicates to change. To address this issue, we propose to learn operators with ignore effects. The key idea motivating our approach is that modeling every observed change in the predicates is unnecessary; the only changes that need be modeled are those that are necessary for high-level search to achieve the specified goal. Experimentally, we show that our approach is able to learn operators with ignore effects across six hybrid robotic domains that enable an agent to solve novel variations of a task, with different initial states, goals, and numbers of objects, significantly more efficiently than several baselines.
    An initial alignment between neural network and target is needed for gradient descent to learn. (arXiv:2202.12846v2 [cs.LG] UPDATED)
    This paper introduces the notion of "Initial Alignment" (INAL) between a neural network at initialization and a target function. It is proved that if a network and a Boolean target function do not have a noticeable INAL, then noisy gradient descent on a fully connected network with normalized i.i.d. initialization will not learn in polynomial time. Thus a certain amount of knowledge about the target (measured by the INAL) is needed in the architecture design. This also provides an answer to an open problem posed in [AS20]. The results are based on deriving lower bounds for descent algorithms on symmetric neural networks without explicit knowledge of the target function beyond its INAL.
    CARD: Classification and Regression Diffusion Models. (arXiv:2206.07275v2 [stat.ML] UPDATED)
    Learning the distribution of a continuous or categorical response variable $\boldsymbol y$ given its covariates $\boldsymbol x$ is a fundamental problem in statistics and machine learning. Deep neural network-based supervised learning algorithms have made great progress in predicting the mean of $\boldsymbol y$ given $\boldsymbol x$, but they are often criticized for their limited ability to accurately capture the uncertainty of their predictions. In this paper, we introduce classification and regression diffusion (CARD) models, which combine a denoising diffusion-based conditional generative model and a pre-trained conditional mean estimator, to accurately predict the distribution of $\boldsymbol y$ given $\boldsymbol x$. We demonstrate the outstanding ability of CARD in conditional distribution prediction with both toy examples and real-world datasets, the experimental results on which show that CARD in general outperforms state-of-the-art methods, including Bayesian neural network-based ones that are designed for uncertainty estimation, especially when the conditional distribution of $\boldsymbol y$ given $\boldsymbol x$ is multi-modal.
    TAR: Neural Logical Reasoning across TBox and ABox. (arXiv:2205.14591v2 [cs.AI] UPDATED)
    Many ontologies, i.e., Description Logic (DL) knowledge bases, have been developed to provide rich knowledge about various domains. An ontology consists of an ABox, i.e., assertion axioms between two entities or between a concept and an entity, and a TBox, i.e., terminology axioms between two concepts. Neural logical reasoning (NLR) is a fundamental task to explore such knowledge bases, which aims at answering multi-hop queries with logical operations based on distributed representations of queries and answers. While previous NLR methods can give specific entity-level answers, i.e., ABox answers, they are not able to provide descriptive concept-level answers, i.e., TBox answers, where each concept is a description of a set of entities. In other words, previous NLR methods only reason over the ABox of an ontology while ignoring the TBox. In particular, providing TBox answers enables inferring the explanations of each query with descriptive concepts, which make answers comprehensible to users and are of great usefulness in the field of applied ontology. In this work, we formulate the problem of neural logical reasoning across TBox and ABox (TA-NLR), solving which needs to address challenges in incorporating, representing, and operating on concepts. We propose an original solution named TAR for TA-NLR. Firstly, we incorporate description logic based ontological axioms to provide the source of concepts. Then, we represent concepts and queries as fuzzy sets, i.e., sets whose elements have degrees of membership, to bridge concepts and queries with entities. Moreover, we design operators involving concepts on top of fuzzy set representation of concepts and queries for optimization and inference. Extensive experimental results on two real-world datasets demonstrate the effectiveness of TAR for TA-NLR.
    Estimating the Mixing Time of Ergodic Markov Chains. (arXiv:1902.01224v4 [math.ST] UPDATED)
    We address the problem of estimating the mixing time $t_{\mathsf{mix}}$ of an arbitrary ergodic finite-state Markov chain from a single trajectory of length $m$. The reversible case was addressed by Hsu et al. [2019], who left the general case as an open problem. In the reversible case, the analysis is greatly facilitated by the fact that the Markov operator is self-adjoint, and Weyl's inequality allows for a dimension-free perturbation analysis of the empirical eigenvalues. As Hsu et al. point out, in the absence of reversibility (which induces asymmetric pair-probability matrices), the existing perturbation analysis has a worst-case exponential dependence on the number of states $d$. Furthermore, even if an eigenvalue perturbation analysis with better dependence on $d$ were available, in the non-reversible case the connection between the spectral gap and the mixing time is not nearly as straightforward as in the reversible case. Our key insight is to estimate the pseudo-spectral gap $\gamma_{\mathsf{ps}}$ instead, which allows us to overcome the loss of symmetry and to achieve a polynomial dependence on the minimal stationary probability $\pi_\star$ and $\gamma_{\mathsf{ps}}$. Additionally, in the reversible case, we obtain simultaneous nearly (up to logarithmic factors) minimax rates in $t_{\mathsf{mix}}$ and precision $\varepsilon$, closing a gap in Hsu et al., who treated $\varepsilon$ as constant in the lower bounds. Finally, we construct fully empirical confidence intervals for $\gamma_{\mathsf{ps}}$, which shrink to zero at a rate of roughly $1/\sqrt{m}$, and improve the state of the art in even the reversible case.
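    The central quantity can be sketched in a few lines of numpy for a known small chain; the paper's contribution is estimating it from a single observed trajectory, which this sketch does not attempt.

        import numpy as np

        P = np.array([[0.8, 0.2, 0.0],
                      [0.1, 0.6, 0.3],
                      [0.2, 0.2, 0.6]])          # a small non-reversible ergodic chain

        w, V = np.linalg.eig(P.T)                # stationary distribution: Perron eigenvector
        pi = np.real(V[:, np.argmax(w.real)])
        pi /= pi.sum()

        P_rev = (P.T * pi[None, :]) / pi[:, None]   # time reversal: P*(i,j) = pi_j P(j,i) / pi_i

        def gap(M):                              # 1 minus the second-largest eigenvalue modulus
            e = np.sort(np.abs(np.linalg.eigvals(M)))
            return 1.0 - e[-2]

        gamma_ps = max(gap(np.linalg.matrix_power(P_rev, k) @ np.linalg.matrix_power(P, k)) / k
                       for k in range(1, 20))
        print("pseudo-spectral gap:", round(gamma_ps, 4))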
    Knowledge-Injected Federated Learning. (arXiv:2208.07530v1 [cs.LG])
    Federated learning is an emerging technique for training models from decentralized data sets. In many applications, data owners participating in the federated learning system hold not only the data but also a set of domain knowledge. Such knowledge includes human know-how and craftsmanship that can be extremely helpful to the federated learning task. In this work, we propose a federated learning framework that allows the injection of participants' domain knowledge, where the key idea is to refine the global model with knowledge locally. The scenario we consider is motivated by a real industry-level application, and we demonstrate the effectiveness of our approach to this application.
    Multi-Point Integrated Sensing and Communication: Fusion Model and Functionality Selection. (arXiv:2208.07592v1 [cs.IT])
    Integrated sensing and communication (ISAC) represents a paradigm shift, where previously competing wireless transmissions are jointly designed to operate in harmony via the shared use of the hardware platform for improving the spectral, energy, and hardware efficiencies. However, due to adversarial factors such as fading and blockages, ISAC without fusion may suffer from high sensing uncertainties. This paper presents a multi-point ISAC (MPISAC) system that fuses the outputs from multiple ISAC devices for achieving higher sensing performance by exploiting multi-radar data redundancy. Furthermore, we propose to effectively explore the performance trade-off between sensing and communication via a functionality selection module that adaptively determines the working state (i.e., sensing or communication) of an ISAC device. The crux of our approach is to adopt a fusion model that predicts the fusion accuracy via hypothesis testing and optimal voting analysis. Simulation results demonstrate the superiority of MPISAC over various benchmark schemes and show that the proposed approach can effectively span the trade-off region in ISAC systems.
    End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks. (arXiv:2104.13332v3 [cs.LG] UPDATED)
    Video-to-speech is the process of reconstructing the audio speech from a video of a spoken utterance. Previous approaches to this task have relied on a two-step process where an intermediate representation is inferred from the video, and is then decoded into waveform audio using a vocoder or a waveform reconstruction algorithm. In this work, we propose a new end-to-end video-to-speech model based on Generative Adversarial Networks (GANs) which translates spoken video to waveform end-to-end without using any intermediate representation or separate waveform synthesis algorithm. Our model consists of an encoder-decoder architecture that receives raw video as input and generates speech, which is then fed to a waveform critic and a power critic. The use of an adversarial loss based on these two critics enables the direct synthesis of raw audio waveform and ensures its realism. In addition, the use of our three comparative losses helps establish direct correspondence between the generated audio and the input video. We show that this model is able to reconstruct speech with remarkable realism for constrained datasets such as GRID, and that it is the first end-to-end model to produce intelligible speech for LRW (Lip Reading in the Wild), featuring hundreds of speakers recorded entirely 'in the wild'. We evaluate the generated samples in two different scenarios -- seen and unseen speakers -- using four objective metrics which measure the quality and intelligibility of artificial speech. We demonstrate that the proposed approach outperforms all previous works in most metrics on GRID and LRW.
    Role of Data Augmentation in Unsupervised Anomaly Detection. (arXiv:2208.07734v1 [cs.LG])
    Self-supervised learning (SSL) has emerged as a promising alternative to create supervisory signals to real-world tasks, avoiding extensive cost of careful labeling. SSL is particularly attractive for unsupervised problems such as anomaly detection (AD), where labeled anomalies are costly to secure, difficult to simulate, or even nonexistent. A large catalog of augmentation functions has been used for SSL-based AD (SSAD), and recent works have observed that the type of augmentation has a significant impact on performance. Motivated by these observations, this work sets out to put SSAD under a larger lens and carefully investigate the role of data augmentation in AD through extensive experiments on many testbeds. Our main finding is that self-supervision acts as yet another model hyperparameter, and should be chosen carefully with regard to the nature of true anomalies in the data. That is, the alignment between the augmentation and the underlying anomaly-generating mechanism is the key to the success of SSAD, and in the lack thereof, SSL can even impair (!) detection performance. Moving beyond proposing another SSAD method, our study contributes to the better understanding of this growing area and lays out new directions for future research.
    GNNear: Accelerating Full-Batch Training of Graph Neural Networks with Near-Memory Processing. (arXiv:2111.00680v2 [cs.LG] UPDATED)
    Recently, Graph Neural Networks (GNNs) have become state-of-the-art algorithms for analyzing non-Euclidean graph data. However, realizing efficient GNN training is challenging, especially on large graphs. The reasons are manifold: 1) GNN training incurs a substantial memory footprint. Full-batch training on large graphs even requires hundreds to thousands of gigabytes of memory. 2) GNN training involves both memory-intensive and computation-intensive operations, challenging current CPU/GPU platforms. 3) The irregularity of graphs can result in severe resource under-utilization and load-imbalance problems. This paper presents a GNNear accelerator to tackle these challenges. GNNear adopts a DIMM-based memory system to provide sufficient memory capacity. To match the heterogeneous nature of GNN training, we offload the memory-intensive Reduce operations to in-DIMM Near-Memory-Engines (NMEs), making full use of the high aggregated local bandwidth. We adopt a Centralized-Acceleration-Engine (CAE) to process the computation-intensive Update operations. We further propose several optimization strategies to deal with the irregularity of input graphs and improve GNNear's performance. Comprehensive evaluations on 16 GNN training tasks demonstrate that GNNear achieves 30.8$\times$/2.5$\times$ geomean speedup and 79.6$\times$/7.3$\times$ (geomean) higher energy efficiency compared to Xeon E5-2698-v4 CPU and NVIDIA V100 GPU.
    CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation. (arXiv:2203.09343v2 [cs.CV] UPDATED)
    Many recent approaches in contrastive learning have worked to close the gap between pretraining on iconic images like ImageNet and pretraining on complex scenes like COCO. This gap exists largely because commonly used random crop augmentations obtain semantically inconsistent content in crowded scene images of diverse objects. Previous works use preprocessing pipelines to localize salient objects for improved cropping, but an end-to-end solution is still elusive. In this work, we propose a framework which accomplishes this goal via joint learning of representations and segmentation. We leverage segmentation masks to train a model with a mask-dependent contrastive loss, and use the partially trained model to bootstrap better masks. By iterating between these two components, we ground the contrastive updates in segmentation information, and simultaneously improve segmentation throughout pretraining. Experiments show our representations transfer robustly to downstream tasks in classification, detection and segmentation.
    CTI4AI: Threat Intelligence Generation and Sharing after Red Teaming AI Models. (arXiv:2208.07476v1 [cs.CR])
    As the practicality of Artificial Intelligence (AI) and Machine Learning (ML) based techniques grows, there is an ever-increasing threat of adversarial attacks. There is a need to red team this ecosystem to identify system vulnerabilities, potential threats, characterize properties that will enhance system robustness, and encourage the creation of effective defenses. A secondary need is to share this AI security threat intelligence between different stakeholders such as model developers, users, and AI/ML security professionals. In this paper, we create and describe a prototype system, CTI4AI, to address the need to methodically identify and share AI/ML specific vulnerabilities and threat intelligence.
    Federated Learning with Partial Model Personalization. (arXiv:2204.03809v2 [cs.LG] UPDATED)
    We consider two federated learning algorithms for training partially personalized models, where the shared and personal parameters are updated either simultaneously or alternately on the devices. Both algorithms have been proposed in the literature, but their convergence properties are not fully understood, especially for the alternating variant. We provide convergence analyses of both algorithms in the general nonconvex setting with partial participation and delineate the regime where one dominates the other. Our experiments on real-world image, text, and speech datasets demonstrate that (a) partial personalization can obtain most of the benefits of full model personalization with a small fraction of personal parameters, and, (b) the alternating update algorithm often outperforms the simultaneous update algorithm by a small but consistent margin.
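    A schematic toy of the partial-personalization pattern (the model and data are hypothetical stand-ins, not the paper's setup): each client keeps a personal bias that never leaves the device, the server averages only the shared weight, and the alternating variant updates the personal part first.

        import numpy as np

        rng = np.random.default_rng(0)
        biases = rng.normal(size=5)                  # one ground-truth bias per client
        data = []
        for b in biases:
            x = rng.normal(size=50)
            data.append((x, 1.5 * x + b + 0.1 * rng.normal(size=50)))

        u, v = 0.0, np.zeros(len(data))              # shared weight, per-client personal biases
        for _ in range(100):                         # federated rounds
            u_updates = []
            for i, (x, y) in enumerate(data):
                v[i] -= 0.1 * np.mean(2 * (u * x + v[i] - y))                    # personal step (u frozen)
                u_updates.append(u - 0.1 * np.mean(2 * (u * x + v[i] - y) * x))  # shared step (v frozen)
            u = np.mean(u_updates)                   # server averages the shared part only
        print("learned shared weight (true value 1.5):", round(u, 3))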
    Generating a Terrain-Robustness Benchmark for Legged Locomotion: A Prototype via Terrain Authoring and Active Learning. (arXiv:2208.07681v1 [cs.RO])
    Terrain-aware locomotion has become an emerging topic in legged robotics. However, it is hard to generate challenging and realistic terrains in simulation, which limits the way researchers evaluate their locomotion policies. In this paper, we prototype the generation of a terrain dataset via terrain authoring and active learning, and the learned samplers can stably generate diverse high-quality terrains. We hope the generated dataset can serve as a terrain-robustness benchmark for legged locomotion. The dataset and the code implementation are released at https://bit.ly/3bn4j7f.
    Deep learning for enhanced free-space optical communications. (arXiv:2208.07712v1 [cs.LG])
    Atmospheric effects, such as turbulence and background thermal noise, inhibit the propagation of coherent light used in ON-OFF keying free-space optical communication. Here we present and experimentally validate a convolutional neural network to reduce the bit error rate of free-space optical communication in post-processing that is significantly simpler and cheaper than existing solutions based on advanced optics. Our approach consists of two neural networks, the first determining the presence of coherent bit sequences in thermal noise and turbulence and the second demodulating the coherent bit sequences. All data used for training and testing our network is obtained experimentally by generating ON-OFF keying bit streams of coherent light, combining these with thermal light, and passing the resultant light through a turbulent water tank which we have verified mimics turbulence in the air to a high degree of accuracy. Our convolutional neural network improves detection accuracy over threshold classification schemes and has the capability to be integrated with current demodulation and error correction schemes.
    Towards Domain-Independent and Real-Time Gesture Recognition Using mmWave Signal. (arXiv:2111.06195v2 [cs.CV] UPDATED)
    Human gesture recognition using millimeter-wave (mmWave) signals provides attractive applications including smart home and in-car interfaces. While existing works achieve promising performance under controlled settings, practical applications are still limited due to the need for intensive data collection, extra training efforts when adapting to new domains, and poor performance in real-time recognition. In this paper, we propose DI-Gesture, a domain-independent and real-time mmWave gesture recognition system. Specifically, we first derive signal variations corresponding to human gestures with spatial-temporal processing. To enhance the robustness of the system and reduce data collection efforts, we design a data augmentation framework for mmWave signals based on correlations between signal patterns and gesture variations. Furthermore, a spatial-temporal gesture segmentation algorithm is employed for real-time recognition. Extensive experimental results show that DI-Gesture achieves an average accuracy of 97.92\%, 99.18\%, and 98.76\% for new users, environments, and locations, respectively. We also evaluate DI-Gesture in challenging scenarios like real-time recognition and sensing at extreme angles, all of which demonstrate the superior robustness and effectiveness of our system.
    SVTS: Scalable Video-to-Speech Synthesis. (arXiv:2205.02058v2 [cs.SD] UPDATED)
    Video-to-speech synthesis (also known as lip-to-speech) refers to the translation of silent lip movements into the corresponding audio. This task has received an increasing amount of attention due to its self-supervised nature (i.e., can be trained without manual labelling) combined with the ever-growing collection of audio-visual data available online. Despite these strong motivations, contemporary video-to-speech works focus mainly on small- to medium-sized corpora with substantial constraints in both vocabulary and setting. In this work, we introduce a scalable video-to-speech framework consisting of two components: a video-to-spectrogram predictor and a pre-trained neural vocoder, which converts the mel-frequency spectrograms into waveform audio. We achieve state-of-the-art results for GRID and considerably outperform previous approaches on LRW. More importantly, by focusing on spectrogram prediction using a simple feedforward model, we can efficiently and effectively scale our method to very large and unconstrained datasets: To the best of our knowledge, we are the first to show intelligible results on the challenging LRS3 dataset.
    Supernet Training for Federated Image Classification under System Heterogeneity. (arXiv:2206.01366v4 [cs.LG] UPDATED)
    Efficient deployment of deep neural networks across many devices and resource constraints, especially on edge devices, is one of the most challenging problems in the presence of data-privacy preservation issues. Conventional approaches have evolved to either improve a single global model while keeping each local training data decentralized (i.e., data-heterogeneity) or to train a once-for-all network that supports diverse architectural settings to address heterogeneous systems equipped with different computational capabilities (i.e., model-heterogeneity). However, little research has considered both directions simultaneously. In this work, we propose a novel framework to consider both scenarios, namely Federation of Supernet Training (FedSup), where clients send and receive a supernet that contains all possible architectures sampled from itself. It is inspired by the observation that averaging parameters in the model aggregation stage of Federated Learning (FL) is similar to weight-sharing in supernet training. Specifically, in the FedSup framework, the weight-sharing approach widely used for training single-shot models is combined with the parameter averaging of Federated Learning (FedAvg). Under our framework, we present an efficient algorithm (E-FedSup) that sends sub-models to clients in the broadcast stage to reduce communication costs and training overhead. We demonstrate several strategies to enhance supernet training in the FL environment and conduct extensive empirical evaluations. The resulting framework is shown to pave the way for the robustness of both data- and model-heterogeneity on several standard benchmarks.
    Enhancing Dynamic Mode Decomposition Workflow with In-Situ Visualization and Data Compression. (arXiv:2208.07767v1 [cs.GR])
    Modern computational science and engineering applications are being improved by the advances in scientific machine learning. Data-driven methods such as Dynamic Mode Decomposition (DMD) can extract coherent structures from spatio-temporal data generated from dynamical systems and infer different scenarios for said systems. The spatio-temporal data comes as snapshots containing spatial information for each time instant. In modern engineering applications, the generation of high-dimensional snapshots can be time and/or resource-demanding. In the present study, we consider two strategies for enhancing DMD workflow in large numerical simulations: (i) snapshots compression to relieve disk pressure; (ii) the use of in situ visualization images to reconstruct the dynamics (or part of) in runtime. We evaluate our approaches with two 3D fluid dynamics simulations and consider DMD to reconstruct the solutions. Results reveal that snapshot compression considerably reduces the required disk space. We have observed that lossy compression reduces storage by almost $50\%$ with low relative errors in the signal reconstructions and other quantities of interest. We also extend our analysis to data generated on-the-fly, using in-situ visualization tools to generate image files of our state vectors during runtime. On large simulations, the generation of snapshots may be slow enough to use batch algorithms for inference. Streaming DMD takes advantage of the incremental SVD algorithm and updates the modes with the arrival of each new snapshot. We use streaming DMD to reconstruct the dynamics from in-situ generated images. We show that this process is efficient, and the reconstructed dynamics are accurate.
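    For reference, a compact sketch of batch exact DMD on synthetic snapshot data; the streaming variant discussed above would instead update the SVD incrementally as each new snapshot (or in-situ image) arrives.

        import numpy as np

        def dmd(X, r):
            """X: snapshots as columns. Returns rank-r DMD eigenvalues and modes."""
            X1, X2 = X[:, :-1], X[:, 1:]
            U, s, Vh = np.linalg.svd(X1, full_matrices=False)
            U, s, Vh = U[:, :r], s[:r], Vh[:r]                 # rank-r truncation
            A_tilde = U.conj().T @ X2 @ Vh.conj().T / s        # operator projected on POD modes
            eigvals, W = np.linalg.eig(A_tilde)
            modes = X2 @ Vh.conj().T / s @ W                   # exact DMD modes
            return eigvals, modes

        # Toy spatio-temporal data: two travelling waves sampled on a grid.
        x = np.linspace(0, 2 * np.pi, 64)[:, None]
        t = np.linspace(0, 10, 100)[None, :]
        X = np.sin(x - 0.5 * t) + 0.2 * np.cos(2 * x + t)
        eigvals, _ = dmd(X, r=4)
        print("eigenvalue moduli:", np.round(np.abs(eigvals), 3))   # ~1: pure oscillation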
    SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners. (arXiv:2205.14540v2 [cs.CV] UPDATED)
    Recently, self-supervised Masked Autoencoders (MAE) have attracted unprecedented attention for their impressive representation learning ability. However, the pretext task, Masked Image Modeling (MIM), reconstructs the missing local patches, lacking the global understanding of the image. This paper extends MAE to a fully-supervised setting by adding a supervised classification branch, thereby enabling MAE to effectively learn global features from golden labels. The proposed Supervised MAE (SupMAE) only exploits a visible subset of image patches for classification, unlike the standard supervised pre-training where all image patches are used. Through experiments, we demonstrate that not only is SupMAE more training efficient but also it learns more robust and transferable features. Specifically, SupMAE achieves comparable performance with MAE using only 30% of compute when evaluated on ImageNet with the ViT-B/16 model. SupMAE's robustness on ImageNet variants and transfer learning performance outperforms MAE and standard supervised pre-training counterparts. Code will be made publicly available.
    Uncertainty-guided Source-free Domain Adaptation. (arXiv:2208.07591v1 [cs.CV])
    Source-free domain adaptation (SFDA) aims to adapt a classifier to an unlabelled target data set by only using a pre-trained source model. However, the absence of the source data and the domain shift makes the predictions on the target data unreliable. We propose quantifying the uncertainty in the source model predictions and utilizing it to guide the target adaptation. For this, we construct a probabilistic source model by incorporating priors on the network parameters inducing a distribution over the model predictions. Uncertainties are estimated by employing a Laplace approximation and incorporated to identify target data points that do not lie in the source manifold and to down-weight them when maximizing the mutual information on the target data. Unlike recent works, our probabilistic treatment is computationally lightweight, decouples source training and target adaptation, and requires no specialized source training or changes of the model architecture. We show the advantages of uncertainty-guided SFDA over traditional SFDA in the closed-set and open-set settings and provide empirical evidence that our approach is more robust to strong domain shifts even without tuning.
    An Overview and Prospective Outlook on Robust Training and Certification of Machine Learning Models. (arXiv:2208.07464v1 [cs.LG])
    In this discussion paper, we survey recent research surrounding robustness of machine learning models. As learning algorithms become increasingly more popular in data-driven control systems, their robustness to data uncertainty must be ensured in order to maintain reliable safety-critical operations. We begin by reviewing common formalisms for such robustness, and then move on to discuss popular and state-of-the-art techniques for training robust machine learning models as well as methods for provably certifying such robustness. From this unification of robust machine learning, we identify and discuss pressing directions for future research in the area.
    Algorithmic Assistance with Recommendation-Dependent Preferences. (arXiv:2208.07626v1 [cs.LG])
    When we use algorithms to produce recommendations, we typically think of these recommendations as providing helpful information, such as when risk assessments are presented to judges or doctors. But when a decision-maker obtains a recommendation, they may not only react to the information. The decision-maker may view the recommendation as a default action, making it costly for them to deviate, for example when a judge is reluctant to overrule a high-risk assessment of a defendant or a doctor fears the consequences of deviating from recommended procedures. In this article, we consider the effect and design of recommendations when they affect choices not just by shifting beliefs, but also by altering preferences. We motivate our model from institutional factors, such as a desire to avoid audits, as well as from well-established models in behavioral science that predict loss aversion relative to a reference point, which here is set by the algorithm. We show that recommendation-dependent preferences create inefficiencies where the decision-maker is overly responsive to the recommendation, which changes the optimal design of the algorithm towards providing less conservative recommendations. As a potential remedy, we discuss an algorithm that strategically withholds recommendations, and show how it can improve the quality of final decisions.
    Does lossy image compression affect racial bias within face recognition?. (arXiv:2208.07613v1 [cs.CV])
    Yes - This study investigates the impact of commonplace lossy image compression on face recognition algorithms with regard to the racial characteristics of the subject. We adopt a recently proposed racial phenotype-based bias analysis methodology to measure the effect of varying levels of lossy compression across racial phenotype categories. Additionally, we determine the relationship between chroma-subsampling and race-related phenotypes for recognition performance. Prior work investigates the impact of lossy JPEG compression algorithm on contemporary face recognition performance. However, there is a gap in how this impact varies with different race-related inter-sectional groups and the cause of this impact. Via an extensive experimental setup, we demonstrate that common lossy image compression approaches have a more pronounced negative impact on facial recognition performance for specific racial phenotype categories such as darker skin tones (by up to 34.55\%). Furthermore, removing chroma-subsampling during compression improves the false matching rate (up to 15.95\%) across all phenotype categories affected by the compression, including darker skin tones, wide noses, big lips, and monolid eye categories. In addition, we outline the characteristics that may be attributable as the underlying cause of such phenomenon for lossy compression algorithms such as JPEG.
    Certified Robustness via Randomized Smoothing over Multiplicative Parameters of Input Transformations. (arXiv:2106.14432v3 [cs.LG] UPDATED)
    Currently the most popular method of providing robustness certificates is randomized smoothing, where an input is smoothed via some probability distribution. We propose a novel approach to randomized smoothing over multiplicative parameters. Using this method we construct certifiably robust classifiers with respect to a gamma correction perturbation and compare the result with classifiers obtained via other smoothing distributions (Gaussian, Laplace, uniform). The experiments show that the asymmetrical Rayleigh distribution allows one to obtain better certificates for some values of perturbation parameters. To the best of our knowledge, this is the first work concerning certified robustness against the multiplicative gamma correction transformation and the first to study the effects of asymmetrical distributions in randomized smoothing.
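    Prediction under this kind of smoothing reduces to a majority vote over randomly gamma-corrected copies of the input; a sketch with a hypothetical stand-in base classifier f (the certificate itself additionally needs a bound derived from the vote share):

        import numpy as np

        def smoothed_predict(f, image, n=1000, scale=1.0, seed=0):
            """Majority vote of f over gamma corrections x -> x**g, g ~ Rayleigh(scale)."""
            rng = np.random.default_rng(seed)
            counts = np.bincount([f(image ** g) for g in rng.rayleigh(scale, size=n)])
            top = int(counts.argmax())
            return top, counts[top] / n          # predicted class and its vote share

        f = lambda x: int(x.mean() > 0.5)        # stand-in classifier on [0, 1] images
        img = np.random.default_rng(1).random((8, 8))
        print(smoothed_predict(f, img))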
    On Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks in Besov Spaces. (arXiv:2103.06671v4 [stat.ML] UPDATED)
    Offline reinforcement learning (RL) leverages previously collected data for policy optimization without any further active exploration. Despite the recent interest in this problem, its theoretical results in neural network function approximation setting remain limited. In this paper, we study the statistical theory of offline RL with deep ReLU network function approximation. In particular, we establish the sample complexity of $\tilde{\mathcal{O}}\left( \kappa^{1 + d/\alpha} \cdot \epsilon^{-2 - 2d/\alpha} \right)$ for offline RL with deep ReLU networks, where $\kappa$ is a measure of distributional shift, $d$ is the dimension of the state-action space, $\alpha$ is a (possibly fractional) smoothness parameter of the underlying Markov decision process (MDP), and $\epsilon$ is a user-specified error. Notably, our sample complexity holds under two novel considerations, namely the Besov dynamic closure and the correlated structure that arises from value regression for offline RL. While the Besov dynamic closure generalizes the dynamic conditions for offline RL in the prior works, the correlated structure renders the prior works of offline RL with general/neural network function approximation improper or inefficient. To the best of our knowledge, this is the first theoretical characterization of the sample complexity of offline RL with deep neural network function approximation under the general Besov regularity condition that goes beyond the traditional Reproducing Hilbert kernel spaces and Neural Tangent Kernels.
    ProtoRes: Proto-Residual Network for Pose Authoring via Learned Inverse Kinematics. (arXiv:2106.01981v6 [cs.CV] UPDATED)
    Our work focuses on the development of a learnable neural representation of human pose for advanced AI assisted animation tooling. Specifically, we tackle the problem of constructing a full static human pose based on sparse and variable user inputs (e.g. locations and/or orientations of a subset of body joints). To solve this problem, we propose a novel neural architecture that combines residual connections with prototype encoding of a partially specified pose to create a new complete pose from the learned latent space. We show that our architecture outperforms a baseline based on Transformer, both in terms of accuracy and computational efficiency. Additionally, we develop a user interface to integrate our neural model in Unity, a real-time 3D development platform. Furthermore, we introduce two new datasets representing the static human pose modeling problem, based on high-quality human motion capture data, which will be released publicly along with model code.
    Semidefinite Programming versus Burer-Monteiro Factorization for Matrix Sensing. (arXiv:2208.07469v1 [math.OC])
    Many fundamental low-rank optimization problems, such as matrix completion, phase synchronization/retrieval, power system state estimation, and robust PCA, can be formulated as the matrix sensing problem. Two main approaches for solving matrix sensing are based on semidefinite programming (SDP) and Burer-Monteiro (B-M) factorization. The SDP method suffers from high computational and space complexities, whereas the B-M method may return a spurious solution due to the non-convexity of the problem. The existing theoretical guarantees for the success of these methods have led to similar conservative conditions, which may wrongly imply that these methods have comparable performances. In this paper, we shed light on some major differences between these two methods. First, we present a class of structured matrix completion problems for which the B-M methods fail with an overwhelming probability, while the SDP method works correctly. Second, we identify a class of highly sparse matrix completion problems for which the B-M method works and the SDP method fails. Third, we prove that although the B-M method exhibits the same performance independent of the rank of the unknown solution, the success of the SDP method is correlated to the rank of the solution and improves as the rank increases. Unlike the existing literature that has mainly focused on those instances of matrix sensing for which both SDP and B-M work, this paper offers the first result on the unique merit of each method over the alternative approach.
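    For concreteness, a toy sketch of the B-M route on a symmetric matrix-completion instance (initialization, step size, and iteration count are untuned guesses, and this benign random instance avoids the failure cases the paper constructs):

        import numpy as np

        rng = np.random.default_rng(0)
        n, r = 20, 2
        U_true = rng.normal(size=(n, r))
        M = U_true @ U_true.T                        # ground-truth low-rank PSD matrix
        mask = rng.random((n, n)) < 0.5
        mask = np.triu(mask) | np.triu(mask).T       # symmetric pattern of observed entries

        U = rng.normal(size=(n, r))                  # random init of the low-rank factor
        for _ in range(10000):
            R = mask * (U @ U.T - M)                 # residual on observed entries only
            U -= 0.002 * R @ U                       # R is symmetric, so grad f = 2 R U

        err = np.linalg.norm(mask * (U @ U.T - M)) / np.linalg.norm(M)
        print("relative residual on observed entries:", round(float(err), 5))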
    New drugs and stock market: how to predict pharma market reaction to clinical trial announcements. (arXiv:2208.07248v2 [q-fin.ST] UPDATED)
    Pharmaceutical companies operate in a strictly regulated and highly risky environment in which a single slip can lead to serious financial implications. Accordingly, the announcements of clinical trial results tend to determine the future course of events, hence being closely monitored by the public. In this work, we provide statistical evidence for the result promulgation influence on the public pharma market value. Whereas most works focus on retrospective impact analysis, the present research aims to predict the numerical values of announcement-induced changes in stock prices. For this purpose, we develop a pipeline that includes a BERT-based model for extracting sentiment polarity of announcements, a Temporal Fusion Transformer for forecasting the expected return, a graph convolution network for capturing event relationships, and gradient boosting for predicting the price change. The challenge of the problem lies in inherently different patterns of responses to positive and negative announcements, reflected in a stronger and more pronounced reaction to negative news. Moreover, phenomena such as the drop in stock prices after positive announcements confirm the counterintuitive nature of the price behavior. Importantly, we discover two crucial factors that should be considered while working within a predictive framework. The first factor is the drug portfolio size of the company, indicating greater susceptibility to an announcement in the case of small drug diversification. The second one is the network effect of the events related to the same company or nosology. All findings and insights are gained on the basis of one of the biggest FDA (the Food and Drug Administration) announcement datasets, consisting of 5436 clinical trial announcements from 681 companies over the last five years.
    Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition. (arXiv:2208.07657v1 [eess.AS])
    Optimization of modern ASR architectures is among the highest-priority tasks, since it saves substantial computational resources for model training and inference. The work proposes a new Uconv-Conformer architecture based on the standard Conformer model that consistently reduces the input sequence length by 16 times, which speeds up the intermediate layers. To solve the convergence problem with such a significant reduction of the time dimension, we use upsampling blocks similar to the U-Net architecture to ensure the correct CTC loss calculation and stabilize network training. The Uconv-Conformer architecture is not only faster in terms of training and inference but also shows a better WER compared to the baseline Conformer. Our best Uconv-Conformer model showed a 40.3% reduction in epoch training time, as well as 47.8% and 23.5% inference acceleration on the CPU and GPU, respectively. Relative WER on Librispeech test_clean and test_other decreased by 7.3% and 9.2%.
    Delaunay-Triangulation-Based Learning with Hessian Total-Variation Regularization. (arXiv:2208.07787v1 [eess.SP])
    Regression is one of the core problems tackled in supervised learning. Rectified linear unit (ReLU) neural networks generate continuous and piecewise-linear (CPWL) mappings and are the state-of-the-art approach for solving regression problems. In this paper, we propose an alternative method that leverages the expressivity of CPWL functions. In contrast to deep neural networks, our CPWL parameterization guarantees stability and is interpretable. Our approach relies on the partitioning of the domain of the CPWL function by a Delaunay triangulation. The function values at the vertices of the triangulation are our learnable parameters and identify the CPWL function uniquely. Formulating the learning scheme as a variational problem, we use the Hessian total variation (HTV) as regularizer to favor CPWL functions with few affine pieces. In this way, we control the complexity of our model through a single hyperparameter. By developing a computational framework to compute the HTV of any CPWL function parameterized by a triangulation, we discretize the learning problem as the generalized least absolute shrinkage and selection operator (LASSO). Our experiments validate the usage of our method in low-dimensional scenarios.
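    SciPy already provides the two building blocks of this parameterization: a Delaunay triangulation of the domain and a piecewise-linear interpolant over it. In the sketch below, random vertex values stand in for the learnable parameters; the HTV-regularized LASSO fitting step is omitted.

```python
import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator

rng = np.random.default_rng(0)
verts = rng.random((50, 2))   # triangulation vertices (fixed)
vals = rng.random(50)         # values at the vertices = the learnable parameters

tri = Delaunay(verts)                    # partition of the domain into simplices
cpwl = LinearNDInterpolator(tri, vals)   # CPWL function, uniquely defined by vals
print(cpwl([[0.5, 0.5]]))                # evaluate the CPWL function at a point
```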
    Deep convolutional surrogates and degrees of freedom in thermal design. (arXiv:2208.07482v1 [cs.LG])
    We present surrogate models for heat transfer and pressure drop prediction of complex fin geometries generated using composite Bezier curves. The thermal design process involves iterative high-fidelity simulation, which is complex, computationally expensive, and time-consuming. With the advancement of machine learning algorithms as well as Graphics Processing Units (GPUs), we can utilize the parallel processing architecture of GPUs, rather than relying solely on CPUs, to accelerate thermo-fluid simulation. In this study, Convolutional Neural Networks (CNNs) are used to predict results of Computational Fluid Dynamics (CFD) directly from topologies saved as images. Both the case of a single fin and that of multiple morphable fins are studied. A comparison of the Xception network and a regular CNN is presented for the single-fin design. Results show high prediction accuracy for the single-fin design, particularly with the Xception network. Increasing the design freedom to multiple fins increases the prediction error. This error, however, remains within three percent for pressure drop and heat transfer estimation, which is valuable for design purposes.
    Model Optimization in Imbalanced Regression. (arXiv:2206.09991v2 [cs.LG] UPDATED)
    Imbalanced domain learning aims to produce accurate models in predicting instances that, though underrepresented, are of utmost importance for the domain. Research in this field has mainly focused on classification tasks. Comparatively, the number of studies carried out in the context of regression tasks is negligible. One of the main reasons for this is the lack of loss functions capable of focusing on minimizing the errors of extreme (rare) values. Recently, an evaluation metric was introduced: Squared Error Relevance Area (SERA). This metric places greater emphasis on the errors committed at extreme values while also accounting for the performance over the overall target variable domain, thus preventing severe bias. However, its effectiveness as an optimization metric is unknown. In this paper, our goal is to study the impact of using SERA as an optimization criterion in imbalanced regression tasks. Using gradient boosting algorithms as proof of concept, we perform an experimental study with 36 data sets of different domains and sizes. Results show that models that use SERA as an objective function perform better at predicting extreme values than the models produced by their respective standard boosting algorithms. This confirms that SERA can be embedded as a loss function into optimization-based learning algorithms for imbalanced regression scenarios.
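    Since the relevance function phi(y) takes values in [0, 1], integrating the indicator 1[phi(y_i) >= t] over t gives phi(y_i), so SERA reduces to a relevance-weighted squared error. That makes it straightforward to plug into a gradient boosting library as a custom objective, as in the hedged XGBoost sketch below (phi is a user-supplied relevance function, and the reduction glosses over any details specific to the paper's setup).

```python
import xgboost as xgb

def make_sera_objective(phi):
    """SERA = sum_i phi(y_i) * (yhat_i - y_i)^2, so its gradient and
    hessian are those of a squared error weighted by relevance phi."""
    def sera_obj(preds, dtrain):
        y = dtrain.get_label()
        w = phi(y)                              # relevance of each target, in [0, 1]
        return 2.0 * w * (preds - y), 2.0 * w   # gradient, hessian
    return sera_obj

# Usage sketch: booster = xgb.train(params, dtrain, obj=make_sera_objective(phi))
```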
    A Library for Representing Python Programs as Graphs for Machine Learning. (arXiv:2208.07461v1 [cs.LG])
    Graph representations of programs are commonly a central element of machine learning for code research. We introduce an open source Python library python_graphs that applies static analysis to construct graph representations of Python programs suitable for training machine learning models. Our library admits the construction of control-flow graphs, data-flow graphs, and composite ``program graphs'' that combine control-flow, data-flow, syntactic, and lexical information about a program. We present the capabilities and limitations of the library, perform a case study applying the library to millions of competitive programming submissions, and showcase the library's utility for machine learning research.
    Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere. (arXiv:2005.10242v10 [cs.LG] UPDATED)
    Contrastive representation learning has been outstandingly successful in practice. In this work, we identify two key properties related to the contrastive loss: (1) alignment (closeness) of features from positive pairs, and (2) uniformity of the induced distribution of the (normalized) features on the hypersphere. We prove that, asymptotically, the contrastive loss optimizes these properties, and analyze their positive effects on downstream tasks. Empirically, we introduce an optimizable metric to quantify each property. Extensive experiments on standard vision and language datasets confirm the strong agreement between both metrics and downstream task performance. Remarkably, directly optimizing for these two metrics leads to representations with comparable or better performance at downstream tasks than contrastive learning. Project Page: https://tongzhouwang.info/hypersphere Code: https://github.com/SsnL/align_uniform , https://github.com/SsnL/moco_align_uniform
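    The two metrics have simple closed forms on L2-normalized features, closely following the reference implementation released with the paper (the defaults below, alpha=2 and t=2, are the common choices).

```python
import torch

def align_loss(x, y, alpha=2):
    """x, y: L2-normalized features of positive pairs, shape (n, d)."""
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniform_loss(x, t=2):
    """Log of the mean pairwise Gaussian potential on the hypersphere."""
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()
```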
    Unsupervised Domain Adaptation for Segmentation with Black-box Source Model. (arXiv:2208.07769v1 [cs.CV])
    Unsupervised domain adaptation (UDA) has been widely used to transfer knowledge from a labeled source domain to an unlabeled target domain, countering the difficulty of labeling in a new domain. The training of conventional solutions usually relies on the existence of both source and target domain data. However, the privacy of large-scale, well-labeled source-domain data and of trained model parameters can become a major concern in cross-center/domain collaborations. To address this, we propose a practical solution to UDA for segmentation that requires only a black-box segmentation model trained in the source domain, rather than the original source data or a white-box source model. Specifically, we resort to a knowledge distillation scheme with exponential mixup decay (EMD) to gradually learn target-specific representations. In addition, unsupervised entropy minimization is applied to regularize the target-domain confidence. We evaluated our framework on the BraTS 2018 database, achieving performance on par with white-box source-model adaptation approaches.
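    A minimal sketch of the distillation step: the target mixes the black-box teacher's probabilities with the student's own, with the teacher's weight decaying exponentially over training, plus an entropy penalty on the student. The decay schedule, loss weights, and function names here are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def uda_loss(student_logits, bb_probs, step, lam0=1.0, tau=1000.0, w_ent=0.1):
    probs = F.softmax(student_logits, dim=1)
    lam = lam0 * torch.exp(torch.tensor(-step / tau))     # exponential mixup decay
    target = lam * bb_probs + (1 - lam) * probs.detach()  # teacher -> student shift
    kd = F.kl_div(F.log_softmax(student_logits, dim=1), target,
                  reduction="batchmean")
    ent = -(probs * probs.clamp_min(1e-8).log()).sum(1).mean()
    return kd + w_ent * ent  # distillation + entropy minimization
```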
    Do As I Can, Not As I Say: Grounding Language in Robotic Affordances. (arXiv:2204.01691v2 [cs.RO] UPDATED)
    Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide real-world grounding by means of pretrained skills, which are used to constrain the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model's "hands and eyes," while the language model supplies high-level semantic knowledge about the task. We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show the need for real-world grounding and that this approach is capable of completing long-horizon, abstract, natural language instructions on a mobile manipulator. The project's website and the video can be found at https://say-can.github.io/.
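    The decision rule can be caricatured as: usefulness (from the language model) times feasibility (from the value function). In the sketch below, llm_logprob and value_fn are hypothetical callables standing in for the paper's scoring components.

```python
import math

def saycan_pick(skills, llm_logprob, value_fn, state, instruction):
    """Choose the skill maximizing P_LLM(skill | instruction) * value(state, skill)."""
    scores = {
        s: math.exp(llm_logprob(instruction, s)) * value_fn(state, s)
        for s in skills
    }
    return max(scores, key=scores.get)
```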
    Langevin Diffusion Variational Inference. (arXiv:2208.07743v1 [cs.LG])
    Many methods exist that build powerful variational distributions based on unadjusted Langevin transitions. Most of these were developed using a wide range of different approaches and techniques. Unfortunately, the lack of a unified analysis and derivation makes developing new methods and reasoning about existing ones a challenging task. We address this by giving a single analysis that unifies and generalizes these existing techniques. The main idea is to augment the target and variational distributions by numerically simulating the underdamped Langevin diffusion process and its time reversal. The benefits of this approach are twofold: it provides a unified formulation for many existing methods, and it simplifies the development of new ones. In fact, using our formulation we propose a new method that combines the strengths of previously existing algorithms; it uses underdamped Langevin transitions and powerful augmentations parameterized by a score network. Our empirical evaluation shows that our proposed method consistently outperforms relevant baselines in a wide range of tasks.
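    For concreteness, here is one Euler-Maruyama step of the underdamped Langevin diffusion that such transitions simulate; `score` stands for the gradient of the log target density. The discretization and step sizes are generic choices, not the paper's exact integrator.

```python
import torch

def underdamped_langevin_step(x, v, score, h=0.05, gamma=1.0):
    """x: position, v: velocity; one Euler-Maruyama step of the SDE
    dv = (score(x) - gamma * v) dt + sqrt(2 * gamma) dW,  dx = v dt."""
    v = v + h * (score(x) - gamma * v) + (2 * gamma * h) ** 0.5 * torch.randn_like(v)
    x = x + h * v
    return x, v
```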
    Towards Informed Design and Validation Assistance in Computer Games Using Imitation Learning. (arXiv:2208.07811v1 [cs.SE])
    In games, as in many other domains, design validation and testing is a huge challenge as systems grow in size and manual testing becomes infeasible. This paper proposes a new approach to automated game validation and testing. Our method leverages a data-driven imitation learning technique, which requires little effort and time and no knowledge of machine learning or programming, and which designers can use to efficiently train game-testing agents. We investigate the validity of our approach through a user study with industry experts. The survey results show that our method is indeed a valid approach to game validation and that data-driven programming would be a useful aid to reducing effort and increasing the quality of modern playtesting. The survey also highlights several open challenges. With the help of the most recent literature, we analyze the identified challenges and propose future research directions suitable for supporting and maximizing the utility of our approach.
    Fair Machine Learning in Healthcare: A Review. (arXiv:2206.14397v2 [cs.LG] UPDATED)
    Benefiting from the digitization of healthcare data and the development of computing power, machine learning methods are increasingly used in the healthcare domain. Fairness problems have been identified in machine learning for healthcare, resulting in an unfair allocation of limited healthcare resources or excessive health risks for certain groups. Therefore, addressing the fairness problems has recently attracted increasing attention from the healthcare community. However, the intersection of machine learning for healthcare and fairness in machine learning remains understudied. In this review, we build the bridge by exposing fairness problems, summarizing possible biases, sorting out mitigation methods and pointing out challenges along with opportunities for the future.
    FRAug: Tackling Federated Learning with Non-IID Features via Representation Augmentation. (arXiv:2205.14900v2 [cs.LG] UPDATED)
    Federated Learning (FL) is a decentralized learning paradigm in which multiple clients collaboratively train deep learning models without centralizing their local data, hence preserving data privacy. Real-world applications usually involve a distribution shift across the datasets of the different clients, which hurts the generalization ability of the clients to unseen samples from their respective data distributions. In this work, we address the recently proposed feature shift problem, where the clients have different feature distributions while the label distribution is the same. We propose Federated Representation Augmentation (FRAug) to tackle this practical and challenging problem. Our approach generates synthetic client-specific samples in the embedding space to augment the usually small client datasets. For that, we train a shared generative model to fuse the clients' knowledge learned from their different feature distributions. This generator synthesizes client-agnostic embeddings, which are then locally transformed into client-specific embeddings by Representation Transformation Networks (RTNets). By transferring knowledge across the clients, the generated embeddings act as a regularizer for the client models and reduce overfitting to the local original datasets, hence improving generalization. Our empirical evaluation on public benchmarks and a real-world medical dataset demonstrates the effectiveness of the proposed method, which substantially outperforms the current state-of-the-art FL methods for non-IID features, including PartialFed and FedBN.
    tile2tile: Learning Game Filters for Platformer Style Transfer. (arXiv:2208.07699v1 [cs.LG])
    We present tile2tile, an approach for style transfer between levels of tile-based platformer games. Our method involves training models that translate levels from a lower-resolution sketch representation based on tile affordances to the original tile representation for a given game. This enables these models, which we refer to as filters, to translate level sketches into the style of a specific game. Moreover, by converting a level of one game into sketch form and then translating the resulting sketch into the tiles of another game, we obtain a method of style transfer between two games. We use Markov random fields and autoencoders for learning the game filters and apply them to demonstrate style transfer between levels of Super Mario Bros, Kid Icarus, Mega Man and Metroid.
    Deep Unsupervised Domain Adaptation: A Review of Recent Advances and Perspectives. (arXiv:2208.07422v1 [cs.CV])
    Deep learning has become the method of choice to tackle real-world problems in different domains, partly because of its ability to learn from data and achieve impressive performance on a wide range of applications. However, its success usually relies on two assumptions: (i) vast troves of labeled data are required for accurate model fitting, and (ii) training and testing data are independent and identically distributed. Its performance on unseen target domains, thus, is not guaranteed, especially when encountering out-of-distribution data at the adaptation stage. The performance drop on data in a target domain is a critical problem in deploying deep neural networks that are successfully trained on data in a source domain. Unsupervised domain adaptation (UDA) is proposed to counter this by leveraging both labeled source domain data and unlabeled target domain data to carry out various tasks in the target domain. UDA has yielded promising results on natural image processing, video analysis, natural language processing, time-series data analysis, medical image analysis, etc. In this review of a rapidly evolving topic, we provide a systematic comparison of UDA methods and applications. In addition, the connection of UDA with its closely related tasks, e.g., domain generalization and out-of-distribution detection, is also discussed. Furthermore, deficiencies in current methods and possible promising directions are highlighted.
    Online Learning for Non-monotone Submodular Maximization: From Full Information to Bandit Feedback. (arXiv:2208.07632v1 [cs.LG])
    In this paper, we revisit the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set, which finds wide real-world applications in the domains of machine learning, economics, and operations research. First, we present the Meta-MFW algorithm achieving a $1/e$-regret of $O(\sqrt{T})$ at the cost of $T^{3/2}$ stochastic gradient evaluations per round. As far as we know, Meta-MFW is the first algorithm to obtain a $1/e$-regret of $O(\sqrt{T})$ for the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set. Furthermore, in sharp contrast with the ODC algorithm \citep{thang2021online}, Meta-MFW relies on a simple online linear oracle without discretization, lifting, or rounding operations. Considering practical restrictions, we then propose the Mono-MFW algorithm, which reduces the per-function stochastic gradient evaluations from $T^{3/2}$ to 1 and achieves a $1/e$-regret bound of $O(T^{4/5})$. Next, we extend Mono-MFW to the bandit setting and propose the Bandit-MFW algorithm, which attains a $1/e$-regret bound of $O(T^{8/9})$. To the best of our knowledge, Mono-MFW and Bandit-MFW are the first sublinear-regret algorithms to explore the one-shot and bandit settings, respectively, for the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set. Finally, we conduct numerical experiments on both synthetic and real-world datasets to verify the effectiveness of our methods.
    Entity Anchored ICD Coding. (arXiv:2208.07444v1 [cs.LG])
    Medical coding is a complex task, requiring assignment of a subset of over 72,000 ICD codes to a patient's notes. Modern natural language processing approaches to these tasks have been challenged by the length of the input and size of the output space. We limit our model inputs to a small window around medical entities found in our documents. From those local contexts, we build contextualized representations of both ICD codes and entities, and aggregate over these representations to form document-level predictions. In contrast to existing methods which use a representation fixed either in size or by codes seen in training, we represent ICD codes by encoding the code description with local context. We discuss metrics appropriate to deploying coding systems in practice. We show that our approach is superior to existing methods in both standard and deployable measures, including performance on rare and unseen codes.
    Investigating and Explaining the Frequency Bias in Image Classification. (arXiv:2205.03154v2 [cs.CV] UPDATED)
    CNNs exhibit many behaviors different from humans, one of which is the capability of employing high-frequency components. This paper discusses the frequency bias phenomenon in image classification tasks: the high-frequency components are actually much less exploited than the low- and mid-frequency components. We first investigate the frequency bias phenomenon by presenting two observations on feature discrimination and learning priority. Furthermore, we hypothesize that (i) the spectral density and (ii) the class consistency of a dataset directly affect the frequency bias. Specifically, our investigations verify that the spectral density of datasets mainly affects the learning priority, while the class consistency mainly affects the feature discrimination.
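    The frequency decomposition underlying such analyses is easy to reproduce: mask the centered 2-D Fourier spectrum with a circular low-pass filter and take the remainder as the high-frequency component. The cut-off radius below is an arbitrary illustrative value.

```python
import numpy as np

def split_frequencies(img, radius=16):
    """Split a grayscale image into low- and high-frequency parts."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    low_mask = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(F * low_mask)).real
    return low, img - low  # low-frequency and high-frequency components
```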
    How can spherical CNNs benefit ML-based diffusion MRI parameter estimation?. (arXiv:2207.00572v2 [eess.IV] UPDATED)
    This paper demonstrates that spherical convolutional neural networks (S-CNNs) offer distinct advantages over conventional fully-connected networks (FCNs) at estimating scalar parameters of tissue microstructure from diffusion MRI (dMRI). Such microstructure parameters are valuable for identifying pathology and quantifying its extent. However, current clinical practice commonly acquires dMRI data consisting of only 6 diffusion-weighted images (DWIs), limiting the accuracy and precision of estimated microstructure indices. Machine learning (ML) has been proposed to address this challenge. However, existing ML-based methods are not robust to differing dMRI gradient sampling schemes, nor are they rotation equivariant. Lack of robustness to sampling schemes requires a new network to be trained for each scheme, complicating the analysis of data from multiple sources. A possible consequence of the lack of rotational equivariance is that the training dataset must contain a diverse range of microstructure orientations. Here, we show that spherical CNNs represent a compelling alternative that is robust to new sampling schemes and offers rotational equivariance. We show the latter can be leveraged to decrease the number of training datapoints required.
    Learnable Filters for Geometric Scattering Modules. (arXiv:2208.07458v1 [cs.LG])
    We propose a new graph neural network (GNN) module, based on relaxations of recently proposed geometric scattering transforms, which consist of a cascade of graph wavelet filters. Our learnable geometric scattering (LEGS) module enables adaptive tuning of the wavelets to encourage band-pass features to emerge in learned representations. The incorporation of our LEGS module in GNNs enables the learning of longer-range graph relations compared to many popular GNNs, which often rely on encoding graph structure via smoothness or similarity between neighbors. Further, its wavelet priors result in simplified architectures with significantly fewer learned parameters compared to competing GNNs. We demonstrate the predictive performance of LEGS-based networks on graph classification benchmarks, as well as the descriptive quality of their learned features in biochemical graph data exploration tasks. Our results show that LEGS-based networks match or outperform popular GNNs, as well as the original geometric scattering construction, on many datasets, particularly in biochemical domains, while retaining certain mathematical properties of handcrafted (non-learned) geometric scattering.
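    The fixed (non-learned) geometric scattering wavelets that LEGS relaxes are built from powers of a lazy random-walk operator, Psi_j = P^(2^(j-1)) - P^(2^j). A dense NumPy sketch, assuming a connected graph with no isolated nodes:

```python
import numpy as np

def diffusion_wavelets(A, num_scales=4):
    """Wavelet filters from the lazy random walk P = (I + A D^{-1}) / 2."""
    d = A.sum(axis=0)
    P = 0.5 * (np.eye(len(A)) + A / d)  # column-normalized lazy walk
    filters, prev = [], P               # prev starts at P^(2^0)
    for j in range(1, num_scales + 1):
        cur = np.linalg.matrix_power(P, 2 ** j)
        filters.append(prev - cur)      # Psi_j = P^(2^(j-1)) - P^(2^j)
        prev = cur
    return filters
```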
    Making Reinforcement Learning Work on Swimmer. (arXiv:2208.07587v1 [cs.LG])
    The SWIMMER environment is a standard benchmark in reinforcement learning (RL). In particular, it is often used in papers comparing or combining RL methods with direct policy search methods such as genetic algorithms or evolution strategies. A lot of these papers report poor performance on SWIMMER from RL methods and much better performance from direct policy search methods. In this technical report we show that the low performance of RL methods on SWIMMER simply comes from the inadequate tuning of an important hyper-parameter and that, by setting this hyper-parameter to a correct value, the issue can be very easily fixed.
    Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning. (arXiv:2206.12542v2 [cs.LG] UPDATED)
    Deep reinforcement learning (RL) algorithms suffer severe performance degradation when the interaction data is scarce, which limits their real-world application. Recently, visual representation learning has been shown to be effective and promising for boosting sample efficiency in RL. These methods usually rely on contrastive learning and data augmentation to train a transition model for state prediction, which is different from how the model is used in RL--performing value-based planning. Accordingly, the learned representation by these visual methods may be good for recognition but not optimal for estimating state value and solving the decision problem. To address this issue, we propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making. More specifically, VCR trains a model to predict the future state (also referred to as the ''imagined state'') based on the current one and a sequence of actions. Instead of aligning this imagined state with a real state returned by the environment, VCR applies a $Q$-value head on both states and obtains two distributions of action values. Then a distance is computed and minimized to force the imagined state to produce a similar action value prediction as that by the real state. We develop two implementations of the above idea for the discrete and continuous action spaces respectively. We conduct experiments on Atari 100K and DeepMind Control Suite benchmarks to validate their effectiveness for improving sample efficiency. It has been demonstrated that our methods achieve new state-of-the-art performance for search-free RL algorithms.
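    The essence of VCR is that the imagined and real states are compared through a Q-value head rather than directly. A hedged sketch for the discrete-action case, using a KL divergence as the distance (the paper's exact distance and gradient-stopping choices may differ):

```python
import torch.nn.functional as F

def vcr_loss(q_head, imagined_state, real_state):
    """Match the action-value distribution of the imagined state to that
    of the real state returned by the environment."""
    log_q_im = F.log_softmax(q_head(imagined_state), dim=1)
    q_real = F.softmax(q_head(real_state), dim=1).detach()
    return F.kl_div(log_q_im, q_real, reduction="batchmean")
```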
    Coil2Coil: Self-supervised MR image denoising using phased-array coil images. (arXiv:2208.07552v1 [eess.IV])
    Denoising of magnetic resonance images is beneficial in improving the quality of low signal-to-noise ratio images. Recently, denoising using deep neural networks has demonstrated promising results. Most of these networks, however, utilize supervised learning, which requires large training images of noise-corrupted and clean image pairs. Obtaining training images, particularly clean images, is expensive and time-consuming. Hence, methods such as Noise2Noise (N2N) that require only pairs of noise-corrupted images have been developed to reduce the burden of obtaining training datasets. In this study, we propose a new self-supervised denoising method, Coil2Coil (C2C), that does not require the acquisition of clean images or paired noise-corrupted images for training. Instead, the method utilizes multichannel data from phased-array coils to generate training images. First, it divides and combines multichannel coil images into two images, one for input and the other for label. Then, they are processed to impose noise independence and sensitivity normalization such that they can be used for the training images of N2N. For inference, the method inputs a coil-combined image (e.g., DICOM image), enabling a wide application of the method. When evaluated using synthetic noise-added images, C2C shows the best performance against several self-supervised methods, reporting comparable outcomes to supervised methods. When testing the DICOM images, C2C successfully denoised real noise without showing structure-dependent residuals in the error maps. Because of the significant advantage of not requiring additional scans for clean or paired images, the method can be easily utilized for various clinical applications.
    Self-paced learning to improve text row detection in historical documents with missing labels. (arXiv:2201.12216v3 [cs.CV] UPDATED)
    An important preliminary step of optical character recognition systems is the detection of text rows. To address this task in the context of historical data with missing labels, we propose a self-paced learning algorithm capable of improving the row detection performance. We conjecture that pages with more ground-truth bounding boxes are less likely to have missing annotations. Based on this hypothesis, we sort the training examples in descending order with respect to the number of ground-truth bounding boxes, and organize them into k batches. Using our self-paced learning method, we train a row detector over k iterations, progressively adding batches with fewer ground-truth annotations. At each iteration, we combine the ground-truth bounding boxes with pseudo-bounding boxes (bounding boxes predicted by the model itself) using non-maximum suppression, and we include the resulting annotations at the next training iteration. We demonstrate that our self-paced learning strategy brings significant performance gains on two data sets of historical documents, improving the average precision of YOLOv4 by more than 12% on one data set and 39% on the other.
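    The curriculum construction itself is a few lines of Python: order pages by annotation density and cut them into k batches, densest first. This sketch assumes each page is a dict with a "boxes" list; the per-iteration NMS merging of pseudo-boxes is left as a comment.

```python
def self_paced_batches(pages, k=4):
    """Sort pages by ground-truth box count (a proxy for label
    completeness) and split them into k curriculum batches."""
    ordered = sorted(pages, key=lambda p: len(p["boxes"]), reverse=True)
    size = -(-len(ordered) // k)  # ceiling division
    return [ordered[i * size:(i + 1) * size] for i in range(k)]

# Iteration t trains on batches[0..t]; before each iteration, ground-truth
# boxes are merged with the detector's pseudo-boxes via non-maximum suppression.
```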
    Reward Design For An Online Reinforcement Learning Algorithm Supporting Oral Self-Care. (arXiv:2208.07406v1 [cs.AI])
    Dental disease is one of the most common chronic diseases despite being largely preventable. However, professional advice on optimal oral hygiene practices is often forgotten or abandoned by patients. Therefore patients may benefit from timely and personalized encouragement to engage in oral self-care behaviors. In this paper, we develop an online reinforcement learning (RL) algorithm for use in optimizing the delivery of mobile-based prompts to encourage oral hygiene behaviors. One of the main challenges in developing such an algorithm is ensuring that the algorithm considers the impact of the current action on the effectiveness of future actions (i.e., delayed effects), especially when the algorithm has been made simple in order to run stably and autonomously in a constrained, real-world setting (i.e., highly noisy, sparse data). We address this challenge by designing a quality reward which maximizes the desired health outcome (i.e., high-quality brushing) while minimizing user burden. We also highlight a procedure for optimizing the hyperparameters of the reward by building a simulation environment test bed and evaluating candidates using the test bed. The RL algorithm discussed in this paper will be deployed in Oralytics, an oral self-care app that provides behavioral strategies to boost patient engagement in oral hygiene practices.
    A Review of the Convergence of 5G/6G Architecture and Deep Learning. (arXiv:2208.07643v1 [cs.LG])
    The convergence of 5G architecture and deep learning has attracted a lot of research interest in both the fields of wireless communication and artificial intelligence. This is because deep learning technologies have been identified as a potential driver of the 5G technologies that make up the 5G architecture. Hence, there have been extensive surveys on the convergence of 5G architecture and deep learning. However, most of the existing survey papers mainly focus on how deep learning can converge with a specific 5G technology, thus not covering the full spectrum of the 5G architecture. Although there is a recent survey paper that appears to be robust, a review of that paper shows that it is not well structured to specifically cover the convergence of deep learning and the 5G technologies. Hence, this paper provides a robust overview of the convergence of the key 5G technologies and deep learning. The challenges faced by such convergence are discussed. In addition, a brief overview of the future 6G architecture, and how it can converge with deep learning, is also given.
    A unifying partially-interpretable framework for neural network-based extreme quantile regression. (arXiv:2208.07581v1 [stat.ML])
    Risk management in many environmental settings requires an understanding of the mechanisms that drive extreme events. Useful metrics for quantifying such risk are extreme quantiles of response variables conditioned on predictor variables that describe e.g., climate, biosphere and environmental states. Typically these quantiles lie outside the range of observable data and so, for estimation, require specification of parametric extreme value models within a regression framework. Classical approaches in this context utilise linear or additive relationships between predictor and response variables and suffer in either their predictive capabilities or computational efficiency; moreover, their simplicity is unlikely to capture the truly complex structures that lead to the creation of extreme wildfires. In this paper, we propose a new methodological framework for performing extreme quantile regression using artificial neural networks, which are able to capture complex non-linear relationships and scale well to high-dimensional data. The "black box" nature of neural networks means that they lack the desirable trait of interpretability often favoured by practitioners; thus, we combine aspects of linear, and additive, models with deep learning to create partially interpretable neural networks that can be used for statistical inference but retain high prediction accuracy. To complement this methodology, we further propose a novel point process model for extreme values which overcomes the finite lower-endpoint problem associated with the generalised extreme value class of distributions. Efficacy of our unified framework is illustrated on U.S. wildfire data with a high-dimensional predictor set and we illustrate vast improvements in predictive performance over linear and spline-based regression techniques.
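    Architecturally, "partially interpretable" amounts to summing an interpretable (linear or additive) head over selected predictors with a deep correction over the rest. The sketch below shows only that decomposition; the extreme-value likelihood and parameter outputs of the paper are omitted, and all layer sizes are assumptions.

```python
import torch.nn as nn

class PartiallyInterpretableNet(nn.Module):
    """Interpretable linear effect on x_lin plus a black-box effect on x_deep."""
    def __init__(self, d_lin, d_deep, hidden=64):
        super().__init__()
        self.lin = nn.Linear(d_lin, 1)
        self.deep = nn.Sequential(
            nn.Linear(d_deep, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x_lin, x_deep):
        return self.lin(x_lin) + self.deep(x_deep)  # additive combination
```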
    Scalable Quantum Neural Networks for Classification. (arXiv:2208.07719v1 [quant-ph])
    Many recent machine learning tasks resort to quantum computing to improve classification accuracy and training efficiency by taking advantage of quantum mechanics, a field known as quantum machine learning (QML). The variational quantum circuit (VQC) is frequently utilized to build a quantum neural network (QNN), the counterpart of a conventional neural network. Due to hardware limitations, however, current quantum devices only allow one to use a few qubits to represent data and perform simple quantum computations. The limited quantum resources on a single quantum device degrade data usage and limit the scale of the quantum circuits, preventing quantum advantage to some extent. To alleviate this constraint, we propose an approach to implementing a scalable quantum neural network (SQNN) by cooperatively utilizing the quantum resources of multiple small-size quantum devices. In an SQNN system, several quantum devices are used as quantum feature extractors, extracting local features from an input instance in parallel, while another quantum device works as a quantum predictor, performing prediction over the local features collected through classical communication channels. The quantum feature extractors in the SQNN system are independent of each other, so one can flexibly use quantum devices of varying sizes, with larger quantum devices extracting more local features. In particular, the SQNN can be run on a single quantum device in a modular fashion. Our work is exploratory and carried out on a quantum system simulator using the TensorFlow Quantum library. The evaluation performs binary classification on the MNIST dataset. It shows that the SQNN model achieves comparable classification accuracy to a regular QNN model of the same scale. Furthermore, it demonstrates that an SQNN model with more quantum resources can significantly improve classification accuracy.
    Private Query Release via the Johnson-Lindenstrauss Transform. (arXiv:2208.07410v1 [cs.DS])
    We introduce a new method for releasing answers to statistical queries with differential privacy, based on the Johnson-Lindenstrauss lemma. The key idea is to randomly project the query answers to a lower dimensional space so that the distance between any two vectors of feasible query answers is preserved up to an additive error. Then we answer the projected queries using a simple noise-adding mechanism, and lift the answers up to the original dimension. Using this method, we give, for the first time, purely differentially private mechanisms with optimal worst case sample complexity under average error for answering a workload of $k$ queries over a universe of size $N$. As other applications, we give the first purely private efficient mechanisms with optimal sample complexity for computing the covariance of a bounded high-dimensional distribution, and for answering 2-way marginal queries. We also show that, up to the dependence on the error, a variant of our mechanism is nearly optimal for every given query workload.
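    The mechanism is short enough to sketch end-to-end: project the vector of true query answers with a random JL matrix, add Gaussian noise, and lift back with the transpose. Calibrating sigma to a formal (epsilon, delta) guarantee is omitted here; this is a toy illustration, not the paper's tuned mechanism.

```python
import numpy as np

def jl_private_answers(a, m, sigma, seed=0):
    """a: true answers to k queries; m: projection dimension."""
    rng = np.random.default_rng(seed)
    k = len(a)
    P = rng.standard_normal((m, k)) / np.sqrt(m)    # JL projection, E[P.T @ P] = I_k
    noisy = P @ a + sigma * rng.standard_normal(m)  # simple noise-adding mechanism
    return P.T @ noisy                              # lift to the original dimension
```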
    Enhancement to Training of Bidirectional GAN : An Approach to Demystify Tax Fraud. (arXiv:2208.07675v1 [cs.LG])
    Outlier detection is a challenging activity, and several machine learning techniques have been proposed in the literature for it. In this article, we propose a new training approach for the bidirectional GAN (BiGAN) to detect outliers. To validate the proposed approach, we train a BiGAN with it to detect taxpayers who are manipulating their tax returns. For each taxpayer, we derive six correlation parameters and three ratio parameters from the tax returns submitted by him/her. We train a BiGAN with the proposed training approach on this nine-dimensional derived ground-truth data set. Next, we encode this data set into a latent representation using the $encoder$ and regenerate the data set using the $generator$ by giving this latent representation as the input. For each taxpayer, we then compute the cosine similarity between his/her ground-truth data and regenerated data. Taxpayers with lower cosine similarity measures are potential return manipulators. We applied our method to analyze the iron and steel taxpayers data set provided by the Commercial Taxes Department, Government of Telangana, India.
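    The scoring step reduces to a per-row cosine similarity between the nine-dimensional ground-truth records and their BiGAN regenerations; the sketch below is a generic implementation of that step only.

```python
import numpy as np

def cosine_scores(x_true, x_regen):
    """Row-wise cosine similarity; low scores flag potential manipulators."""
    num = (x_true * x_regen).sum(axis=1)
    den = np.linalg.norm(x_true, axis=1) * np.linalg.norm(x_regen, axis=1)
    return num / np.clip(den, 1e-12, None)
```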
    Score-Based Diffusion meets Annealed Importance Sampling. (arXiv:2208.07698v1 [stat.ML])
    More than twenty years after its introduction, Annealed Importance Sampling (AIS) remains one of the most effective methods for marginal likelihood estimation. It relies on a sequence of distributions interpolating between a tractable initial distribution and the target distribution of interest which we simulate from approximately using a non-homogeneous Markov chain. To obtain an importance sampling estimate of the marginal likelihood, AIS introduces an extended target distribution to reweight the Markov chain proposal. While much effort has been devoted to improving the proposal distribution used by AIS, by changing the intermediate distributions and corresponding Markov kernels, an underappreciated issue is that AIS uses a convenient but suboptimal extended target distribution. This can hinder its performance. We here leverage recent progress in score-based generative modeling (SGM) to approximate the optimal extended target distribution for AIS proposals corresponding to the discretization of Langevin and Hamiltonian dynamics. We demonstrate these novel, differentiable, AIS procedures on a number of synthetic benchmark distributions and variational auto-encoders.
    LEMON: Explainable Entity Matching. (arXiv:2110.00516v2 [cs.DB] UPDATED)
    State-of-the-art entity matching (EM) methods are hard to interpret, and there is significant value in bringing explainable AI to EM. Unfortunately, most popular explainability methods do not work well out of the box for EM and need adaptation. In this paper, we identify three challenges of applying local post hoc feature attribution methods to entity matching: cross-record interaction effects, non-match explanations, and variation in sensitivity. We propose our novel model-agnostic and schema-flexible method LEMON that addresses all three challenges by (i) producing dual explanations to avoid cross-record interaction effects, (ii) introducing the novel concept of attribution potential to explain how two records could have matched, and (iii) automatically choosing explanation granularity to match the sensitivity of the matcher and record pair in question. Experiments on public datasets demonstrate that the proposed method is more faithful to the matcher and does a better job of helping users understand the decision boundary of the matcher than previous work. Furthermore, user studies show that the rate at which human subjects can construct counterfactual examples after seeing an explanation from our proposed method increases from 54% to 64% for matches and from 15% to 49% for non-matches compared to explanations from a standard adaptation of LIME.
    GFlowNet Foundations. (arXiv:2111.09266v3 [cs.LG] UPDATED)
    Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function. In this paper, we show a number of additional theoretical properties of GFlowNets. They can be used to estimate joint probability distributions and the corresponding marginal distributions where some variables are unspecified and, of particular interest, can represent distributions over composite objects like sets and graphs. GFlowNets amortize the work typically done by computationally expensive MCMC methods in a single but trained generative pass. They could also be used to estimate partition functions and free energies, conditional probabilities of supersets (supergraphs) given a subset (subgraph), as well as marginal distributions over all supersets (supergraphs) of a given set (graph). We introduce variations enabling the estimation of entropy and mutual information, sampling from a Pareto frontier, connections to reward-maximizing policies, and extensions to stochastic environments, continuous actions and modular energy functions.
    A Deep Reinforcement Learning-based Adaptive Charging Policy for Wireless Rechargeable Sensor Networks. (arXiv:2208.07824v1 [cs.LG])
    Wireless sensor networks consist of randomly distributed sensor nodes for monitoring targets or areas of interest. Maintaining the network for continuous surveillance is a challenge due to the limited battery capacity of each sensor. Wireless power transfer technology is emerging as a reliable solution for energizing the sensors by deploying a mobile charger (MC) to recharge them. However, designing an optimal charging path for the MC is challenging because of uncertainties arising in the networks. The energy consumption rate of the sensors may fluctuate significantly due to unpredictable changes in the network topology, such as node failures. These changes also lead to shifts in the importance of each sensor, which existing works often assume to be identical. We address these challenges in this paper by proposing a novel adaptive charging scheme using a deep reinforcement learning (DRL) approach. Specifically, we endow the MC with a charging policy that determines the next sensor to charge conditioned on the current state of the network. We then use a deep neural network to parametrize this charging policy, which is trained by reinforcement learning techniques. Our model can adapt to spontaneous changes in the network topology. The empirical results show that the proposed algorithm outperforms the existing on-demand algorithms by a significant margin.
    Universal Solutions of Feedforward ReLU Networks for Interpolations. (arXiv:2208.07498v1 [cs.LG])
    This paper provides a theoretical framework on the solution of feedforward ReLU networks for interpolations, in terms of what is called an interpolation matrix; it is the summary, extension, and generalization of our three preceding works, with the expectation that the solutions used in engineering could be included in this framework and finally understood. For three-layer networks, we classify different kinds of solutions and model them in a normalized form; solution finding is investigated along three dimensions, including data, networks, and training; the mechanism of overparameterized solutions is interpreted. For deep-layer networks, we present a general result called the sparse-matrix principle, which can describe some basic behavior of deep layers and explain the phenomenon of the sparse-activation mode that appears in engineering applications associated with brain science; an advantage of deep layers compared to shallower ones is manifested in this principle. As applications, a general solution of deep neural networks for classification is constructed via that principle, and we also use the principle to study the data-disentangling property of encoders. Analogously to the three-layer case, the solution of deep layers is also explored through several dimensions. The mechanism of multi-output neural networks is explained from the perspective of interpolation matrices.
    SOLBP: Second-Order Loopy Belief Propagation for Inference in Uncertain Bayesian Networks. (arXiv:2208.07368v1 [cs.AI])
    In second-order uncertain Bayesian networks, the conditional probabilities are only known within distributions, i.e., probabilities over probabilities. The delta-method has been applied to extend exact first-order inference methods to propagate both means and variances through sum-product networks derived from Bayesian networks, thereby characterizing epistemic uncertainty, or the uncertainty in the model itself. Alternatively, second-order belief propagation has been demonstrated for polytrees but not for general directed acyclic graph structures. In this work, we extend Loopy Belief Propagation to the setting of second-order Bayesian networks, giving rise to Second-Order Loopy Belief Propagation (SOLBP). For second-order Bayesian networks, SOLBP generates inferences consistent with those generated by sum-product networks, while being more computationally efficient and scalable.
    Efficient Randomized Subspace Embeddings for Distributed Optimization under a Communication Budget. (arXiv:2103.07578v4 [cs.LG] UPDATED)
    We study first-order optimization algorithms under the constraint that the descent direction is quantized using a pre-specified budget of $R$-bits per dimension, where $R \in (0 ,\infty)$. We propose computationally efficient optimization algorithms with convergence rates matching the information-theoretic performance lower bounds for: (i) Smooth and Strongly-Convex objectives with access to an Exact Gradient oracle, as well as (ii) General Convex and Non-Smooth objectives with access to a Noisy Subgradient oracle. The crux of these algorithms is a polynomial complexity source coding scheme that embeds a vector into a random subspace before quantizing it. These embeddings are such that with high probability, their projection along any of the canonical directions of the transform space is small. As a consequence, quantizing these embeddings followed by an inverse transform to the original space yields a source coding method with optimal covering efficiency while utilizing just $R$-bits per dimension. Our algorithms guarantee optimality for arbitrary values of the bit-budget $R$, which includes both the sub-linear budget regime ($R < 1$), as well as the high-budget regime ($R \geq 1$), while requiring $O\left(n^2\right)$ multiplications, where $n$ is the dimension. We also propose an efficient relaxation of this coding scheme using Hadamard subspaces that requires a near-linear time, i.e., $O\left(n \log n\right)$ additions. Furthermore, we show that the utility of our proposed embeddings can be extended to significantly improve the performance of gradient sparsification schemes. Numerical simulations validate our theoretical claims. Our implementations are available at https://github.com/rajarshisaha95/DistOptConstrComm.
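    A toy version of the embed-then-quantize idea, with a random rotation standing in for the paper's structured embeddings and a naive uniform quantizer standing in for its optimal-covering source code:

```python
import numpy as np

def embed_quantize(g, bits_per_dim=1.0, seed=0):
    """Rotate g into a random subspace, quantize, rotate back."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((len(g), len(g))))  # random rotation
    z = Q @ g
    levels = max(2, int(round(2 ** bits_per_dim)))
    lo, scale = z.min(), max(z.max() - z.min(), 1e-12)
    zq = lo + scale * np.round((z - lo) / scale * (levels - 1)) / (levels - 1)
    return Q.T @ zq  # inverse transform to the original space
```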
    Reliable Decision from Multiple Subtasks through Threshold Optimization: Content Moderation in the Wild. (arXiv:2208.07522v1 [cs.LG])
    Social media platforms struggle to protect users from harmful content through content moderation. These platforms have recently leveraged machine learning models to cope with the vast amount of user-generated content daily. Since moderation policies vary depending on countries and types of products, it is common to train and deploy the models per policy. However, this approach is highly inefficient, especially when the policies change, requiring dataset re-labeling and model re-training on the shifted data distribution. To alleviate this cost inefficiency, social media platforms often employ third-party content moderation services that provide prediction scores of multiple subtasks, such as predicting the existence of underage personnel, rude gestures, or weapons, instead of directly providing final moderation decisions. However, making a reliable automated moderation decision from the prediction scores of the multiple subtasks for a specific target policy has not been widely explored yet. In this study, we formulate real-world scenarios of content moderation and introduce a simple yet effective threshold optimization method that searches the optimal thresholds of the multiple subtasks to make a reliable moderation decision in a cost-effective way. Extensive experiments demonstrate that our approach shows better performance in content moderation compared to existing threshold optimization methods and heuristics.
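    In its simplest form, threshold optimization here is a grid search over per-subtask cut-offs with an any-subtask-fires decision rule, scored against a validation set. The F1 criterion and the OR rule below are illustrative assumptions; the paper's cost-aware objective may differ.

```python
import itertools
import numpy as np

def best_thresholds(scores, labels, grid=np.linspace(0.1, 0.9, 9)):
    """scores: (n, s) subtask scores; labels: (n,) boolean moderation labels."""
    best, best_f1 = None, -1.0
    for combo in itertools.product(grid, repeat=scores.shape[1]):
        pred = (scores >= np.array(combo)).any(axis=1)  # flag if any subtask fires
        tp = np.sum(pred & labels)
        f1 = 2 * tp / (pred.sum() + labels.sum() + 1e-12)
        if f1 > best_f1:
            best, best_f1 = combo, f1
    return best, best_f1
```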
    FedMR: Federated Learning via Model Recombination. (arXiv:2208.07677v1 [cs.LG])
    As a promising privacy-preserving machine learning method, Federated Learning (FL) enables global model training across clients without compromising their confidential local data. However, existing FL methods suffer from low inference performance for unevenly distributed data, since most of them rely on Federated Averaging (FedAvg)-based aggregation. By averaging model parameters in a coarse manner, FedAvg eclipses the individual characteristics of local models, which strongly limits the inference capability of FL. Worse still, in each round of FL training, FedAvg dispatches the same initial local models to clients, which can easily cause the search for optimal global models to become stuck locally. To address these issues, this paper proposes a novel and effective FL paradigm named FedMR (Federated Model Recombination). Unlike conventional FedAvg-based methods, the cloud server of FedMR shuffles each layer of the collected local models and recombines them to form new models for local training on clients. Thanks to the fine-grained model recombination and local training in each FL round, FedMR can quickly find a globally optimal model for all the clients. Comprehensive experimental results demonstrate that, compared with state-of-the-art FL methods, FedMR can significantly improve inference accuracy without incurring extra communication overhead.
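    The recombination step itself is simple to sketch: for every layer, independently permute which client contributes it, so each recombined model mixes layers from many clients. Below, client models are represented as plain dicts mapping layer names to weight tensors; this is an illustration of the shuffle, not the paper's server implementation.

```python
import random

def recombine(client_states):
    """FedMR-style layer shuffle across client models (sketch)."""
    n = len(client_states)
    new_states = [dict() for _ in range(n)]
    for layer in client_states[0]:
        perm = random.sample(range(n), n)  # independent permutation per layer
        for i, j in enumerate(perm):
            new_states[i][layer] = client_states[j][layer]
    return new_states
```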
    FALSE: Fake News Automatic and Lightweight Solution. (arXiv:2208.07686v1 [cs.LG])
    Fake news has existed ever since there was news, spreading from rumors to printed media, then radio and television. Recently, the information age, with its communications and Internet breakthroughs, has exacerbated the spread of fake news. Additionally, aside from e-Commerce, the current Internet economy depends on advertisements, views, and clicks, which has prompted many developers to bait end users into clicking links or ads. Consequently, the wide spread of fake news through social media networks has impacted real-world issues, from elections to 5G adoption and the handling of the Covid-19 pandemic. Efforts to detect and thwart fake news have existed since its advent, from fact checkers to artificial intelligence-based detectors. Solutions are still evolving as more sophisticated techniques are employed by fake news propagators. In this paper, R code has been used to study and visualize a modern fake news dataset. We use clustering, classification, correlation and various plots to analyze and present the data. The experiments show the high efficiency of classifiers in telling real news apart from fake.
    Hypergraphs with Edge-Dependent Vertex Weights: p-Laplacians and Spectral Clustering. (arXiv:2208.07457v1 [cs.LG])
    We study p-Laplacians and spectral clustering for a recently proposed hypergraph model that incorporates edge-dependent vertex weights (EDVWs). These weights can reflect different importance of vertices within a hyperedge, thus conferring the hypergraph model higher expressivity and flexibility. By constructing submodular EDVWs-based splitting functions, we convert hypergraphs with EDVWs into submodular hypergraphs for which the spectral theory is better developed. In this way, existing concepts and theorems such as p-Laplacians and Cheeger inequalities proposed under the submodular hypergraph setting can be directly extended to hypergraphs with EDVWs. For submodular hypergraphs with EDVWs-based splitting functions, we propose an efficient algorithm to compute the eigenvector associated with the second smallest eigenvalue of the hypergraph 1-Laplacian. We then utilize this eigenvector to cluster the vertices, achieving higher clustering accuracy than traditional spectral clustering based on the 2-Laplacian. More broadly, the proposed algorithm works for all submodular hypergraphs that are graph reducible. Numerical experiments using real-world data demonstrate the effectiveness of combining spectral clustering based on the 1-Laplacian and EDVWs.
    Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories. (arXiv:2205.14345v2 [cs.LG] UPDATED)
    Combinatorial optimisation problems framed as mixed integer linear programmes (MILPs) are ubiquitous across a range of real-world applications. The canonical branch-and-bound algorithm seeks to exactly solve MILPs by constructing a search tree of increasingly constrained sub-problems. In practice, its solving time performance is dependent on heuristics, such as the choice of the next variable to constrain ('branching'). Recently, machine learning (ML) has emerged as a promising paradigm for branching. However, prior works have struggled to apply reinforcement learning (RL), citing sparse rewards, difficult exploration, and partial observability as significant challenges. Instead, leading ML methodologies resort to approximating high quality handcrafted heuristics with imitation learning (IL), which precludes the discovery of novel policies and requires expensive data labelling. In this work, we propose retro branching; a simple yet effective approach to RL for branching. By retrospectively deconstructing the search tree into multiple paths each contained within a sub-tree, we enable the agent to learn from shorter trajectories with more predictable next states. In experiments on four combinatorial tasks, our approach enables learning-to-branch without any expert guidance or pre-training. We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables, with ablations verifying that our retrospectively constructed trajectories are essential to achieving these results.
    Self-Supervised Learning for Anomalous Channel Detection in EEG Graphs: Application to Seizure Analysis. (arXiv:2208.07448v1 [cs.LG])
    Electroencephalogram (EEG) signals are effective tools for seizure analysis, where one of the most important challenges is the accurate detection of seizure events and of the brain regions in which a seizure happens or initiates. However, all existing machine learning-based algorithms for seizure analysis require access to labeled seizure data, while acquiring labeled data is very labor-intensive, expensive, and clinician-dependent, given the subjective nature of the visual qualitative interpretation of EEG signals. In this paper, we propose to detect seizure channels and clips in a self-supervised manner where no access to seizure data is needed. The proposed method considers local structural and contextual information embedded in EEG graphs by employing positive and negative sub-graphs. We train our method by minimizing contrastive and generative losses. The use of local EEG sub-graphs makes the algorithm an appropriate choice when access to all EEG channels is impossible due to complications such as skull fractures. We conduct an extensive set of experiments on the largest seizure dataset and demonstrate that our proposed framework outperforms the state-of-the-art methods in EEG-based seizure studies. The proposed method is the only study that requires no access to seizure data in its training phase, yet it establishes a new state of the art in the field and outperforms all related supervised methods.
    Towards Better Data Augmentation using Wasserstein Distance in Variational Auto-encoder. (arXiv:2109.14795v2 [cs.LG] UPDATED)
    VAE, or variational auto-encoder, compresses data into latent attributes and generates new data of different varieties. VAE based on KL divergence has been considered an effective technique for data augmentation. In this paper, we propose the use of the Wasserstein distance as a measure of distributional similarity for the latent attributes, and show its superior theoretical lower bound (ELBO) compared with that of KL divergence under mild conditions. Through multiple experiments, we demonstrate that the new loss function exhibits better convergence properties and generates artificial images that better aid image classification tasks.
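    For a diagonal Gaussian posterior and a standard normal prior, the squared 2-Wasserstein distance has a closed form, making it a drop-in replacement for the KL term in the usual VAE objective. The sketch below assumes that pairing; the paper's exact formulation may differ.

```python
import torch

def w2_penalty(mu, logvar):
    """W2^2( N(mu, diag(sigma^2)), N(0, I) ) = ||mu||^2 + ||sigma - 1||^2."""
    sigma = torch.exp(0.5 * logvar)
    return (mu.pow(2).sum(dim=1) + (sigma - 1).pow(2).sum(dim=1)).mean()
```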
  • Open

    An Overview and Prospective Outlook on Robust Training and Certification of Machine Learning Models. (arXiv:2208.07464v1 [cs.LG])
    In this discussion paper, we survey recent research surrounding the robustness of machine learning models. As learning algorithms become increasingly popular in data-driven control systems, their robustness to data uncertainty must be ensured in order to maintain reliable safety-critical operations. We begin by reviewing common formalisms for such robustness, and then move on to discuss popular and state-of-the-art techniques for training robust machine learning models as well as methods for provably certifying such robustness. From this unification of robust machine learning, we identify and discuss pressing directions for future research in the area.
    Higher-order accurate two-sample network inference and network hashing. (arXiv:2208.07573v1 [stat.ME])
    Two-sample hypothesis testing for comparing two networks is an important yet difficult problem. Major challenges include: potentially different sizes and sparsity levels; non-repeated observations of adjacency matrices; computational scalability; and theoretical investigations, especially on finite-sample accuracy and minimax optimality. In this article, we propose the first provably higher-order accurate two-sample inference method by comparing network moments. Our method extends the classical two-sample t-test to the network setting. We make weak modeling assumptions and can effectively handle networks of different sizes and sparsity levels. We establish strong finite-sample theoretical guarantees, including rate-optimality properties. Our method is easy to implement and fast to compute. We also devise a novel nonparametric framework of offline hashing and fast querying that is particularly effective for maintaining and querying very large network databases. We demonstrate the effectiveness of our method through comprehensive simulations. We apply our method to two real-world data sets and discover interesting novel structures.
    A unifying partially-interpretable framework for neural network-based extreme quantile regression. (arXiv:2208.07581v1 [stat.ML])
    Risk management in many environmental settings requires an understanding of the mechanisms that drive extreme events. Useful metrics for quantifying such risk are extreme quantiles of response variables conditioned on predictor variables that describe, e.g., climate, biosphere and environmental states. Typically these quantiles lie outside the range of observable data and so, for estimation, require specification of parametric extreme value models within a regression framework. Classical approaches in this context utilise linear or additive relationships between predictor and response variables and suffer in either their predictive capabilities or computational efficiency; moreover, their simplicity is unlikely to capture the truly complex structures that lead to the creation of extreme wildfires. In this paper, we propose a new methodological framework for performing extreme quantile regression using artificial neural networks, which are able to capture complex non-linear relationships and scale well to high-dimensional data. The "black box" nature of neural networks means that they lack the desirable trait of interpretability often favoured by practitioners; thus, we combine aspects of linear, and additive, models with deep learning to create partially interpretable neural networks that can be used for statistical inference but retain high prediction accuracy. To complement this methodology, we further propose a novel point process model for extreme values which overcomes the finite lower-endpoint problem associated with the generalised extreme value class of distributions. Efficacy of our unified framework is illustrated on U.S. wildfire data with a high-dimensional predictor set and we illustrate vast improvements in predictive performance over linear and spline-based regression techniques.
    Private Query Release via the Johnson-Lindenstrauss Transform. (arXiv:2208.07410v1 [cs.DS])
    We introduce a new method for releasing answers to statistical queries with differential privacy, based on the Johnson-Lindenstrauss lemma. The key idea is to randomly project the query answers to a lower dimensional space so that the distance between any two vectors of feasible query answers is preserved up to an additive error. Then we answer the projected queries using a simple noise-adding mechanism, and lift the answers up to the original dimension. Using this method, we give, for the first time, purely differentially private mechanisms with optimal worst case sample complexity under average error for answering a workload of $k$ queries over a universe of size $N$. As other applications, we give the first purely private efficient mechanisms with optimal sample complexity for computing the covariance of a bounded high-dimensional distribution, and for answering 2-way marginal queries. We also show that, up to the dependence on the error, a variant of our mechanism is nearly optimal for every given query workload.
    CARD: Classification and Regression Diffusion Models. (arXiv:2206.07275v2 [stat.ML] UPDATED)
    Learning the distribution of a continuous or categorical response variable $\boldsymbol y$ given its covariates $\boldsymbol x$ is a fundamental problem in statistics and machine learning. Deep neural network-based supervised learning algorithms have made great progress in predicting the mean of $\boldsymbol y$ given $\boldsymbol x$, but they are often criticized for their limited ability to accurately capture the uncertainty of their predictions. In this paper, we introduce classification and regression diffusion (CARD) models, which combine a denoising diffusion-based conditional generative model and a pre-trained conditional mean estimator, to accurately predict the distribution of $\boldsymbol y$ given $\boldsymbol x$. We demonstrate the outstanding ability of CARD in conditional distribution prediction on both toy examples and real-world datasets, with experimental results showing that CARD generally outperforms state-of-the-art methods, including Bayesian neural network-based ones that are designed for uncertainty estimation, especially when the conditional distribution of $\boldsymbol y$ given $\boldsymbol x$ is multi-modal.
    Accelerating nanomaterials discovery with artificial intelligence at the HPC centers. (arXiv:2208.07612v1 [cond-mat.mtrl-sci])
    Studying the properties of chemicals, drugs, biomaterials and alloys can require decades of dedicated work. Often, however, the outcome is not what is needed for practical applications. This research procedure can be inverted by new artificial intelligence and optimization methods. Instead of studying the properties of a material and its structurally close derivatives, the chemical and structural parameter space that contains all possible derivatives of that material can be scanned in a fast and smart way at HPC centers. As a result, the particular material that has the desired physical or chemical properties can be found. Here we show how Bayesian optimization, Gaussian regression and artificial neural networks can be used towards this goal. We present an example smart search over the doped graphene quantum dot parameter space.
    GFlowNet Foundations. (arXiv:2111.09266v3 [cs.LG] UPDATED)
    Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function. In this paper, we show a number of additional theoretical properties of GFlowNets. They can be used to estimate joint probability distributions and the corresponding marginal distributions where some variables are unspecified and, of particular interest, can represent distributions over composite objects like sets and graphs. GFlowNets amortize the work typically done by computationally expensive MCMC methods in a single but trained generative pass. They could also be used to estimate partition functions and free energies, conditional probabilities of supersets (supergraphs) given a subset (subgraph), as well as marginal distributions over all supersets (supergraphs) of a given set (graph). We introduce variations enabling the estimation of entropy and mutual information, sampling from a Pareto frontier, connections to reward-maximizing policies, and extensions to stochastic environments, continuous actions and modular energy functions.
    Model Optimization in Imbalanced Regression. (arXiv:2206.09991v2 [cs.LG] UPDATED)
    Imbalanced domain learning aims to produce accurate models in predicting instances that, though underrepresented, are of utmost importance for the domain. Research in this field has been mainly focused on classification tasks. Comparatively, the number of studies carried out in the context of regression tasks is negligible. One of the main reasons for this is the lack of loss functions capable of focusing on minimizing the errors of extreme (rare) values. Recently, an evaluation metric was introduced: Squared Error Relevance Area (SERA). This metric places a greater emphasis on the errors committed at extreme values while also accounting for the performance over the overall target variable domain, thus preventing severe bias. However, its effectiveness as an optimization metric is unknown. In this paper, our goal is to study the impact of using SERA as an optimization criterion in imbalanced regression tasks. Using gradient boosting algorithms as proof of concept, we perform an experimental study with 36 data sets of different domains and sizes. Results show that models that used SERA as an objective function perform better in practice than the models produced by their respective standard boosting algorithms at predicting extreme values. This confirms that SERA can be embedded as a loss function into optimization-based learning algorithms for imbalanced regression scenarios.
    Near Optimal Adversarial Attack on UCB Bandits. (arXiv:2008.09312v2 [cs.LG] UPDATED)
    We consider a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. We propose a novel attack strategy that manipulates a UCB principle into pulling some non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\sqrt{\log T}$, where $T$ is the number of rounds. We also prove the first lower bound on the cumulative attack cost. Our lower bound matches our upper bound up to $\log \log T$ factors, showing our attack to be near optimal.
    Estimating the Mixing Time of Ergodic Markov Chains. (arXiv:1902.01224v4 [math.ST] UPDATED)
    We address the problem of estimating the mixing time $t_{\mathsf{mix}}$ of an arbitrary ergodic finite-state Markov chain from a single trajectory of length $m$. The reversible case was addressed by Hsu et al. [2019], who left the general case as an open problem. In the reversible case, the analysis is greatly facilitated by the fact that the Markov operator is self-adjoint, and Weyl's inequality allows for a dimension-free perturbation analysis of the empirical eigenvalues. As Hsu et al. point out, in the absence of reversibility (which induces asymmetric pair probability matrices), the existing perturbation analysis has a worst-case exponential dependence on the number of states $d$. Furthermore, even if an eigenvalue perturbation analysis with better dependence on $d$ were available, in the non-reversible case the connection between the spectral gap and the mixing time is not nearly as straightforward as in the reversible case. Our key insight is to estimate the pseudo-spectral gap $\gamma_{\mathsf{ps}}$ instead, which allows us to overcome the loss of symmetry and to achieve a polynomial dependence on the minimal stationary probability $\pi_\star$ and $\gamma_{\mathsf{ps}}$. Additionally, in the reversible case, we obtain simultaneous nearly (up to logarithmic factors) minimax rates in $t_{\mathsf{mix}}$ and precision $\varepsilon$, closing a gap in Hsu et al., who treated $\varepsilon$ as constant in the lower bounds. Finally, we construct fully empirical confidence intervals for $\gamma_{\mathsf{ps}}$, which shrink to zero at a rate of roughly $1/\sqrt{m}$, and improve the state of the art in even the reversible case.
    Towards Certified Robustness of Distance Metric Learning. (arXiv:2006.05945v2 [stat.ML] UPDATED)
    Metric learning aims to learn a distance metric such that semantically similar instances are pulled together while dissimilar instances are pushed away. Many existing methods consider maximizing or at least constraining a distance margin in the feature space that separates similar and dissimilar pairs of instances to guarantee their generalization ability. In this paper, we advocate imposing an adversarial margin in the input space so as to improve the generalization and robustness of metric learning algorithms. We first show that the adversarial margin, defined as the distance between training instances and their closest adversarial examples in the input space, accounts for both the distance margin in the feature space and the correlation between the metric and triplet constraints. Next, to enhance robustness to instance perturbation, we propose to enlarge the adversarial margin by minimizing a novel derived loss function, termed the perturbation loss. The proposed loss can be viewed as a data-dependent regularizer and easily plugged into any existing metric learning methods. Finally, we show that the enlarged margin is beneficial to the generalization ability by using the theoretical technique of algorithmic robustness. Experimental results on 16 datasets demonstrate the superiority of the proposed method over existing state-of-the-art methods in both discrimination accuracy and robustness against possible noise.
    SOLBP: Second-Order Loopy Belief Propagation for Inference in Uncertain Bayesian Networks. (arXiv:2208.07368v1 [cs.AI])
    In second-order uncertain Bayesian networks, the conditional probabilities are only known within distributions, i.e., probabilities over probabilities. The delta-method has been applied to extend exact first-order inference methods to propagate both means and variances through sum-product networks derived from Bayesian networks, thereby characterizing epistemic uncertainty, or the uncertainty in the model itself. Alternatively, second-order belief propagation has been demonstrated for polytrees but not for general directed acyclic graph structures. In this work, we extend Loopy Belief Propagation to the setting of second-order Bayesian networks, giving rise to Second-Order Loopy Belief Propagation (SOLBP). For second-order Bayesian networks, SOLBP generates inferences consistent with those generated by sum-product networks, while being more computationally efficient and scalable.  ( 2 min )
    On Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks in Besov Spaces. (arXiv:2103.06671v4 [stat.ML] UPDATED)
    Offline reinforcement learning (RL) leverages previously collected data for policy optimization without any further active exploration. Despite the recent interest in this problem, its theoretical results in neural network function approximation setting remain limited. In this paper, we study the statistical theory of offline RL with deep ReLU network function approximation. In particular, we establish the sample complexity of $\tilde{\mathcal{O}}\left( \kappa^{1 + d/\alpha} \cdot \epsilon^{-2 - 2d/\alpha} \right)$ for offline RL with deep ReLU networks, where $\kappa$ is a measure of distributional shift, $d$ is the dimension of the state-action space, $\alpha$ is a (possibly fractional) smoothness parameter of the underlying Markov decision process (MDP), and $\epsilon$ is a user-specified error. Notably, our sample complexity holds under two novel considerations, namely the Besov dynamic closure and the correlated structure that arises from value regression for offline RL. While the Besov dynamic closure generalizes the dynamic conditions for offline RL in the prior works, the correlated structure renders the prior works of offline RL with general/neural network function approximation improper or inefficient. To the best of our knowledge, this is the first theoretical characterization of the sample complexity of offline RL with deep neural network function approximation under the general Besov regularity condition that goes beyond the traditional reproducing kernel Hilbert spaces and Neural Tangent Kernels.  ( 3 min )
    Deletion Robust Non-Monotone Submodular Maximization over Matroids. (arXiv:2208.07582v1 [cs.DS])
    Maximizing a submodular function is a fundamental task in machine learning and in this paper we study the deletion robust version of the problem under the classic matroids constraint. Here the goal is to extract a small size summary of the dataset that contains a high value independent set even after an adversary deleted some elements. We present constant-factor approximation algorithms, whose space complexity depends on the rank $k$ of the matroid and the number $d$ of deleted elements. In the centralized setting we present a $(4.597+O(\varepsilon))$-approximation algorithm with summary size $O( \frac{k+d}{\varepsilon^2}\log \frac{k}{\varepsilon})$ that is improved to a $(3.582+O(\varepsilon))$-approximation with $O(k + \frac{d}{\varepsilon^2}\log \frac{k}{\varepsilon})$ summary size when the objective is monotone. In the streaming setting we provide a $(9.435 + O(\varepsilon))$-approximation algorithm with summary size and memory $O(k + \frac{d}{\varepsilon^2}\log \frac{k}{\varepsilon})$; the approximation factor is then improved to $(5.582+O(\varepsilon))$ in the monotone case.  ( 2 min )
    Estimating Appearance Models for Image Segmentation via Tensor Factorization. (arXiv:2208.07853v1 [cs.CV])
    Image segmentation is one of the core tasks in computer vision, and solving it often depends on modeling the image appearance data via the color distributions of each of its constituent regions. Whereas many segmentation algorithms handle the dependence on appearance models using alternation or implicit methods, we propose here a new approach to directly estimate them from the image without prior information on the underlying segmentation. Our method uses local high-order color statistics from the image as input to a tensor factorization-based estimator for latent variable models. This approach is able to estimate models in multi-region images and automatically output the region proportions without prior user interaction, overcoming the drawbacks of a prior attempt at this problem. We also demonstrate the performance of our proposed method in many challenging synthetic and real imaging scenarios and show that it leads to an efficient segmentation algorithm.  ( 2 min )
    Neural Networks for Extreme Quantile Regression with an Application to Forecasting of Flood Risk. (arXiv:2208.07590v1 [stat.ME])
    Risk assessment for extreme events requires accurate estimation of high quantiles that go beyond the range of historical observations. When the risk depends on the values of observed predictors, regression techniques are used to interpolate in the predictor space. We propose the EQRN model that combines tools from neural networks and extreme value theory into a method capable of extrapolation in the presence of complex predictor dependence. Neural networks can naturally incorporate additional structure in the data. We develop a recurrent version of EQRN that is able to capture complex sequential dependence in time series. We apply this method to forecasting of flood risk in the Swiss Aare catchment. It exploits information from multiple covariates in space and time to provide one-day-ahead predictions of return levels and exceedance probabilities. This output complements the static return levels from a traditional extreme value analysis, and the predictions are able to adapt to distributional shifts as experienced in a changing climate. Our model can help authorities to manage flooding more effectively and to minimize its disastrous impacts through early warning systems.  ( 2 min )
    Training Latent Variable Models with Auto-encoding Variational Bayes: A Tutorial. (arXiv:2208.07818v1 [cs.LG])
    Auto-encoding Variational Bayes (AEVB) is a powerful and general algorithm for fitting latent variable models (a promising direction for unsupervised learning), and is well-known for training the Variational Auto-Encoder (VAE). In this tutorial, we focus on motivating AEVB from the classic Expectation Maximization (EM) algorithm, as opposed to from deterministic auto-encoders. Though natural and somewhat self-evident, the connection between EM and AEVB is not emphasized in the recent deep learning literature, and we believe that emphasizing this connection can improve the community's understanding of AEVB. In particular, we find it especially helpful to view (1) optimizing the evidence lower bound (ELBO) with respect to inference parameters as an approximate E-step and (2) optimizing the ELBO with respect to generative parameters as an approximate M-step; doing both simultaneously, as in AEVB, is then simply tightening and pushing up the ELBO at the same time. We discuss how the approximate E-step can be interpreted as performing variational inference. Important concepts such as amortization and the reparametrization trick are discussed in great detail. Finally, we derive from scratch the AEVB training procedures of a non-deep and several deep latent variable models, including VAE, Conditional VAE, Gaussian Mixture VAE and Variational RNN. It is our hope that readers will recognize AEVB as a general algorithm that can be used to fit a wide range of latent variable models (not just VAE), and apply AEVB to such models that arise in their own fields of research. PyTorch code for all included models is publicly available.  ( 3 min )
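    (A minimal sketch of the two views at once — a single-sample ELBO estimate for a Gaussian encoder/decoder pair, with the reparametrization trick making the approximate E-step differentiable; this is an illustration, not the tutorial's released code:)

        import torch

        def elbo(x, encoder, decoder):
            # Inference (E-step-like) and generative (M-step-like) parameters
            # receive gradients from the same objective.
            mu, logvar = encoder(x)                    # q(z|x) = N(mu, diag(sigma^2))
            eps = torch.randn_like(mu)
            z = mu + torch.exp(0.5 * logvar) * eps     # reparametrization trick
            recon = decoder(z)                         # mean of p(x|z)
            log_px_z = -((x - recon) ** 2).sum(dim=1)  # Gaussian log-likelihood, up to a constant
            kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum(dim=1)
            return (log_px_z - kl).mean()              # maximizing tightens and pushes up the ELBO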
    Score-Based Diffusion meets Annealed Importance Sampling. (arXiv:2208.07698v1 [stat.ML])
    More than twenty years after its introduction, Annealed Importance Sampling (AIS) remains one of the most effective methods for marginal likelihood estimation. It relies on a sequence of distributions interpolating between a tractable initial distribution and the target distribution of interest which we simulate from approximately using a non-homogeneous Markov chain. To obtain an importance sampling estimate of the marginal likelihood, AIS introduces an extended target distribution to reweight the Markov chain proposal. While much effort has been devoted to improving the proposal distribution used by AIS, by changing the intermediate distributions and corresponding Markov kernels, an underappreciated issue is that AIS uses a convenient but suboptimal extended target distribution. This can hinder its performance. We here leverage recent progress in score-based generative modeling (SGM) to approximate the optimal extended target distribution for AIS proposals corresponding to the discretization of Langevin and Hamiltonian dynamics. We demonstrate these novel, differentiable, AIS procedures on a number of synthetic benchmark distributions and variational auto-encoders.  ( 2 min )
    Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere. (arXiv:2005.10242v10 [cs.LG] UPDATED)
    Contrastive representation learning has been outstandingly successful in practice. In this work, we identify two key properties related to the contrastive loss: (1) alignment (closeness) of features from positive pairs, and (2) uniformity of the induced distribution of the (normalized) features on the hypersphere. We prove that, asymptotically, the contrastive loss optimizes these properties, and analyze their positive effects on downstream tasks. Empirically, we introduce an optimizable metric to quantify each property. Extensive experiments on standard vision and language datasets confirm the strong agreement between both metrics and downstream task performance. Remarkably, directly optimizing for these two metrics leads to representations with comparable or better performance at downstream tasks than contrastive learning. Project Page: https://tongzhouwang.info/hypersphere Code: https://github.com/SsnL/align_uniform , https://github.com/SsnL/moco_align_uniform  ( 3 min )
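    (The two metrics are compact enough to state in code; a PyTorch sketch following the definitions in the paper, with alpha and t as its hyper-parameters and x, y assumed to be L2-normalized features:)

        import torch

        def align_loss(x, y, alpha=2):
            # x, y: normalized features of positive pairs, shape (N, D)
            return (x - y).norm(p=2, dim=1).pow(alpha).mean()

        def uniform_loss(x, t=2):
            # log of the mean pairwise Gaussian potential on the hypersphere
            return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()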
    Notes on Worst-case Inefficiency of Gradient Descent Even in R^2. (arXiv:2008.07513v2 [cs.LG] UPDATED)
    Gradient descent is a popular algorithm in optimization, and its performance in convex settings is mostly well understood. In non-convex settings, it has been shown that gradient descent is able to escape saddle points asymptotically and converge to local minimizers [Lee et al. 2016]. Recent studies also show that a perturbed version of gradient descent is enough to escape saddle points efficiently [Jin et al. 2015, Ge et al. 2017]. In this paper we show a negative result: gradient descent may take exponential time to escape saddle points, even for non-pathological two-dimensional functions. While our focus is theoretical, we also conduct experiments verifying our theoretical result. Through our analysis we demonstrate that stochasticity is essential to escape saddle points efficiently.  ( 2 min )
    $L^p$ sampling numbers for the Fourier-analytic Barron space. (arXiv:2208.07605v1 [math.FA])
    In this paper, we consider Barron functions $f : [0,1]^d \to \mathbb{R}$ of smoothness $\sigma > 0$, which are functions that can be written as \[ f(x) = \int_{\mathbb{R}^d} F(\xi) \, e^{2 \pi i \langle x, \xi \rangle} \, d \xi \quad \text{with} \quad \int_{\mathbb{R}^d} |F(\xi)| \cdot (1 + |\xi|)^{\sigma} \, d \xi < \infty. \] For $\sigma = 1$, these functions play a prominent role in machine learning, since they can be efficiently approximated by (shallow) neural networks without suffering from the curse of dimensionality. For these functions, we study the following question: Given $m$ point samples $f(x_1),\dots,f(x_m)$ of an unknown Barron function $f : [0,1]^d \to \mathbb{R}$ of smoothness $\sigma$, how well can $f$ be recovered from these samples, for an optimal choice of the sampling points and the reconstruction procedure? Denoting the optimal reconstruction error measured in $L^p$ by $s_m (\sigma; L^p)$, we show that \[ m^{- \frac{1}{\max \{ p,2 \}} - \frac{\sigma}{d}} \lesssim s_m(\sigma;L^p) \lesssim (\ln (e + m))^{\alpha(\sigma,d) / p} \cdot m^{- \frac{1}{\max \{ p,2 \}} - \frac{\sigma}{d}} , \] where the implied constants only depend on $\sigma$ and $d$ and where $\alpha(\sigma,d)$ stays bounded as $d \to \infty$.  ( 2 min )

  • Open

    Google AI Open-Sources ‘Rax’, A Python Library for LTR (Learning to Rank) in the JAX ecosystem
    submitted by /u/ai-lover [link] [comments]  ( 88 min )
    Dall-e Mini generated tour of Nashville
    submitted by /u/iTieRoomsTogether [link] [comments]  ( 89 min )
    First time trying out midjourney
    submitted by /u/OryonBlack [link] [comments]  ( 86 min )
    IBM Releases Deep Search For Scientific Discovery
    submitted by /u/pmz [link] [comments]  ( 86 min )
    Somewhere in the future
    submitted by /u/widgia [link] [comments]  ( 86 min )
    Entire AI lifecycle in one platform! You can turn data into Apps in just minutes!
    submitted by /u/SamuelSmith1416 [link] [comments]  ( 86 min )
    Using DALLE to visualise the concept of a Paperclip Maximizer
    submitted by /u/newt-s [link] [comments]  ( 86 min )
    The Latest Language Model From Meta AI, ‘Atlas,’ Has Outperformed Previous Models Like ‘Palm’ And Reached Over 42% Accuracy On Natural Questions Using Only 64 Examples
    submitted by /u/ai-lover [link] [comments]  ( 90 min )
    My first attempt at creating wallpapers for my phone: The God Emperor | Using MidJourney AI (Image Creator bot for Discord)
    submitted by /u/Potato_Player_BR [link] [comments]  ( 86 min )
    AI-Designed Graphic Tees
    submitted by /u/cityofgoul [link] [comments]  ( 87 min )
    Midjourney – AI Revolution in Art and Graphic Design
    submitted by /u/kbf_ [link] [comments]  ( 86 min )
    How do you feel when someone participates on your art on MJ
    submitted by /u/MuralPassport [link] [comments]  ( 86 min )
    What are your favourite AI/ML RSS feeds (Podcasts, Blogs, News)?
    What are your favourite RSS/Atom feeds you consume to stay up to date with all things AI/ML in terms of news sites, podcasts, blogs or even YouTube channels? Ideally anything that isn't already covered by sites like allainews.com. Things like: research and innovations; new libraries & how to use them; products/tech related to AI/ML; educational stuff, how to get into AI/Data Science; anything else that's related. submitted by /u/ai_jobs [link] [comments]  ( 86 min )
    The Walter White Scene From the Better Call Saul Finale Upscaled to 4K 60FPS
    submitted by /u/hirep14316 [link] [comments]  ( 86 min )
  • Open

    Announcing the launch of the model copy feature for Amazon Rekognition Custom Labels
    Amazon Rekognition Custom Labels is a fully managed computer vision service that allows developers to build custom models to classify and identify objects in images that are specific and unique to your business. Rekognition Custom Labels doesn’t require you to have any prior computer vision expertise. For example, you can find your logo in social […]  ( 9 min )
    Cloud-based medical imaging reconstruction using deep neural networks
    Medical imaging techniques like computed tomography (CT), magnetic resonance imaging (MRI), medical x-ray imaging, ultrasound imaging, and others are commonly used by doctors for various reasons. Some examples include detecting changes in the appearance of organs, tissues, and vessels, and detecting abnormalities such as tumors and various other types of pathologies. Before doctors can use […]  ( 7 min )
  • Open

    [R] Transframer: Arbitrary Frame Prediction with Generative Models - DeepMind 2022 - Can generate 30 Second Videos from a single frame while also being able to do 8 different Vision tasks including depth estimation, object detection and instance segmentation.
    Paper: https://arxiv.org/abs/2203.09494 Abstract: We present a general-purpose framework for image modelling and vision tasks based on probabilistic frame prediction. Our approach unifies a broad range of tasks, from image segmentation, to novel view synthesis and video interpolation. We pair this framework with an architecture we term Transframer, which uses U-Net and Transformer components to condition on annotated context frames, and outputs sequences of sparse, compressed image features. Transframer is the state-of-the-art on a variety of video generation benchmarks, is competitive with the strongest models on few-shot view synthesis, and can generate coherent 30 second videos from a single image without any explicit geometric information. A single generalist Transframer simultaneously produces promising results on 8 tasks, including semantic segmentation, image classification and optical flow prediction with no task-specific architectural components, demonstrating that multi-task computer vision can be tackled using probabilistic image models. Our approach can in principle be applied to a wide range of applications that require learning the conditional structure of annotated image-formatted data. (Figures from the paper are linked in the post.) submitted by /u/Singularian2501 [link] [comments]  ( 110 min )
    Why does the CNN model accuracy vary too much when the dataset is the same? [P]
    I am training a CNN model to classify among 6 categories. My image size is not square but 14x100 pixels. Actually there are no real images; I converted some numbers (time series data) to an image array. The problem is that after training, when I evaluate the model, different runs show different accuracy even though the dataset is the same. I am using sklearn train_test_split with random_state = 0 and 42 to keep the data the same across runs. But it shows accuracy varying from 50% to even 100%. Could someone explain what the reason could be and how to solve it? The CNN model is shown below:

        model_2 = tf.keras.Sequential([
            tf.keras.layers.Conv2D(filters=3, kernel_size=3, strides=1, padding="same",
                                   activation="relu", input_shape=X_train_scaled[0].shape),
            tf.keras.layers.MaxPool2D(pool_size=2),
            tf.keras.layers.Conv2D(6, 3, padding="same", activation="relu"),
            tf.keras.layers.MaxPool2D(pool_size=2),
            tf.keras.layers.Conv2D(12, 3, padding="same", activation="relu"),
            tf.keras.layers.MaxPool2D(pool_size=2),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(72, activation="relu"),
            tf.keras.layers.Dense(36, activation="relu"),
            tf.keras.layers.Dense(2, activation="softmax")  # output layer (2 units, despite the 6 categories mentioned above)
        ])

        # Training the model
        model_2.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
        model_2.fit(X_train_scaled, y_train, epochs=300)

        # Accuracy and loss
        loss, accuracy = model_2.evaluate(X_test_scaled, y_test)
        print(f"Loss: {loss}, Accuracy: {accuracy}")

    (A model summary screenshot is included in the post.) submitted by /u/alam1001 [link] [comments]  ( 89 min )
    [D] [R] Fine tuning a model with batch size equal to 2
    Hello, I'm trying to fine-tune a CNN-based model, but my GPU can only hold a batch size of 2 (I can't use gradient checkpointing here). The problem is that the model contains batchnorm layers. As I see it, I have two solutions: (1) train as usual, which means my batch statistics will be based on only two images; or (2) make the batchnorm layers behave in training as in testing, i.e., use the running mean and variance during training as well. In this case I fear that the regularization effect will not be the same. Can anyone enlighten me on this, please? Thanks! submitted by /u/Meddhouib10 [link] [comments]  ( 88 min )
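    (For the second option described in the post above, a minimal PyTorch sketch; the resnet18 stand-in and the function name are illustrative, not from the post:)

        import torch.nn as nn
        import torchvision.models as models

        def freeze_batchnorm(model: nn.Module):
            # Keep batchnorm layers in eval mode so forward passes use the
            # running mean/variance rather than 2-sample batch statistics.
            for m in model.modules():
                if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
                    m.eval()

        model = models.resnet18()  # stand-in for the CNN being fine-tuned
        model.train()
        freeze_batchnorm(model)    # re-apply after every call to model.train()

    One caveat: model.train() resets all submodules to training mode, so the freeze has to be re-applied each time train() is called.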
    [D] How do contrastive loss functions avoid the issue of duplicates / near duplicates?
    Contrastive loss functions work as follows: You have a grid of (image, image) (SimCLR), or (image, text) pairs (CLIP), and you maximize the similarity between the target pairs (2 different crops of an image in the (image, image) case, or (image, text) pairs in the (image, text) case). My question is, in the (image, image) case, how do you deal with the following issue: imagine you have 2 images that are near duplicates of each other. In this case, the contrastive loss would push those 2 images apart in embedding space, when in reality, you might want those 2 images to be similar in embedding space. submitted by /u/vanilla-acc [link] [comments]  ( 89 min )
    [R] PAINT: Patching models with weight interpolations
    Even the best pre-trained models are not perfect. PAINT is a simple, fast and effective method to improve your model on a new downstream task, without overspecialization. Paper: https://arxiv.org/abs/2208.05592 Code: https://github.com/mlfoundations/patching Website: https://model-patching.github.io/ Abstract: Open-vocabulary models like CLIP achieve high accuracy across many image classification tasks. However, there are still settings where their zero-shot performance is far from optimal. We study model patching, where the goal is to improve accuracy on specific tasks without degrading accuracy on tasks where performance is already adequate. Towards this goal, we introduce PAINT, a patching method that uses interpolations between the weights of a model before fine-tuning and the weights after fine-tuning on a task to be patched. On nine tasks where zero-shot CLIP performs poorly, PAINT increases accuracy by 15 to 60 percentage points while preserving accuracy on ImageNet within one percentage point of the zero-shot model. PAINT also allows a single model to be patched on multiple tasks and improves with model scale. Furthermore, we identify cases of broad transfer, where patching on one task increases accuracy on other tasks even when the tasks have disjoint classes. Finally, we investigate applications beyond common benchmarks such as counting or reducing the impact of typographic attacks on CLIP. Our findings demonstrate that it is possible to expand the set of tasks on which open-vocabulary models achieve high accuracy without re-training them from scratch. submitted by /u/hakoka [link] [comments]  ( 89 min )
    [D] References regarding the learning behaviour of neural networks?
    I'm asked to give a talk about the black-box nature of neural networks and what we know about how networks learn. One part is the standard stuff about explainable AI, including SHAP values and visualising activation maps. However, I want to dig a little more into the theoretical work on what we know about the inner optimisation of networks. What behaviour seems unexpected? What impresses us in a positive way? When and where do networks arrive at trivial solutions? One example I'm thinking of is the lottery ticket hypothesis, which I want to include. The target audience is techy but not necessarily familiar with AI research. Do you have some other references that come to mind? Have you heard a similar talk elsewhere? Thank you submitted by /u/gopietz [link] [comments]  ( 88 min )
    [D] Should I ask someone before using their model for benchmarking?
    So I'm going to be publishing a paper soon and, for benchmarking our model in the paper, I've reimplemented someone else's model to compare against. Is it polite for me to ask permission of that model's creator before publishing? submitted by /u/Spentworth [link] [comments]  ( 89 min )
    [D] WACV 2023 Round-1 Paper Notification
    Discussions regarding accepted/rejected/revision for papers submitted @ WACV 2023 (Round-1). Results are supposed to be out today. submitted by /u/jarvvvis [link] [comments]  ( 88 min )
    [P] First public pretrained ViT-VQGAN released
    hi folks, 4 months ago I announced an unofficial implementation of ViT-VQGAN here, and now I'm happy to announce that the first public ViT-VQGAN (base config) weights ever are released. I hope they can be useful to you guys. (The post includes example images and their reconstructions.) submitted by /u/ThunaClone [link] [comments]  ( 87 min )
    [P] Data pipelines for data scientists - looking for contributors for new open-source notebook
    We're currently working on a new open-source notebook to shape the future of building data pipelines. We would love for you to test out our current version in a collaborative effort to create better workflows for data scientists (and other data and machine learning professionals). Repo: https://github.com/mage-ai/mage-ai More about Mage: https://mage.ai Join our slack community: https://mage.ai/chat submitted by /u/tchungry [link] [comments]  ( 88 min )
    [D] Is UK PhD as good as US one?
    Despite equally great universities for machine learning in both the UK (and Europe more broadly) and the US, the PhD duration differs a lot (UK typically 3-4 years, US typically 6). Is a UK PhD less valued in the ML community than a US one? I do believe that there is more coursework in the initial years in the US than in the UK, but is that the only difference? If both are equally good, then why don't people opt more for UK PhDs? submitted by /u/kush2196 [link] [comments]  ( 92 min )
    [D] Why is Ordinal Regression so overlooked?
    Ordinal Regression (Ordinal Classification) is such a prevalent problem in business and academia, e.g. ranking of products/risk/health/drugs; however, there is very little out there in terms of papers and libraries. The most recent and usable DL attempt I have found is the CORAL/CORN frameworks (keras, pytorch), which have just a few stars, and that's it. Is it a problem that all big companies have "solved" internally, or is it just that no one bothers and it's approximated by clipping regular regression outputs, or by using regular multiclass classification? submitted by /u/koorm [link] [comments]  ( 91 min )
    [P] Deploy Detectron2 models with Triton inference server
    Detectron2 is a very popular PyTorch-based library for detection tasks. Although it has optimal implementations of many detection and segmentation models, it does not provide a good way to deploy the models to production. I share in the post below how I deployed Detectron2 models with Triton inference server (an inference system developed by NVIDIA). Hope you find it helpful! https://tintn.github.io/deploy-detectron2-with-triton/ submitted by /u/Tin_Ng [link] [comments]  ( 87 min )
    [D] Anyone using Run AI to manage their GPU servers?
    Hi, we were recently building a simple model-training-as-a-service for GPU jobs and came across run.ai. There have been mentions of GPU virtualization, but we're not quite sure how to use it with our existing Kubernetes setup. Are there any organisations using Run AI currently? Are there any open-source alternatives for the same? submitted by /u/scb_11 [link] [comments]  ( 105 min )
    [D] What are your favourite AI/ML RSS feeds (Podcasts, Blogs, News)?
    What are your favourite RSS/Atom feeds you consume to stay up to date with all things AI/ML in terms of news sites, podcasts, blogs or even YouTube channels? Ideally anything that isn't already covered by sites like allainews.com. Things like: research and innovations; new libraries & how to use them; products/tech related to AI/ML; educational stuff, how to get into AI/Data Science; anything else that's related. submitted by /u/ai_jobs [link] [comments]  ( 88 min )
    [P] OcrPy: A Python Library to OCR, Archive, Index and Search any documents with ease.
    Hey there, we wanted to share a library that we've been working on for a while. It's called OcrPy - a Python library for doing OCR++. At its core, the library aims to unify a multitude of OCR systems (commercial & open-source) and provides a consistent and unified interface, along with a lot of additional functionality that is usually really useful in practice - such as identifying the type of documents and the different types of layouts they come in, and in turn processing them with different approaches. Once you parse these docs, it also lets you index them and search over these collections via semantic search. With that said, here are the links to the Python library, documentation & tutorials. Github: https://github.com/maxent-ai/ocrpy Documentation: https://maxentlabs…  ( 90 min )
    [P] dataclass_array: Dataclasses which can be manipulated like numpy arrays (batched, reshaped, sliced,...) (for TF, Jax, Numpy,...)
    Available at: https://github.com/google-research/dataclass_array `dataclass_array` allows you to have structured data that can be of arbitrary batch shape. For example, defining a dataclass array:

        @dataclasses.dataclass(frozen=True)
        class Ray(dca.DataclassArray):
            pos: FloatArray['*batch_shape 3']
            dir: FloatArray['*batch_shape 3']

    A dataclass array can then be manipulated as if it were an ndarray, while keeping the internal semantic structure:

        rays = camera.rays()               # Returns `Ray` with shape `(h, w)`
        rays.shape == (h, w)
        rays.pos.shape == (h, w, 3)        # Individual ndarray fields accessible
        rays = rays.reshape('h w -> w h')  # Native einops support
        rays = rays.flatten()
        rays = rays[..., :30]
        rays = rays[rays.norm() > 0]       # Masking, filtering
        rays = rays.as_jax()               # Native Jax, TF, NumPy... support

    For an example of dataclass_array used in practice, see: https://github.com/google-research/visu3d submitted by /u/Conchylicultor [link] [comments]  ( 88 min )
    [D] Doing similarity search for images?
    One way you can do similarity search is by training your own CLIP. But let's say you only have images, and you want to make a good similarity search algorithm, like https://same.energy/. What might you do? Is there any good model that can be trained on just images? Maybe you just use a Vision Transformer to produce embeddings? The thing is, same.energy says: "Same Energy's core search uses deep learning. The most similar published work is CLIP by OpenAI." which indicates he is doing something fancier than a vision transformer. submitted by /u/throwaway119284 [link] [comments]  ( 90 min )
  • Open

    Locating content of highest value groups in SQL
    Quite often, while skimming through SQL to prepare for interviews, one comes across the question of finding the employee with the highest or 2nd highest salary by joining a table that holds employee information with another that contains department information. This raises the question: what about finding the employee who earns the nth highest salary department-wise?… Read More »Locating content of highest value groups in SQL The post Locating content of highest value groups in SQL appeared first on Data Science Central.  ( 21 min )
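    (The post's own SQL is behind the link; the window-function idea it describes translates directly into pandas — a sketch with hypothetical table and column names:)

        import pandas as pd

        df = pd.DataFrame({
            "employee": ["ann", "bob", "cam", "dee"],
            "dept":     ["sales", "sales", "eng", "eng"],
            "salary":   [90_000, 120_000, 150_000, 110_000],
        })

        n = 2  # nth highest salary per department
        # Analogue of DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC)
        df["rank"] = df.groupby("dept")["salary"].rank(method="dense", ascending=False)
        print(df[df["rank"] == n])  # ann (sales) and dee (eng)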
    A Single Source of Truth: The 360 Customer View
    An average consumer uses various marketing channels while interacting with a brand. The numbers were calculated in Upland BlueVenn’s latest Digital Divide Report. They examined 500 marketers and 4000 consumers across the UK and US in 2021, and found that at least 20 marketing channels were used. When data comes from so many sources, the… Read More »A Single Source of Truth: The 360 Customer View The post A Single Source of Truth: The 360 Customer View appeared first on Data Science Central.  ( 20 min )
    DSC Weekly 16 August 2022: You Are Your Business
    Announcements As cybersecurity risks evolve, it’s more important than ever for organizations to be aware of emerging threats and developments. In the four-day Combating Cyber Threats & Breach Prevention 2022 summit, leading security experts will share best-in-class strategies for keeping abreast of vulnerabilities, anticipating coming threats and baking security into a wide range of enterprise operations and… Read More »DSC Weekly 16 August 2022: You Are Your Business The post DSC Weekly 16 August 2022: You Are Your Business appeared first on Data Science Central.  ( 21 min )
    Cloud Security for Healthcare Sector: All You Need to Know
    Endpoint security is vital for cloud computing. Tracking and applying security protocols across devices is an ongoing process; regular checkups, in the form of audits and penetration testing, can keep your cloud security strong. As the healthcare sector and the rest of the world continue to become increasingly reliant on digital technologies, there has… Read More »Cloud Security for Healthcare Sector: All You Need to Know The post Cloud Security for Healthcare Sector: All You Need to Know appeared first on Data Science Central.  ( 18 min )
    Blockchain Technology Optimizing Early Entrants in Education
    In its initial stages of development, Blockchain Technology is predicted to top $2 billion in three years. We’re living in an era of super-smart intelligent machines; our future, rather, lies in being more human and less like a machine. This is how the consequences of this inevitable rise of technology are engraved in our… Read More »Blockchain Technology Optimizing Early Entrants in Education The post Blockchain Technology Optimizing Early Entrants in Education appeared first on Data Science Central.  ( 19 min )
    How to Implement a Data Privacy and Protection Strategy for Remote Teams
    In today’s digital world, data privacy and protection are increasingly important. Add in the complexity of remote teams, and you have a whole new ball game. It’s undeniable that remote work is favored by employees and now employers too, with 97% of workers. So it’s essential to implement a data privacy and protection strategy that… Read More »How to Implement a Data Privacy and Protection Strategy for Remote Teams The post How to Implement a Data Privacy and Protection Strategy for Remote Teams appeared first on Data Science Central.  ( 21 min )
  • Open

    Dump a pickle file to a readable text file
    I got a data file from a client recently in “pickle” format. I happen to know that pickle is a binary format for serializing Python objects, but trying to open a pickle file could be a puzzle if you didn’t know this. There are a couple problems with using pickle files for data transfer. First […] Dump a pickle file to a readable text file first appeared on John D. Cook.  ( 5 min )
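    (The move the post describes is small; a sketch with hypothetical file names — and note that unpickling should only ever be done on files from a trusted source, since pickle can execute arbitrary code on load:)

        import pickle
        import pprint

        with open("data.pickle", "rb") as f:  # pickle is a binary format, hence "rb"
            obj = pickle.load(f)

        with open("data.txt", "w") as f:
            pprint.pprint(obj, stream=f)      # human-readable text dump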
    Duodecibels
    It’s a curious and convenient fact that many decibel values are close to integers [1]: 3 dB ≈ 2, 6 dB ≈ 4, 7 dB ≈ 5, 9 dB ≈ 8. Is base 10 unique in this regard? If we were to look at the analogs of decibels in other bases, would we see a […] Duodecibels first appeared on John D. Cook.  ( 6 min )
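    (The approximations are easy to check numerically, since a value of k dB corresponds to a power ratio of 10^(k/10):)

        for db, target in [(3, 2), (6, 4), (7, 5), (9, 8)]:
            ratio = 10 ** (db / 10)  # power ratio for `db` decibels
            print(f"{db} dB -> {ratio:.3f} (close to {target})")
        # 3 dB -> 1.995, 6 dB -> 3.981, 7 dB -> 5.012, 9 dB -> 7.943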
  • Open

    What is the relationship between state value V and action Q function?
    (The question is posed via an image in the post.) submitted by /u/souhaielbensalem [link] [comments]  ( 87 min )
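    (For reference, the standard relationship: the state value is the policy-weighted average of the action values, and the action value is the immediate reward plus the discounted value of the next state:)

        V^{\pi}(s) = \mathbb{E}_{a \sim \pi(\cdot \mid s)}\left[ Q^{\pi}(s, a) \right],
        \qquad
        Q^{\pi}(s, a) = \mathbb{E}_{s' \sim P(\cdot \mid s, a)}\left[ r(s, a) + \gamma \, V^{\pi}(s') \right]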
    Advantage clipping in PPO
    Per-mini-batch normalization of advantages seems to be quite common and is discussed in many papers, such as Andrychowicz 2020. However, especially with very large batch sizes, which are generally favourable for smooth learning progress, outliers can still slip through. My experiments showed that this can lead to very high KL-divergence and therefore to unlearning the previously learned policy. I had the idea to just clip the advantages at the 99th percentile or something similar, but I was wondering if there are already papers about this method, as I couldn't find any. Any thoughts, experience and papers would be much appreciated! At ~21k steps you can see the mean advantages are dropping. Unfortunately I did not record the max advantages; I am trying to replicate this now. (Plots are included in the post.) submitted by /u/flxh13 [link] [comments]  ( 87 min )
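    (The clipping idea is a one-liner; a NumPy sketch — the symmetric 99th-percentile bound is one plausible reading of the post, not an established recipe:)

        import numpy as np

        def clip_advantages(adv: np.ndarray, pct: float = 99.0) -> np.ndarray:
            # Clip symmetrically at the pct-th percentile of |A| so a handful
            # of outlier advantages cannot dominate the policy update.
            bound = np.percentile(np.abs(adv), pct)
            return np.clip(adv, -bound, bound)

        adv = np.random.randn(4096)
        adv[0] = 50.0                 # an outlier that would otherwise dominate
        clipped = clip_advantages(adv)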
    Thompson Sampling for Contextual Bandits with Linear Payoffs- building a recommender system
    Hi guys, I am trying to implement the algorithm in the paper "Thompson Sampling for Contextual Bandits with Linear Payoffs" (https://arxiv.org/pdf/1209.3352.pdf). I am just unsure of the linear algebra aspect of this paper. If you look at section 2.2 (equation screenshot in the post), we see that mu_tilde is sampled from a normal (Gaussian) distribution with mean mu_hat and standard deviation v^2 * B(t)^-1. Just looking at matrix dimensions for simplicity:

        B(t) = identity [d x d] + features [d x 1] * features.T [1 x d] = [d x d] + [d x d] = [d x d]
        mu_hat = B(t)^-1 [d x d] * ([d x 1] * [1 x 1]) = [d x d] * [d x 1] = [d x 1]

    This is all cool. Now we get to the actual algorithm, which samples mu_tilde from Gaussian(mean=mu_hat, std=v^2 * B(t)^-1). In my Python implementation, I use np.random.normal(mean=mu_hat, std=v^2 * B(t)^-1). Here is the problem: v^2 is a constant that is set at initialization, so v^2 * B(t)^-1 = [1 x 1] * [d x d] = [d x d]. That means the sample from the Gaussian will also be [d x d], and the second part of the algorithm, which calculates features [1 x d] * mu_tilde [d x d], gives [1 x d]. From my understanding, we want the final term that goes into the argmax to be [1 x 1] (i.e., one number), because we want to pick the arm that maximizes this term. Given 3 [1 x d] matrices, I cannot compute the argmax. Has anyone implemented this algorithm? Is there a problem with my logic here? I am really struggling to implement this... can anyone help? My code is in a screenshot in the post. submitted by /u/Prestigious-Energy26 [link] [comments]  ( 89 min )
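    (On the dimension question: v^2 * B(t)^-1 is a covariance matrix, and a single draw from a d-dimensional Gaussian with that covariance is a length-d vector, not a d x d matrix, so each arm's score features @ mu_tilde is a scalar. np.random.normal does not accept a covariance matrix; np.random.multivariate_normal is the sampler for this. A sketch of one round, with variable names following the paper and the arm contexts invented for illustration:)

        import numpy as np

        d, v = 5, 0.25
        B = np.eye(d)                  # B(1) = I_d
        f = np.zeros(d)                # running sum of b(t) * reward(t)
        arms = np.random.randn(10, d)  # context features b(t), one row per arm

        mu_hat = np.linalg.inv(B) @ f
        # One draw from N(mu_hat, v^2 B^{-1}) has shape (d,), not (d, d):
        mu_tilde = np.random.multivariate_normal(mu_hat, v**2 * np.linalg.inv(B))
        scores = arms @ mu_tilde       # shape (10,): one scalar score per arm
        chosen_arm = int(np.argmax(scores))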
  • Open

    Breakthrough AI + Robotic Arm + Nvidia | 3D Printed Materials Feel Movements | AI To Cure Epilepsy
    submitted by /u/kenickh [link] [comments]  ( 93 min )
  • Open

    Smart Devices, Smart Manufacturing: Pegatron Taps AI, Digital Twins
    In the fast-paced field of making the world’s tech devices, Pegatron Corp. initially harnessed AI to gain an edge. Now, it’s on the cusp of creating digital twins to further streamline its efficiency. Whether or not they’re familiar with the name, most people have probably used smartphones, tablets, Wi-Fi routers or other products that Taiwan-based Read article > The post Smart Devices, Smart Manufacturing: Pegatron Taps AI, Digital Twins appeared first on NVIDIA Blog.  ( 6 min )
    AI Shows the Way: Seoul Robotics Helps Cars Move, Park on Their Own
    Imagine driving a car — one without self-driving capabilities — to a mall, airport or parking garage, and using an app to have the car drive off to park itself. Software company Seoul Robotics is using NVIDIA technology to make this possible — turning non-autonomous cars into self-driving vehicles. Headquartered in Korea, the company’s initial Read article > The post AI Shows the Way: Seoul Robotics Helps Cars Move, Park on Their Own appeared first on NVIDIA Blog.  ( 6 min )
    Digital Art Professor Kate Parsons Inspires Next Generation of Creators This Week ‘In the NVIDIA Studio’
    Many artists can edit a video, paint a picture or build a model — but transforming one’s imagination into stunning creations can now involve breakthrough design technologies. Kate Parsons, a digital art professor at Pepperdine University and this week’s featured In the NVIDIA Studio artist, helped bring a music video for How Do I Get to Invincible to life using virtual reality and NVIDIA GeForce RTX GPUs. The post Digital Art Professor Kate Parsons Inspires Next Generation of Creators This Week ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.  ( 8 min )
  • Open

    Towards Helpful Robots: Grounding Language in Robotic Affordances
    Posted by Brian Ichter and Karol Hausman, Research Scientists, Google Research, Brain Team Over the last several years, we have seen significant progress in applying machine learning to robotics. However, robotic systems today are capable of executing only very short, hard-coded commands, such as “Pick up an apple,” because they tend to perform best with clear tasks and rewards. They struggle with learning to perform long-horizon tasks and reasoning about abstract goals, such as a user prompt like “I just worked out, can you get me a healthy snack?” Meanwhile, recent progress in training language models (LMs) has led to systems that can perform a wide range of language understanding and generation tasks with impressive results. However, these language models are inherently not grounded i…  ( 27 min )
  • Open

    Generative Art Systems*
    * an epistemological approach  ( 16 min )
  • Open

    Plasticity Neural Network Based on Astrocytic Influence at Critical Period, Synaptic Competition and Compensation by Current and Mnemonic Brain Plasticity and Synapse Formation. (arXiv:2203.11740v3 [cs.NE] UPDATED)
    The mechanism of our NN is very much in line with the results of a recent MIT brain plasticity study, in which researchers found that as a synapse strengthens, neighboring synapses automatically weaken themselves to compensate. Regarding the importance of this mechanism, Dr. Luo's team at Stanford University has proposed that competition in synapse formation is crucial for dendritic morphogenesis. We study in detail, through our model, the mechanism of failure in brain plasticity at the closure of the critical period, contrasting it with earlier studies. Those experimental studies combine cutting-edge imaging and genetic tools, whereas our research lays more emphasis on the model, derivation and simulation of a new NN. Tests demonstrate that dendrite generation is, to a certain extent, curbed by synapse formation. Current and mnemonic brain plasticity, as well as synaptic action range, are also taken into account in the study. Furthermore, the frame of the new NN is based on current gradient information and on mnemonic negative and positive gradient information for synapse formation. The mnemonic gradient information needs to take into account the forgotten memory-astrocytic synapse formation memory persistence factor (including both negative and positive memories - i.e. the optimal gradient information so far and relatively inferior gradient information). We found that the astrocytic memory persistence factor, like the phagocytosis factor, produces the effect of reducing the local accumulation of synapses. The PNN in which only the synaptic phagocytosis effect is considered, regardless of the gradient updates, and in which whether the synaptic phagocytosis of different variables and synaptic positions is cancelled is determined by the correlation coefficient of the corresponding time interval, proves simple and effective.  ( 3 min )
    RangeUDF: Semantic Surface Reconstruction from 3D Point Clouds. (arXiv:2204.09138v1 [cs.CV] CROSS LISTED)
    We present RangeUDF, a new implicit-representation-based framework to recover the geometry and semantics of continuous 3D scene surfaces from point clouds. Unlike occupancy fields or signed distance fields, which can only model closed 3D surfaces, our approach is not restricted to any type of topology. Unlike existing unsigned distance fields, our framework does not suffer from surface ambiguity. In addition, RangeUDF can jointly estimate precise semantics for continuous surfaces. The key to our approach is a range-aware unsigned distance function together with a surface-oriented semantic segmentation module. Extensive experiments show that RangeUDF clearly surpasses state-of-the-art approaches for surface reconstruction on four point cloud datasets. Moreover, RangeUDF demonstrates superior generalization capability across multiple unseen datasets, which is nearly impossible for all existing approaches.  ( 2 min )
    Siamese neural networks for a generalized, quantitative comparison of complex model outputs. (arXiv:2208.06530v1 [cs.LG])
    Computational models are quantitative representations of systems. By analyzing and comparing the outputs of such models, it is possible to gain a better understanding of the system itself. However, as the complexity of model outputs increases, it becomes increasingly difficult to compare simulations to each other. While it is straightforward to compare only a few specific model outputs across multiple simulations, it is more informative to compare model simulations as a whole; yet it is difficult to do so holistically and in an unbiased manner. To address these limitations, we use Siamese neural networks to reduce the comparison of two model simulations to a single value, with the networks capturing the relationships between the model outputs. We provide an approach to training Siamese networks on model simulations and show how the trained networks can then be used to provide a holistic comparison of model outputs. This approach can be applied to a wide range of model types, providing a quantitative method of analyzing the complex outputs of computational models.  ( 2 min )
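    A minimal sketch of the core idea, assuming a PyTorch setup; the class name, layer sizes, and distance choice are illustrative assumptions, not the authors' code:
    ```python
    import torch
    import torch.nn as nn

    class SiameseComparator(nn.Module):
        """Map two simulation output vectors to one dissimilarity score."""
        def __init__(self, n_outputs: int, embed_dim: int = 32):
            super().__init__()
            # Shared encoder: both simulations pass through the same weights.
            self.encoder = nn.Sequential(
                nn.Linear(n_outputs, 128), nn.ReLU(),
                nn.Linear(128, embed_dim),
            )

        def forward(self, sim_a, sim_b):
            za, zb = self.encoder(sim_a), self.encoder(sim_b)
            # One scalar per pair: Euclidean distance in embedding space.
            return torch.norm(za - zb, dim=-1)

    model = SiameseComparator(n_outputs=200)        # 200 flattened model outputs
    score = model(torch.randn(8, 200), torch.randn(8, 200))  # shape (8,)
    ```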
    Optimistic No-regret Algorithms for Discrete Caching. (arXiv:2208.06414v1 [cs.LG])
    We take a systematic look at the problem of storing whole files in a cache with limited capacity in the context of optimistic learning, where the caching policy has access to a prediction oracle (provided by, e.g., a Neural Network). The successive file requests are assumed to be generated by an adversary, and no assumption is made on the accuracy of the oracle. In this setting, we provide a universal lower bound for prediction-assisted online caching and proceed to design a suite of policies with a range of performance-complexity trade-offs. All proposed policies offer sublinear regret bounds commensurate with the accuracy of the oracle. Our results substantially improve upon all recently proposed online caching policies, which, being unable to exploit the oracle predictions, offer only $O(\sqrt{T})$ regret. In this pursuit, we design, to the best of our knowledge, the first comprehensive optimistic Follow-the-Perturbed-Leader policy, which generalizes beyond the caching problem. We also study the problem of caching files with different sizes and the bipartite network caching problem. Finally, we evaluate the efficacy of the proposed policies through extensive numerical experiments using real-world traces.  ( 2 min )
    Unifying supervised learning and VAEs -- automating statistical inference in (astro-)particle physics with amortized conditional normalizing flows. (arXiv:2008.05825v3 [cs.LG] UPDATED)
    A KL-divergence objective on the joint distribution of data and labels makes it possible to unify supervised learning and variational autoencoders (VAEs) under one umbrella of stochastic variational inference. The unification motivates an extended supervised scheme that allows a goodness-of-fit p-value to be calculated for the neural network model. Conditional normalizing flows amortized with a neural network are crucial in this construction. We discuss how they make it possible to rigorously define coverage for posteriors defined jointly on a product space, e.g. $\mathbb{R}^n \times \mathcal{S}^m$, which encompasses posteriors over directions. Finally, systematic uncertainties are naturally included in the variational viewpoint. In classical likelihood approaches or other machine learning models, the ingredients of (1) systematics, (2) coverage and (3) goodness-of-fit are typically not all available, or at least one of them is strongly constrained. In contrast, the proposed extended supervised training with amortized normalizing flows accommodates all three for variational inference of arbitrary statistical distributions defined on product spaces like $\mathbb{R}^n \times \ldots \times \mathcal{S}^m$, with no fundamental barrier in terms of the complexity of the underlying data. It therefore has great potential for the statistical toolbox of the contemporary (astro-)particle physicist.
    Court Judgement Labeling Using Topic Modeling and Syntactic Parsing. (arXiv:2208.04225v2 [cs.IR] UPDATED)
    In regions that practice common law, relevant historical cases are essential references for sentencing. To help legal practitioners find previous judgements more easily, this paper labels each court judgement with a set of tags. These tags are legally important, summarize the judgement, and can guide the user to similar judgements. We introduce a heuristic system to solve the problem, which starts from Aspect-driven Topic Modeling and uses Dependency Parsing and Constituency Parsing for phrase generation. We also construct a legal term tree for Hong Kong and implement a sentence simplification module to support the system. Finally, we propose a similar-document recommendation algorithm based on the generated tags. It enables users to find similar documents based on a few selected aspects rather than the whole passage. Experiment results show that this system is the best approach for this specific task: it is better than simple term extraction methods at summarizing documents, and the recommendation algorithm is more effective than full-text comparison approaches. We believe that the system has huge potential in law as well as in other areas.
    When do Models Generalize? A Perspective from Data-Algorithm Compatibility. (arXiv:2202.06054v2 [cs.LG] UPDATED)
    One of the major open problems in machine learning theory is to characterize generalization in the overparameterized regime, where most traditional generalization bounds become inconsistent. In many scenarios, their failure can be attributed to obscuring the crucial interplay between the training algorithm and the underlying data distribution. To address this shortcoming, we propose a concept named compatibility, which quantitatively characterizes generalization in a manner that is both data-relevant and algorithm-relevant. By considering the entire training trajectory and focusing on early-stopping iterates, compatibility fully exploits the algorithm information and therefore yields better generalization guarantees. We validate this by theoretically studying compatibility under the setting of overparameterized linear regression with gradient descent. Specifically, we perform a data-dependent trajectory analysis and derive a sufficient condition for compatibility under such a setting. Our theoretical results show that in the sense of compatibility, generalization holds with significantly weaker restrictions on the problem instance than the previous last iterate analysis.
    LM-CORE: Language Models with Contextually Relevant External Knowledge. (arXiv:2208.06458v1 [cs.CL])
    Large transformer-based pre-trained language models have achieved impressive performance on a variety of knowledge-intensive tasks and can capture factual knowledge in their parameters. We argue that storing large amounts of knowledge in the model parameters is sub-optimal given the ever-growing amounts of knowledge and resource requirements. We posit that a more efficient alternative is to provide explicit access to contextually relevant structured knowledge to the model and train it to use that knowledge. We present LM-CORE -- a general framework to achieve this -- that allows \textit{decoupling} of the language model training from the external knowledge source and allows the latter to be updated without affecting the already trained model. Experimental results show that LM-CORE, having access to external knowledge, significantly and robustly outperforms state-of-the-art knowledge-enhanced language models on knowledge probing tasks; can effectively handle knowledge updates; and performs well on two downstream tasks. We also present a thorough error analysis highlighting the successes and failures of LM-CORE.
    The SVD of Convolutional Weights: A CNN Interpretability Framework. (arXiv:2208.06894v1 [cs.CV])
    Deep neural networks used for image classification often use convolutional filters to extract distinguishing features before passing them to a linear classifier. Most interpretability literature focuses on providing semantic meaning to convolutional filters to explain a model's reasoning process and confirm its use of relevant information from the input domain. Fully connected layers can be studied by decomposing their weight matrices using a singular value decomposition, in effect studying the correlations between the rows in each matrix to discover the dynamics of the map. In this work we define a singular value decomposition for the weight tensor of a convolutional layer, which provides an analogous understanding of the correlations between filters, exposing the dynamics of the convolutional map. We validate our definition using recent results in random matrix theory. By applying the decomposition across the linear layers of an image classification network we suggest a framework against which interpretability methods might be applied using hypergraphs to model class separation. Rather than looking to the activations to explain the network, we use the singular vectors with the greatest corresponding singular values for each linear layer to identify those features most important to the network. We illustrate our approach with examples and introduce the DeepDataProfiler library, the analysis tool used for this study.
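    As a hedged illustration of the starting point, the snippet below flattens each filter of a convolutional layer into a row of a matrix and inspects its singular values; the paper develops its own SVD for the convolutional map, so the reshape convention here is only the baseline view:
    ```python
    import torch

    def filter_svd(weight: torch.Tensor):
        # weight: (out_ch, in_ch, k, k); one flattened filter per row.
        out_ch = weight.shape[0]
        mat = weight.reshape(out_ch, -1)
        U, S, Vh = torch.linalg.svd(mat, full_matrices=False)
        return U, S, Vh

    w = torch.randn(64, 3, 7, 7)          # e.g. a ResNet stem convolution
    _, singular_values, _ = filter_svd(w)
    print(singular_values[:5])            # dominant correlations between filters
    ```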
    Class-attention Video Transformer for Engagement Intensity Prediction. (arXiv:2208.07216v1 [cs.CV])
    In order to deal with variant-length long videos, prior works extract multi-modal features and fuse them to predict students' engagement intensity. In this paper, we present a new end-to-end method Class Attention in Video Transformer (CavT), which involves a single vector to process class embedding and to uniformly perform end-to-end learning on variant-length long videos and fixed-length short videos. Furthermore, to address the lack of sufficient samples, we propose a binary-order representatives sampling method (BorS) to add multiple video sequences of each video to augment the training set. BorS+CavT not only achieves the state-of-the-art MSE (0.0495) on the EmotiW-EP dataset, but also obtains the state-of-the-art MSE (0.0377) on the DAiSEE dataset. The code and models will be made publicly available at https://github.com/mountainai/cavt.
    RandomSCM: interpretable ensembles of sparse classifiers tailored for omics data. (arXiv:2208.06436v1 [cs.LG])
    Background: Understanding the relationship between the omics and the phenotype is a central problem in precision medicine. The high dimensionality of metabolomics data challenges learning algorithms in terms of scalability and generalization, and most learning algorithms do not produce interpretable models. -- Method: We propose an ensemble learning algorithm based on conjunctions or disjunctions of decision rules. -- Results: Applications to metabolomics data show that it produces models that achieve high predictive performance. The interpretability of the models makes them useful for biomarker discovery and pattern discovery in high-dimensional data.
    Recent Advances in Reinforcement Learning in Finance. (arXiv:2112.04553v3 [q-fin.MF] UPDATED)
    The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques on data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems that heavily rely on model assumptions, new developments from reinforcement learning (RL) are able to make full use of the large amount of financial data with fewer model assumptions and to improve decisions in complex financial environments. This survey paper aims to review the recent developments and use of RL approaches in finance. We give an introduction to Markov decision processes, which provide the setting for many of the commonly used RL approaches. Various algorithms are then introduced with a focus on value and policy based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to encompass deep RL algorithms. Our survey concludes by discussing the application of these RL algorithms in a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising.
    PatchDropout: Economizing Vision Transformers Using Patch Dropout. (arXiv:2208.07220v1 [cs.CV])
    Vision transformers have demonstrated the potential to outperform CNNs in a variety of vision tasks. But the computational and memory requirements of these models prohibit their use in many applications, especially those that depend on high-resolution images, such as medical image classification. Efforts to train ViTs more efficiently are overly complicated, necessitating architectural changes or intricate training schemes. In this work, we show that standard ViT models can be efficiently trained at high resolution by randomly dropping input image patches. This simple approach, PatchDropout, reduces FLOPs and memory by at least 50% in standard natural image datasets such as ImageNet, and those savings only increase with image size. On CSAW, a high-resolution medical dataset, we observe a 5 times savings in computation and memory using PatchDropout, along with a boost in performance. For practitioners with a fixed computational or memory budget, PatchDropout makes it possible to choose image resolution, hyperparameters, or model size to get the most performance out of their model.
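    A minimal sketch of the mechanism, assuming a standard ViT token layout with a leading CLS token; the function and variable names are ours, not the paper's code:
    ```python
    import torch

    def patch_dropout(tokens: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
        # tokens: (batch, 1 + n_patches, dim); token 0 is the CLS token.
        b, n, d = tokens.shape
        n_patches = n - 1
        n_keep = max(1, int(keep_ratio * n_patches))
        # Per-sample random subset of patch indices, without replacement.
        idx = torch.rand(b, n_patches, device=tokens.device).argsort(dim=1)[:, :n_keep]
        idx = idx + 1                                   # offset past the CLS token
        batch_idx = torch.arange(b, device=tokens.device).unsqueeze(1)
        kept = tokens[batch_idx, idx]                   # (b, n_keep, d)
        return torch.cat([tokens[:, :1], kept], dim=1)  # CLS always survives

    x = torch.randn(4, 197, 768)                        # ViT-B/16 at 224px
    print(patch_dropout(x).shape)                       # roughly half the tokens
    ```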
    Model Generalization: A Sharpness Aware Optimization Perspective. (arXiv:2208.06915v1 [cs.LG])
    Sharpness-Aware Minimization (SAM) and Adaptive Sharpness-Aware Minimization (ASAM) aim to improve model generalization. In this project, we propose three experiments to validate their generalization from the sharpness-aware perspective. Our experiments show that sharpness-aware optimization techniques can help provide models with strong generalization ability. They also show that ASAM can improve generalization performance on un-normalized data, though further research is needed to confirm this.
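    For reference, a compact sketch of one standard SAM update (ascent to the worst-case nearby weights, then descent from there), assuming a PyTorch model and a base optimizer; this is the generic formulation, not the project's experiment code:
    ```python
    import torch

    def sam_step(model, loss_fn, x, y, base_opt, rho: float = 0.05):
        # First pass: gradient at the current weights w.
        loss_fn(model(x), y).backward()
        grad_norm = torch.norm(torch.stack([
            p.grad.norm() for p in model.parameters() if p.grad is not None
        ]))
        # Ascent step: move to the worst-case nearby point w + rho * g / ||g||.
        perturbations = []
        with torch.no_grad():
            for p in model.parameters():
                if p.grad is None:
                    continue
                e = rho * p.grad / (grad_norm + 1e-12)
                p.add_(e)
                perturbations.append((p, e))
        model.zero_grad()
        # Second pass: gradient at the perturbed weights.
        loss_fn(model(x), y).backward()
        with torch.no_grad():
            for p, e in perturbations:
                p.sub_(e)              # restore the original weights
        base_opt.step()                # descend using the sharpness-aware gradient
        model.zero_grad()
    ```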
    Self-supervised Contrastive Representation Learning for Semi-supervised Time-Series Classification. (arXiv:2208.06616v1 [cs.LG])
    Learning time-series representations when only unlabeled data or few labeled samples are available can be a challenging task. Recently, contrastive self-supervised learning has shown great improvement in extracting useful representations from unlabeled data via contrasting different augmented views of data. In this work, we propose a novel Time-Series representation learning framework via Temporal and Contextual Contrasting (TS-TCC) that learns representations from unlabeled data with contrastive learning. Specifically, we propose time-series specific weak and strong augmentations and use their views to learn robust temporal relations in the proposed temporal contrasting module, besides learning discriminative representations by our proposed contextual contrasting module. Additionally, we conduct a systematic study of time-series data augmentation selection, which is a key part of contrastive learning. We also extend TS-TCC to the semi-supervised learning settings and propose a Class-Aware TS-TCC (CA-TCC) that benefits from the available few labeled data to further improve representations learned by TS-TCC. Specifically, we leverage robust pseudo labels produced by TS-TCC to realize a class-aware contrastive loss. Extensive experiments show that the linear evaluation of the features learned by our proposed framework performs comparably with fully supervised training. Additionally, our framework shows high efficiency in few-labeled-data and transfer-learning scenarios. The code is publicly available at \url{https://github.com/emadeldeen24/TS-TCC}.
    Privacy-Preserving Logistic Regression Training with A Faster Gradient Variant. (arXiv:2201.10838v2 [cs.CR] UPDATED)
    Logistic regression training over encrypted data has been an attractive approach to addressing security concerns for years. In this paper, we propose a faster gradient variant called $\texttt{quadratic gradient}$ to implement logistic regression training in a homomorphic encryption domain; its core can be seen as an extension of the simplified fixed Hessian. We enhance Nesterov's accelerated gradient (NAG) and the Adaptive Gradient Algorithm (Adagrad) with this gradient variant and evaluate the enhanced algorithms on several datasets. Experimental results show that the enhanced methods achieve state-of-the-art convergence speed compared to naive first-order gradient methods. We then adopt the enhanced NAG method to implement homomorphic logistic regression training and obtain a comparable result in only $3$ iterations.
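    For orientation, a sketch of plain NAG for logistic regression over unencrypted data; the paper's contribution replaces the first-order gradient with its quadratic gradient and runs the loop under homomorphic encryption, neither of which is shown here:
    ```python
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def nag_logreg(X, y, iters=30, lr=0.1):
        w = np.zeros(X.shape[1])
        v = w.copy()                       # look-ahead point
        for t in range(iters):
            grad = X.T @ (sigmoid(X @ v) - y) / len(y)
            w_new = v - lr * grad
            v = w_new + (t / (t + 3)) * (w_new - w)   # Nesterov momentum
            w = w_new
        return w

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = (X @ rng.normal(size=5) > 0).astype(float)
    w = nag_logreg(X, y)
    ```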
    Pruning Self-attentions into Convolutional Layers in Single Path. (arXiv:2111.11802v3 [cs.CV] UPDATED)
    Vision Transformers (ViTs) have achieved impressive performance over various computer vision tasks. However, modelling global correlations with multi-head self-attention (MSA) layers leads to two widely recognized issues: the massive computational resource consumption and the lack of intrinsic inductive bias for modelling local visual patterns. To solve both issues, we devise a simple yet effective method named Single-Path Vision Transformer pruning (SPViT), to efficiently and automatically compress the pre-trained ViTs into compact models with proper locality added. Specifically, we first propose a novel weight-sharing scheme between MSA and convolutional operations, delivering a single-path space to encode all candidate operations. In this way, we cast the operation search problem as finding which subset of parameters to use in each MSA layer, which significantly reduces the computational cost and optimization difficulty, and the convolution kernels can be well initialized using pre-trained MSA parameters. Relying on the single-path space, we further introduce learnable binary gates to encode the operation choices, which are jointly optimized with network parameters to automatically determine the configuration of each layer. We conduct extensive experiments on two representative ViTs showing that our SPViT achieves a new SOTA for pruning on ImageNet-1k. For example, our SPViT can trim 52.0% FLOPs for DeiT-B and get an impressive 0.6% top-1 accuracy gain simultaneously. The source code is available at https://github.com/ziplab/SPViT.
    Finite Expression Method for Solving High-Dimensional Partial Differential Equations. (arXiv:2206.10121v2 [math.NA] UPDATED)
    Designing efficient and accurate numerical solvers for high-dimensional partial differential equations (PDEs) remains a challenging and important topic in computational science and engineering, mainly due to the "curse of dimensionality" in designing numerical schemes that scale in dimension. This paper introduces a new methodology that seeks an approximate PDE solution in the space of functions with finitely many analytic expressions and, hence, this methodology is named the finite expression method (FEX). It is proved in approximation theory that FEX can avoid the curse of dimensionality. As a proof of concept, a deep reinforcement learning method is proposed to implement FEX for various high-dimensional PDEs in different dimensions, achieving high and even machine accuracy with a memory complexity polynomial in dimension and an amenable time complexity. An approximate solution with finite analytic expressions also provides interpretable insights into the ground truth PDE solution, which can further help to advance the understanding of physical systems and design postprocessing techniques for a refined solution.
    Diffusion Models for Video Prediction and Infilling. (arXiv:2206.07696v2 [cs.CV] UPDATED)
    Predicting and anticipating future outcomes or reasoning about missing information in a sequence are critical skills for agents to be able to make intelligent decisions. This requires strong, temporally coherent generative capabilities. Diffusion models have shown remarkable success in several generative tasks, but have not been extensively explored in the video domain. We present Random-Mask Video Diffusion (RaMViD), which extends image diffusion models to videos using 3D convolutions, and introduces a new conditioning technique during training. By varying the mask we condition on, the model is able to perform video prediction, infilling, and upsampling. Due to our simple conditioning scheme, we can utilize the same architecture as used for unconditional training, which allows us to train the model in a conditional and unconditional fashion at the same time. We evaluate the model on two benchmark datasets for video prediction, on which we achieve state-of-the-art results, and one for video generation.
    Bounding Membership Inference. (arXiv:2202.12232v3 [cs.LG] UPDATED)
    Differential Privacy (DP) is the de facto standard for reasoning about the privacy guarantees of a training algorithm. Despite the empirical observation that DP reduces the vulnerability of models to existing membership inference (MI) attacks, a theoretical underpinning as to why this is the case is largely missing in the literature. In practice, this means that models need to be trained with DP guarantees that greatly decrease their accuracy. In this paper, we provide a tighter bound on the positive accuracy (i.e., attack precision) of any MI adversary when a training algorithm provides $\epsilon$-DP or $(\epsilon, \delta)$-DP. Our bound informs the design of a novel privacy amplification scheme, where an effective training set is sub-sampled from a larger set prior to the beginning of training, to greatly reduce the bound on MI accuracy. As a result, our scheme enables DP users to employ looser DP guarantees when training their model to limit the success of any MI adversary; this ensures that the model's accuracy is less impacted by the privacy guarantee. Finally, we discuss implications of our MI bound on the field of machine unlearning.
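    A hedged sketch of the amplification step, with the subsampling rate as a free parameter and the DP training routine left as a placeholder:
    ```python
    import random

    def subsample_training_set(pool, rate: float = 0.5, seed: int = 0):
        """Draw an effective training set from the full pool before training."""
        rng = random.Random(seed)
        return [x for x in pool if rng.random() < rate]

    full_pool = list(range(50_000))           # stand-in for indexed training records
    effective = subsample_training_set(full_pool, rate=0.5)
    # train_with_dp(effective, epsilon, delta)  # any DP training routine goes here
    ```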
    The Enforced Transfer: An Instance-Based Divide-and-Conquer Unsupervised Domain Adaptation Algorithm. (arXiv:2201.10001v3 [cs.LG] UPDATED)
    Existing Domain Adaptation (DA) algorithms train target models and then use the target models to classify all samples in the target dataset. While this approach attempts to address the problem that the source and the target data are from different distributions, it fails to recognize the possibility that, within the target domain, some samples are closer to the distribution of the source domain than to that of the target domain. In this paper, we develop a novel DA algorithm, the Enforced Transfer, that deals with this situation. A straightforward but effective idea is to use an out-of-distribution detection algorithm to decide whether, during the testing phase, a given sample is closer to the distribution of the source domain, the target domain, or neither. In the first case, the sample is given to a machine learning classifier trained on source samples; in the second case, to a classifier trained on target samples. In the third case, the sample is discarded, as neither an ML model trained on source data nor one trained on target data is suitable to classify it. It is widely known that the first few layers of a neural network extract low-level features, so this approach can be extended from classifying raw samples to classifying the samples' activations after an empirically determined layer, in the same three scenarios. The Enforced Transfer implements this idea. On three types of DA tasks, we outperform the state-of-the-art algorithms that we compare against.
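    A sketch of the three-way routing rule as we read it, with thresholded in-distribution scores standing in for the out-of-distribution detector; the names and the threshold are illustrative:
    ```python
    def route(sample, source_clf, target_clf, source_score, target_score, tau=0.5):
        """Return a prediction or None (discard), following the three-way rule."""
        s_src = source_score(sample)   # in-distribution score w.r.t. the source domain
        s_tgt = target_score(sample)   # in-distribution score w.r.t. the target domain
        if s_src >= tau and s_src >= s_tgt:
            return source_clf(sample)  # sample looks source-like
        if s_tgt >= tau:
            return target_clf(sample)  # sample looks target-like
        return None                    # neither model is suitable; discard
    ```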
    Prospects of federated machine learning in fluid dynamics. (arXiv:2208.07017v1 [cs.LG])
    Physics-based models have been mainstream in fluid dynamics for developing predictive models. In recent years, machine learning has offered a renaissance to the fluid community due to the rapid developments in data science, processing units, neural network based technologies, and sensor adaptations. So far in many applications in fluid dynamics, machine learning approaches have been mostly focused on a standard process that requires centralizing the training data on a designated machine or in a data center. In this letter, we present a federated machine learning approach that enables localized clients to collaboratively learn an aggregated and shared predictive model while keeping all the training data on each edge device. We demonstrate the feasibility and prospects of such a decentralized learning approach with an effort to forge a deep learning surrogate model for reconstructing spatiotemporal fields. Our results indicate that federated machine learning might be a viable tool for designing highly accurate predictive decentralized digital twins relevant to fluid dynamics.
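    A minimal sketch of the generic federated averaging step such a scheme builds on (not the letter's exact setup): each client trains locally on its own flow-field data, and only parameter arrays travel to the aggregator:
    ```python
    import numpy as np

    def fed_avg(client_weights, client_sizes):
        """Size-weighted average of each client's parameter arrays, layer by layer."""
        total = sum(client_sizes)
        return [
            sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
            for i in range(len(client_weights[0]))
        ]

    # Two clients with a single 3x2 weight layer each:
    w1 = [np.ones((3, 2))]
    w2 = [np.zeros((3, 2))]
    global_w = fed_avg([w1, w2], client_sizes=[100, 300])  # 0.25 everywhere
    ```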
    Hide and Seek: on the Stealthiness of Attacks against Deep Learning Systems. (arXiv:2205.15944v2 [cs.CR] UPDATED)
    With the growing popularity of artificial intelligence and machine learning, a wide spectrum of attacks against deep learning models have been proposed in the literature. Both the evasion attacks and the poisoning attacks attempt to utilize adversarially altered samples to fool the victim model into misclassifying the adversarial sample. While such attacks claim to be or are expected to be stealthy, i.e., imperceptible to human eyes, such claims are rarely evaluated. In this paper, we present the first large-scale study on the stealthiness of adversarial samples used in attacks against deep learning. We have implemented 20 representative adversarial ML attacks on six popular benchmarking datasets. We evaluate the stealthiness of the attack samples using two complementary approaches: (1) a numerical study that adopts 24 metrics for image similarity or quality assessment; and (2) a user study with 3 sets of questionnaires that collected 20,000+ annotations from 1,000+ responses. Our results show that the majority of the existing attacks introduce non-negligible perturbations that are not stealthy to human eyes. We further analyze the factors that contribute to attack stealthiness. We further examine the correlation between the numerical analysis and the user studies, and demonstrate that some image quality metrics may provide useful guidance in attack designs, while there is still a significant gap between assessed image quality and visual stealthiness of attacks.
    Learning Physics-Consistent Particle Interactions. (arXiv:2202.00299v2 [cs.LG] UPDATED)
    Interacting particle systems play a key role in science and engineering. Access to the governing particle interaction law is fundamental for a complete understanding of such systems. However, the inherent system complexity keeps the particle interaction hidden in many cases. Machine learning methods have the potential to learn the behavior of interacting particle systems by combining experiments with data analysis methods. However, most existing algorithms focus on learning the kinetics at the particle level. Learning pairwise interaction, e.g., pairwise force or pairwise potential energy, remains an open challenge. Here, we propose an algorithm that adapts the Graph Networks framework, which contains an edge part to learn the pairwise interaction and a node part to model the dynamics at the particle level. Different from existing approaches that use neural networks in both parts, we design a deterministic operator in the node part that allows precise inference of pairwise interactions consistent with underlying physical laws, by being trained only to predict the particle acceleration. We test the proposed methodology on multiple datasets and demonstrate that it achieves superior performance in inferring correctly the pairwise interactions while also being consistent with the underlying physics on all the datasets. The proposed framework is scalable to larger systems and transferable to any type of particle interactions. The developed methodology can support a better understanding and discovery of the underlying particle interaction laws, and hence guide the design of materials with targeted properties.
    Deception for Cyber Defence: Challenges and Opportunities. (arXiv:2208.07127v1 [cs.CR])
    Deception is rapidly growing as an important tool for cyber defence, complementing existing perimeter security measures to rapidly detect breaches and data theft. One of the factors limiting the use of deception has been the cost of generating realistic artefacts by hand. Recent advances in Machine Learning have, however, created opportunities for scalable, automated generation of realistic deceptions. This vision paper describes the opportunities and challenges involved in developing models to mimic many common elements of the IT stack for deception effects.
    A Hybrid Model and Learning-Based Adaptive Navigation Filter. (arXiv:2207.12082v2 [eess.SY] UPDATED)
    The fusion between an inertial navigation system and global navigation satellite systems is regularly used in many platforms such as drones, land vehicles, and marine vessels. The fusion is commonly carried out in a model-based extended Kalman filter framework. One of the critical parameters of the filter is the process noise covariance. It is responsible for the real-time solution accuracy, as it accounts for both vehicle dynamics uncertainty and inertial sensor quality. In most situations, the process noise covariance is assumed to be constant. Yet, due to vehicle dynamics and sensor measurement variations throughout the trajectory, the process noise covariance is subject to change. To cope with such situations, several adaptive model-based Kalman filters have been suggested in the literature. In this paper, we propose a hybrid model- and learning-based adaptive navigation filter. We rely on the model-based Kalman filter and design a deep neural network model to tune the momentary system noise covariance matrix, based only on the inertial sensor readings. Once the process noise covariance is learned, it is plugged into the well-established, model-based Kalman filter. After deriving the proposed hybrid framework, field experiment results using a quadrotor are presented and a comparison to model-based adaptive approaches is given. We show that the proposed method obtained an improvement of 25% in the position error. Furthermore, the proposed hybrid learning method can be used in any navigation filter and also in any relevant estimation problem.
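    A hedged sketch of the hybrid idea under our own simplifications: a stand-in regressor maps a window of inertial readings to a diagonal process-noise covariance, which is then plugged into a standard Kalman predict step (the paper trains a deep network for this mapping):
    ```python
    import numpy as np

    def predict_q(imu_window: np.ndarray) -> np.ndarray:
        # Stand-in for the trained regressor: scale noise with recent IMU variance.
        return np.diag(np.var(imu_window, axis=0) + 1e-6)

    def kf_predict(x, P, F, Q):
        """Standard Kalman predict step with the momentary Q plugged in."""
        return F @ x, F @ P @ F.T + Q

    F = np.eye(2)
    x, P = np.zeros(2), np.eye(2)
    imu_window = np.random.randn(50, 2) * 0.1   # last 50 accelerometer samples
    x, P = kf_predict(x, P, F, predict_q(imu_window))
    ```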
    Defense against Backdoor Attacks via Identifying and Purifying Bad Neurons. (arXiv:2208.06537v1 [cs.LG])
    The opacity of neural networks makes them vulnerable to backdoor attacks, where the hidden attention of infected neurons is triggered to override normal predictions with attacker-chosen ones. In this paper, we propose a novel backdoor defense method to mark and purify the infected neurons in backdoored neural networks. Specifically, we first define a new metric, called benign salience. By combining the first-order gradient to retain the connections between neurons, benign salience can identify infected neurons with higher accuracy than the metric commonly used in backdoor defense. Then, a new Adaptive Regularization (AR) mechanism is proposed to assist in purifying these identified infected neurons via fine-tuning. Due to its ability to adapt to different parameter magnitudes, AR provides faster and more stable convergence than the common regularization mechanism for neuron purifying. Extensive experimental results demonstrate that our method can erase the backdoor in neural networks with negligible performance degradation.
    Z-BERT-A: a zero-shot Pipeline for Unknown Intent detection. (arXiv:2208.07084v1 [cs.CL])
    Intent discovery is a fundamental task in NLP, and it is increasingly relevant for a variety of industrial applications (Quarteroni 2018). The main challenge resides in the need to identify novel, unseen intents from input utterances. Herein, we propose Z-BERT-A, a two-stage method for intent discovery relying on a Transformer architecture (Vaswani et al. 2017; Devlin et al. 2018), fine-tuned with Adapters (Pfeiffer et al. 2020), initially trained for Natural Language Inference (NLI), and later applied to unknown intent classification in a zero-shot setting. In our evaluation, we first analyze the quality of the model after adaptive fine-tuning on known classes. Second, we evaluate its performance when casting intent classification as an NLI task. Lastly, we test the zero-shot performance of the model on unseen classes, showing how Z-BERT-A can effectively perform intent discovery by generating intents that are semantically similar, if not identical, to the ground-truth ones. Our experiments show that Z-BERT-A outperforms a wide variety of baselines in two zero-shot settings: known intent classification and unseen intent discovery. The proposed pipeline holds the potential to be widely applied in a variety of customer care applications. It enables automated dynamic triage using a lightweight model that, unlike large language models, can be easily deployed and scaled in a wide variety of business scenarios, especially in settings with limited hardware availability and performance, where on-premise or low-resource cloud deployments are imperative. Z-BERT-A, predicting novel intents from a single utterance, represents an innovative approach for intent discovery, enabling online generation of novel intents. The pipeline is available as an installable python package at the following link: https://github.com/GT4SD/zberta.
    UAV-CROWD: Violent and non-violent crowd activity simulator from the perspective of UAV. (arXiv:2208.06702v1 [cs.CV])
    Unmanned Aerial Vehicles (UAVs) have gained significant traction in recent years, particularly in the context of surveillance. However, video datasets that capture violent and non-violent human activity from an aerial point of view are scarce. To address this issue, we propose a novel baseline simulator that is capable of generating sequences of photo-realistic synthetic images of crowds engaging in various activities that can be categorized as violent or non-violent. The crowd groups are annotated with bounding boxes that are automatically computed using semantic segmentation. Our simulator is capable of generating large, randomized urban environments and is able to maintain an average of 25 frames per second on a mid-range computer with 150 concurrent crowd agents interacting with each other. We also show that when synthetic data from the proposed simulator is augmented with real-world data, binary video classification accuracy is improved by 5% on average across two different models.
    Transformer-Empowered 6G Intelligent Networks: From Massive MIMO Processing to Semantic Communication. (arXiv:2205.03770v2 [cs.IT] UPDATED)
    6G wireless networks are foreseen to speed up the convergence of the physical and cyber worlds and to enable a paradigm-shift in the way we deploy and exploit communication networks. Machine learning, in particular deep learning (DL), is going to be one of the key technological enablers of 6G by offering a new paradigm for the design and optimization of networks with a high level of intelligence. In this article, we introduce an emerging DL architecture, known as the transformer, and discuss its potential impact on 6G network design. We first discuss the differences between the transformer and classical DL architectures, and emphasize the transformer's self-attention mechanism and strong representation capabilities, which make it particularly appealing in tackling various challenges in wireless network design. Specifically, we propose transformer-based solutions for massive multiple-input multiple-output (MIMO) systems and various semantic communication problems in 6G networks. Finally, we discuss key challenges and open issues in transformer-based solutions, and identify future research directions for their deployment in intelligent 6G networks.
    Theoretical Exploration of Solutions of Feedforward ReLU Networks. (arXiv:2202.01919v6 [cs.LG] UPDATED)
    This paper aims to interpret the mechanism of feedforward ReLU networks by exploring their solutions for piecewise linear functions, through deduction from basic rules. The constructed solution should be universal enough to explain network architectures used in engineering; to that end, several ways are provided to enhance the solution's universality. Some of the consequences of our theories include: under an affine-geometry background, the solutions of both three-layer networks and deep-layer networks are given, particularly for architectures applied in practice, such as multilayer feedforward neural networks and decoders; we give clear and intuitive interpretations of each component of network architectures; the parameter-sharing mechanism for multi-outputs is investigated; we provide an explanation of overparameterization solutions in terms of affine transforms; and, under our framework, an advantage of deep layers compared to shallower ones follows naturally. Some intermediate results provide basic knowledge for the modeling or understanding of neural networks, such as the classification of data embedded in higher-dimensional space, the generalization of affine transforms, the probabilistic model of matrix ranks, the concepts of distinguishable data sets and interference among hyperplanes, and so on.
    Fast & Furious: Modelling Malware Detection as Evolving Data Streams. (arXiv:2205.12311v2 [cs.CR] UPDATED)
    Malware is a major threat to computer systems and imposes many challenges to cyber security. Targeted threats, such as ransomware, cause millions of dollars in losses every year. The constant increase of malware infections has been motivating popular antiviruses (AVs) to develop dedicated detection strategies, which include meticulously crafted machine learning (ML) pipelines. However, malware developers unceasingly change their samples' features to bypass detection. This constant evolution of malware samples causes changes to the data distribution (i.e., concept drifts) that directly affect ML model detection rates, something not considered in the majority of the literature. In this work, we evaluate the impact of concept drift on malware classifiers for two Android datasets: DREBIN (about 130K apps) and a subset of AndroZoo (about 350K apps). We used these datasets to train an Adaptive Random Forest (ARF) classifier, as well as a Stochastic Gradient Descent (SGD) classifier. We also ordered all datasets samples using their VirusTotal submission timestamp and then extracted features from their textual attributes using two algorithms (Word2Vec and TF-IDF). Then, we conducted experiments comparing both feature extractors, classifiers, as well as four drift detectors (DDM, EDDM, ADWIN, and KSWIN) to determine the best approach for real environments. Finally, we compare some possible approaches to mitigate concept drift and propose a novel data stream pipeline that updates both the classifier and the feature extractor. To do so, we conducted a longitudinal evaluation by (i) classifying malware samples collected over nine years (2009-2018), (ii) reviewing concept drift detection algorithms to attest its pervasiveness, (iii) comparing distinct ML approaches to mitigate the issue, and (iv) proposing an ML data stream pipeline that outperformed literature approaches.
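    For concreteness, a hand-rolled version of DDM, one of the four drift detectors the paper compares: track the streaming error rate and flag drift when it exceeds the best-seen rate by three standard deviations (textbook formulation, not the paper's pipeline):
    ```python
    import math

    class DDM:
        def __init__(self):
            self.n, self.p = 1, 1.0                       # sample count, error-rate estimate
            self.p_min, self.s_min = float("inf"), float("inf")

        def update(self, error: bool) -> bool:
            self.p += (float(error) - self.p) / self.n    # running mean of errors
            s = math.sqrt(self.p * (1 - self.p) / self.n)
            self.n += 1
            if self.p + s < self.p_min + self.s_min:      # remember the best regime
                self.p_min, self.s_min = self.p, s
            return self.p + s > self.p_min + 3 * self.s_min  # drift signal

    detector = DDM()
    for is_misclassified in [False] * 500 + [True] * 50:  # simulated error stream
        if detector.update(is_misclassified):
            print("drift detected")
            break
    ```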
    A Near-Optimal Algorithm for Univariate Zeroth-Order Budget Convex Optimization. (arXiv:2208.06720v1 [math.OC])
    This paper studies a natural generalization of the problem of minimizing a univariate convex function $f$ by querying its values sequentially. At each time-step $t$, the optimizer can invest a budget $b_t$ in a query point $X_t$ of their choice to obtain a fuzzy evaluation of $f$ at $X_t$ whose accuracy depends on the amount of budget invested in $X_t$ across times. This setting is motivated by the minimization of objectives whose values can only be determined approximately through lengthy or expensive computations. We design an any-time parameter-free algorithm called Dyadic Search, for which we prove near-optimal optimization error guarantees. As a byproduct of our analysis, we show that the classical dependence on the global Lipschitz constant in the error bounds is an artifact of the granularity of the budget. Finally, we illustrate our theoretical findings with numerical simulations.
    An Efficient and Reliable Asynchronous Federated Learning Scheme for Smart Public Transportation. (arXiv:2208.07194v1 [cs.LG])
    Machine learning (ML) is used to train predictive models on the Internet of Vehicles (IoV) to enable smart public transportation. Since traffic conditions change over time, the ML models that predict traffic flows and passenger waiting times at stops must be updated continuously and efficiently. Federated learning (FL) is a distributed machine learning scheme that allows vehicles to receive continuous model updates without having to upload raw data to the cloud and wait for models to be trained. However, FL in smart public transportation is vulnerable to poisoning or DDoS attacks since vehicles travel in public. Besides, due to device heterogeneity and imbalanced data distributions, the synchronized aggregation strategy that collects local models from specific vehicles before aggregation is inefficient. Although Asynchronous Federated Learning (AFL) schemes improve efficiency by aggregating local models as soon as they are received, stale local models remain unreasonably weighted, resulting in poor learning performance. To enable smarter public transportation, this paper offers a blockchain-based asynchronous federated learning scheme with a dynamic scaling factor (DBAFL). Specifically, a novel committee-based consensus algorithm for the blockchain improves reliability at the lowest possible cost in time, while the devised dynamic scaling factor assigns reasonable weights to stale local models. Extensive experiments conducted on heterogeneous devices validate the superior learning performance, efficiency, and reliability of DBAFL.
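    A sketch of the general staleness-aware aggregation idea; the decay rule below is illustrative, whereas the paper's dynamic scaling factor and committee-based consensus are more involved:
    ```python
    import numpy as np

    def staleness_weight(staleness: int, alpha: float = 0.6) -> float:
        # Down-weight a local model by how many global rounds it missed.
        return alpha / (1 + staleness) ** 0.5

    def async_update(global_w, local_w, staleness):
        s = staleness_weight(staleness)
        return [(1 - s) * g + s * l for g, l in zip(global_w, local_w)]

    g = [np.zeros(4)]
    l = [np.ones(4)]
    print(async_update(g, l, staleness=0)[0])   # fresh model: weight 0.6
    print(async_update(g, l, staleness=9)[0])   # stale model: weight ~0.19
    ```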
    GANzilla: User-Driven Direction Discovery in Generative Adversarial Networks. (arXiv:2207.08320v2 [cs.HC] UPDATED)
    Generative Adversarial Network (GAN) is widely adopted in numerous application areas, such as data preprocessing, image editing, and creativity support. However, GAN's 'black box' nature prevents non-expert users from controlling what data a model generates, spawning a plethora of prior work that focused on algorithm-driven approaches to extract editing directions to control GAN. Complementarily, we propose GANzilla: a user-driven tool that empowers a user with the classic scatter/gather technique to iteratively discover directions to meet their editing goals. In a study with 12 participants, GANzilla users were able to discover directions that (i) edited images to match provided examples (closed-ended tasks) and that (ii) met a high-level goal, e.g., making the face happier, while showing diversity across individuals (open-ended tasks).
    Analysis of impact of emotions on target speech extraction and speech separation. (arXiv:2208.07091v1 [cs.SD])
    Recently, the performance of blind speech separation (BSS) and target speech extraction (TSE) has greatly progressed. Most works, however, focus on relatively well-controlled conditions using, e.g., read speech. The performance may degrade in more realistic situations. One of the factors causing such degradation may be intrinsic speaker variability, such as emotions, occurring commonly in realistic speech. In this paper, we investigate the influence of emotions on TSE and BSS. We create a new test dataset of emotional mixtures for the evaluation of TSE and BSS. This dataset combines LibriSpeech and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). Through controlled experiments, we can analyze the impact of different emotions on the performance of BSS and TSE. We observe that BSS is relatively robust to emotions, while TSE, which requires identifying and extracting the speech of a target speaker, is much more sensitive to emotions. Through comparative speaker verification experiments, we show that identifying the target speaker may be particularly challenging when dealing with emotional speech. Using our findings, we outline potential future directions that could improve the robustness of BSS and TSE systems toward emotional speech.
    Continuous Active Learning Using Pretrained Transformers. (arXiv:2208.06955v1 [cs.IR])
    Pre-trained and fine-tuned transformer models like BERT and T5 have improved the state of the art in ad-hoc retrieval and question-answering, but not as yet in high-recall information retrieval, where the objective is to retrieve substantially all relevant documents. We investigate whether the use of transformer-based models for reranking and/or featurization can improve the Baseline Model Implementation of the TREC Total Recall Track, which represents the current state of the art for high-recall information retrieval. We also introduce CALBERT, a model that can be used to continuously fine-tune a BERT-based model based on relevance feedback.
    Novel Ordering-based Approaches for Causal Structure Learning in the Presence of Unobserved Variables. (arXiv:2208.06935v1 [cs.LG])
    We propose ordering-based approaches for learning the maximal ancestral graph (MAG) of a structural equation model (SEM) up to its Markov equivalence class (MEC) in the presence of unobserved variables. Existing ordering-based methods in the literature recover a graph through learning a causal order (c-order). We advocate for a novel order called the removable order (r-order), as r-orders are advantageous over c-orders for structure learning. This is because r-orders are the minimizers of an appropriately defined optimization problem that can be either solved exactly (using a reinforcement learning approach) or approximately (using a hill-climbing search). Moreover, the r-orders (unlike c-orders) are invariant among all the graphs in a MEC and include c-orders as a subset. Given that the set of r-orders is often significantly larger than the set of c-orders, it is easier for the optimization problem to find an r-order than a c-order. We evaluate the performance and the scalability of our proposed approaches on both real-world and randomly generated networks.
    Graph Neural Networks as Gradient Flows. (arXiv:2206.10991v2 [cs.LG] UPDATED)
    Dynamical systems minimizing an energy are ubiquitous in geometry and physics. We propose a novel framework for GNNs where we parametrize (and {\em learn}) an energy functional and then take the GNN equations to be the gradient flow of such energy. This approach allows us to analyse the GNN evolution from a multi-particle perspective as learning attractive and repulsive forces in feature space via the positive and negative eigenvalues of a symmetric `channel-mixing' matrix. We conduct spectral analysis of the solutions and provide a better understanding of the role of the channel-mixing in (residual) graph convolutional models and of its ability to steer the diffusion away from over-smoothing. We perform thorough ablation studies corroborating our theory and show competitive performance of simple models on homophilic and heterophilic datasets.
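    A toy sketch of a gradient-flow layer under our reading of the framework: symmetrizing the channel-mixing matrix makes the residual update the descent direction of a quadratic energy, so its positive eigenvalues act attractively and negative ones repulsively; all sizes and the step size are illustrative:
    ```python
    import numpy as np

    def gradient_flow_step(X, A_norm, W_raw, tau=0.1):
        W = 0.5 * (W_raw + W_raw.T)             # symmetric channel mixing
        # Residual update X + tau * (A X W - X): for symmetric A_norm and W,
        # (A X W - X) is minus the gradient of a quadratic energy in X.
        return X + tau * (A_norm @ X @ W - X)

    n, d = 5, 8
    rng = np.random.default_rng(0)
    A = rng.integers(0, 2, (n, n))
    A = np.triu(A, 1); A = A + A.T              # random undirected adjacency
    deg = np.maximum(A.sum(1), 1)
    A_norm = A / np.sqrt(np.outer(deg, deg))    # symmetric normalization
    X = rng.normal(size=(n, d))
    W_raw = rng.normal(size=(d, d))
    X = gradient_flow_step(X, A_norm, W_raw)
    ```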
    Inductive Biases for Object-Centric Representations in the Presence of Complex Textures. (arXiv:2204.08479v3 [cs.CV] UPDATED)
    Understanding which inductive biases could be helpful for the unsupervised learning of object-centric representations of natural scenes is challenging. In this paper, we systematically investigate the performance of two models on datasets where neural style transfer was used to obtain objects with complex textures while still retaining ground-truth annotations. We find that by using a single module to reconstruct both the shape and visual appearance of each object, the model learns more useful representations and achieves better object separation. In addition, we observe that adjusting the latent space size is insufficient to improve segmentation performance. Finally, the downstream usefulness of the representations is significantly more strongly correlated with segmentation quality than with reconstruction accuracy.
    A Novel Regularization Approach to Fair ML. (arXiv:2208.06557v1 [cs.LG])
    A number of methods have been introduced for the fair ML issue, most of them complex and many of them very specific to the underlying ML methodology. Here we introduce a new approach that is simple, easily explained, and potentially applicable to a number of standard ML algorithms. Explicitly Deweighted Features (EDF) reduces the impact of each feature among the proxies of sensitive variables, allowing a different amount of deweighting to be applied to each such feature. The user specifies the deweighting hyperparameters to achieve a given point on the utility/fairness tradeoff spectrum. We also introduce a new, simple criterion for evaluating the degree of protection afforded by any fair ML method.
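    A minimal sketch of the deweighting idea; the column indices, factors, and downstream learner below are illustrative assumptions, not the paper's implementation:
    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def deweight(X: np.ndarray, proxy_cols, factors) -> np.ndarray:
        """Shrink columns that act as proxies for the sensitive variable."""
        Xd = X.copy().astype(float)
        for col, f in zip(proxy_cols, factors):
            Xd[:, col] *= f                # f in [0, 1]; 0 removes the feature
        return Xd

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 6))
    y = (X[:, 0] > 0).astype(int)
    Xd = deweight(X, proxy_cols=[3, 4], factors=[0.2, 0.5])
    clf = LogisticRegression().fit(Xd, y)  # any standard ML algorithm works here
    ```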
    Feasibility Layer Aided Machine Learning Approach for Day-Ahead Operations. (arXiv:2208.06742v1 [eess.SY])
    Day-ahead operations involve a complex and computationally intensive optimization process to determine the generator commitment schedule and dispatch. The optimization process is a mixed-integer linear program (MILP) also known as security-constrained unit commitment (SCUC). Independent system operators (ISOs) run SCUC daily and require state-of-the-art algorithms to speed up the process. Existing patterns in historical information can be leveraged for model reduction of SCUC, which can provide significant time savings. In this paper, machine learning (ML) based classification approaches, namely logistic regression, neural networks, random forest and K-nearest neighbor, were studied for model reduction of SCUC. The ML was then aided with a feasibility layer (FL) and a post-processing technique to ensure high-quality solutions. The proposed approach is validated on several test systems, namely the IEEE 24-Bus system, the IEEE 73-Bus system, the IEEE 118-Bus system, a 500-Bus system, and the Polish 2383-Bus system. Moreover, model reduction of a stochastic SCUC (SSCUC) was demonstrated utilizing a modified IEEE 24-Bus system with renewable generation. Simulation results demonstrate high training accuracy in identifying the commitment schedule, while the FL and post-processing ensure that ML predictions do not lead to infeasible solutions, with minimal loss in solution quality.
    A Theory for Knowledge Transfer in Continual Learning. (arXiv:2208.06931v1 [cs.LG])
    Continual learning of a stream of tasks is an active area in deep neural networks. The main challenge investigated has been the phenomenon of catastrophic forgetting or interference of newly acquired knowledge with knowledge from previous tasks. Recent work has investigated forward knowledge transfer to new tasks. Backward transfer for improving knowledge gained during previous tasks has received much less attention. There is in general limited understanding of how knowledge transfer could aid tasks learned continually. We present a theory for knowledge transfer in continual supervised learning, which considers both forward and backward transfer. We aim at understanding their impact for increasingly knowledgeable learners. We derive error bounds for each of these transfer mechanisms. These bounds are agnostic to specific implementations (e.g. deep neural networks). We demonstrate that, for a continual learner that observes related tasks, both forward and backward transfer can contribute to an increasing performance as more tasks are observed.
    Memory-Driven Text-to-Image Generation. (arXiv:2208.07022v1 [cs.CV])
    We introduce a memory-driven semi-parametric approach to text-to-image generation, which is based on both parametric and non-parametric techniques. The non-parametric component is a memory bank of image features constructed from a training set of images. The parametric component is a generative adversarial network. Given a new text description at inference time, the memory bank is used to selectively retrieve image features that are provided as basic information of target images, which enables the generator to produce realistic synthetic results. We also incorporate the content information into the discriminator, together with semantic features, allowing the discriminator to make a more reliable prediction. Experimental results demonstrate that the proposed memory-driven semi-parametric approach produces more realistic images than purely parametric approaches, in terms of both visual fidelity and text-image semantic consistency.
    Opinion Market Model: Stemming Far-Right Opinion Spread using Positive Interventions. (arXiv:2208.06620v1 [cs.SI])
    Recent years have seen the rise of extremist views in the opinion ecosystem we call social media. Allowing online extremism to persist has dire societal consequences, and efforts to mitigate it are continuously explored. Positive interventions, controlled signals that add attention to the opinion ecosystem with the aim of boosting certain opinions, are one such pathway for mitigation. This work proposes a platform to test the effectiveness of positive interventions, through the Opinion Market Model (OMM), a two-tier model of the online opinion ecosystem jointly accounting for both inter-opinion interactions and the role of positive interventions. The first tier models the size of the opinion attention market using the multivariate discrete-time Hawkes process; the second tier leverages the market share attraction model to model opinions cooperating and competing for market share given limited attention. On a synthetic dataset, we show the convergence of our proposed estimation scheme. On a dataset of Facebook and Twitter discussions containing moderate and far-right opinions about bushfires and climate change, we show superior predictive performance over the state-of-the-art and the ability to uncover latent opinion interactions. Lastly, we use OMM to demonstrate the effectiveness of mainstream media coverage as a positive intervention in suppressing far-right opinions.
    Fundamental limitations on optimization in variational quantum algorithms. (arXiv:2205.05056v2 [quant-ph] UPDATED)
    Exploring quantum applications of near-term quantum devices is a rapidly growing field of quantum information science with both theoretical and practical interests. A leading paradigm to establish such near-term quantum applications is variational quantum algorithms (VQAs). These algorithms use a classical optimizer to train a parameterized quantum circuit to accomplish certain tasks, where the circuits are usually randomly initialized. In this work, we prove that for a broad class of such random circuits, the variation range of the cost function via adjusting any local quantum gate within the circuit vanishes exponentially in the number of qubits with a high probability. This result can unify the restrictions on gradient-based and gradient-free optimizations in a natural manner and reveal extra harsh constraints on the training landscapes of VQAs. Hence a fundamental limitation on the trainability of VQAs is unraveled, indicating the essential mechanism of the optimization hardness in the Hilbert space with exponential dimension. We further showcase the validity of our results with numerical simulations of representative VQAs. We believe that these results would deepen our understanding of the scalability of VQAs and shed light on the search for near-term quantum applications with advantages.
    Towards out of distribution generalization for problems in mechanics. (arXiv:2206.14917v2 [stat.ML] UPDATED)
    There has been a massive increase in research interest towards applying data driven methods to problems in mechanics. While traditional machine learning (ML) methods have enabled many breakthroughs, they rely on the assumption that the training (observed) data and testing (unseen) data are independent and identically distributed (i.i.d). Thus, traditional ML approaches often break down when applied to real world mechanics problems with unknown test environments and data distribution shifts. In contrast, out-of-distribution (OOD) generalization assumes that the test data may shift (i.e., violate the i.i.d. assumption). To date, multiple methods have been proposed to improve the OOD generalization of ML methods. However, because of the lack of benchmark datasets for OOD regression problems, the efficiency of these OOD methods on regression problems, which dominate the mechanics field, remains unknown. To address this, we investigate the performance of OOD generalization methods for regression problems in mechanics. Specifically, we identify three OOD problems: covariate shift, mechanism shift, and sampling bias. For each problem, we create two benchmark examples that extend the Mechanical MNIST dataset collection, and we investigate the performance of popular OOD generalization methods on these mechanics-specific regression problems. Our numerical experiments show that in most cases, while the OOD generalization algorithms perform better compared to traditional ML methods on these OOD problems, there is a compelling need to develop more robust OOD generalization methods that are effective across multiple OOD scenarios. Overall, we expect that this study, as well as the associated open access benchmark datasets, will enable further development of OOD generalization methods for mechanics specific regression problems.
    Guided Evolutionary Neural Architecture Search With Efficient Performance Estimation. (arXiv:2208.06475v1 [cs.NE])
    Neural Architecture Search (NAS) methods have been successfully applied to image tasks with excellent results. However, NAS methods are often complex and tend to converge to local minima as soon as generated architectures seem to yield good results. This paper proposes GEA, a novel approach for guided NAS. GEA guides the evolution by exploring the search space: at the initialisation stage, several architectures are generated and evaluated in each generation using a zero-cost proxy estimator, and only the highest-scoring architecture is trained and kept for the next generation. Subsequently, GEA continuously extracts knowledge about the search space without increased complexity by generating several offspring from an existing architecture at each generation. Moreover, GEA forces exploitation of the most performant architectures through descendant generation, while simultaneously driving exploration through parent mutation and favouring younger architectures to the detriment of older ones. Experimental results demonstrate the effectiveness of the proposed method, and extensive ablation studies evaluate the importance of different parameters. Results show that GEA achieves state-of-the-art results on all data sets of the NAS-Bench-101, NAS-Bench-201 and TransNAS-Bench-101 benchmarks.
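    A schematic of this guided-evolution loop, as we read it, is sketched below; `sample_architecture`, `zero_cost_score`, `mutate`, and `train` are hypothetical stand-ins for the paper's search space, zero-cost proxy, and training routine.

```python
import random

def guided_evolution(init_pool_size, generations, n_offspring,
                     sample_architecture, zero_cost_score, mutate, train):
    """Schematic GEA loop: score many candidates with a zero-cost proxy,
    train only the proxy-best one, and evolve it while favouring younger
    architectures over older ones."""
    # Initialisation: sample several architectures, keep the highest-scoring one.
    pool = [sample_architecture() for _ in range(init_pool_size)]
    best = max(pool, key=zero_cost_score)
    train(best)
    population = [best]
    for _ in range(generations):
        parent = random.choice(population)
        # Several offspring per generation; only the proxy-best child is trained.
        offspring = [mutate(parent) for _ in range(n_offspring)]
        child = max(offspring, key=zero_cost_score)
        train(child)
        population.append(child)
        # Age-based removal: drop the oldest member to favour younger ones.
        if len(population) > 10:
            population.pop(0)
    return max(population, key=zero_cost_score)

# Toy stand-ins: an "architecture" is just a float, the proxy is identity.
arch = guided_evolution(
    init_pool_size=8, generations=20, n_offspring=4,
    sample_architecture=lambda: random.random(),
    zero_cost_score=lambda a: a,
    mutate=lambda a: a + random.gauss(0, 0.1),
    train=lambda a: None,
)
```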
    Predictive Data Calibration for Linear Correlation Significance Testing. (arXiv:2208.07081v1 [stat.ME])
    Inferring linear relationships lies at the heart of many empirical investigations. A measure of linear dependence should correctly evaluate the strength of the relationship as well as qualify whether it is meaningful for the population. Pearson's correlation coefficient (PCC), the \textit{de-facto} measure for bivariate relationships, is known to lack in both regards. The estimated strength $r$ may be wrong due to limited sample size and non-normality of the data. In the context of statistical significance testing, erroneous interpretation of a $p$-value as posterior probability leads to Type I errors -- a general issue with significance testing that extends to PCC. Such errors are exacerbated when testing multiple hypotheses simultaneously. To tackle these issues, we propose a machine-learning-based predictive data calibration method, which essentially conditions the data samples on the expected linear relationship. Calculating PCC using calibrated data yields a calibrated $p$-value that can be interpreted as posterior probability together with a calibrated $r$ estimate, a desired outcome not provided by other methods. Furthermore, the ensuing independent interpretation of each test might eliminate the need for multiple testing correction. We provide empirical evidence favouring the proposed method using several simulations and an application to real-world data.
    Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse of Dimensionality in Approximation on H\"older Class. (arXiv:2103.00542v6 [cs.LG] UPDATED)
    In this paper, we construct neural networks with ReLU, sine and $2^x$ as activation functions. For general continuous $f$ defined on $[0,1]^d$ with continuity modulus $\omega_f(\cdot)$, we construct ReLU-sine-$2^x$ networks that enjoy an approximation rate $\mathcal{O}(\omega_f(\sqrt{d})\cdot2^{-M}+\omega_{f}\left(\frac{\sqrt{d}}{N}\right))$, where $M,N\in \mathbb{N}^{+}$ denote the hyperparameters related to widths of the networks. As a consequence, we can construct a ReLU-sine-$2^x$ network with depth $5$ and width $\max\left\{\left\lceil2d^{3/2}\left(\frac{3\mu}{\epsilon}\right)^{1/{\alpha}}\right\rceil,2\left\lceil\log_2\frac{3\mu d^{\alpha/2}}{2\epsilon}\right\rceil+2\right\}$ that approximates $f\in \mathcal{H}_{\mu}^{\alpha}([0,1]^d)$ within a given tolerance $\epsilon >0$ measured in the $L^p$ norm, $p\in[1,\infty)$, where $\mathcal{H}_{\mu}^{\alpha}([0,1]^d)$ denotes the H\"older continuous function class defined on $[0,1]^d$ with order $\alpha \in (0,1]$ and constant $\mu > 0$. Therefore, the ReLU-sine-$2^x$ networks overcome the curse of dimensionality on $\mathcal{H}_{\mu}^{\alpha}([0,1]^d)$. In addition to their super expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable, enabling us to apply SGD for training.
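    A toy PyTorch module mixing the three activations is sketched below to make the last point concrete: such networks are differentiable end-to-end and hence trainable with SGD. The equal width split and two-layer shape are illustrative assumptions of this sketch; the paper's construction fixes depth and width analytically rather than by training.

```python
import torch
import torch.nn as nn

class ReLUSineExp2(nn.Module):
    """A small network whose hidden units use ReLU, sine and 2^x activations.
    The split of units across the three activations is illustrative."""
    def __init__(self, d_in, width, d_out):
        super().__init__()
        assert width % 3 == 0
        self.fc1 = nn.Linear(d_in, width)
        self.fc2 = nn.Linear(width, d_out)
        self.third = width // 3

    def forward(self, x):
        h = self.fc1(x)
        a, b, c = h.split(self.third, dim=-1)
        # Apply one activation family to each third of the hidden units.
        h = torch.cat([torch.relu(a), torch.sin(b), torch.exp2(c)], dim=-1)
        return self.fc2(h)

net = ReLUSineExp2(d_in=4, width=12, d_out=1)
y = net(torch.randn(8, 4))   # differentiable, so trainable with SGD
```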
    WiFi Based Distance Estimation Using Supervised Machine Learning. (arXiv:2208.07190v1 [cs.LG])
    In recent years, WiFi has become the primary source of information for locating a person or device indoors. Collecting RSSI values as reference measurements with known positions, known as WiFi fingerprinting, is commonly used in various positioning methods and algorithms that appear in the literature. However, measuring the spatial distance between a given set of WiFi fingerprints is heavily affected by the choice of the signal distance function used to model signal space as geospatial distance. In this study, the authors propose utilizing machine learning to improve the estimation of the geospatial distance between fingerprints. This research examined data collected from 13 different open datasets to provide a broad representation, aiming for a general model that can be used in any indoor environment. The proposed novel approach extracts data features by examining a set of commonly used signal distance metrics via a feature selection process that includes feature analysis and a genetic algorithm. To demonstrate that the output of this research is venue independent, all models were tested on datasets excluded during the training and validation phases. Finally, various machine learning algorithms were compared using a wide variety of evaluation metrics, including the ability to scale out the test bed to real-world unsolicited datasets.
    RLang: A Declarative Language for Expressing Prior Knowledge for Reinforcement Learning. (arXiv:2208.06448v1 [cs.AI])
    Communicating useful background knowledge to reinforcement learning (RL) agents is an important and effective method for accelerating learning. We introduce RLang, a domain-specific language (DSL) for communicating domain knowledge to an RL agent. Unlike other existing DSLs proposed by the RL community that ground to single elements of a decision-making formalism (e.g., the reward function or policy function), RLang can specify information about every element of a Markov decision process. We define precise syntax and grounding semantics for RLang, and provide a parser implementation that grounds RLang programs to an algorithm-agnostic partial world model and policy that can be exploited by an RL agent. We provide a series of example RLang programs, and demonstrate how different RL methods can exploit the resulting knowledge, including model-free and model-based tabular algorithms, hierarchical approaches, and deep RL algorithms (including both policy gradient and value-based methods).
    Cost-effective Framework for Gradual Domain Adaptation with Multifidelity. (arXiv:2202.04359v2 [stat.ML] UPDATED)
    In domain adaptation, when there is a large distance between the source and target domains, the prediction performance will degrade. Gradual domain adaptation is one of the solutions to such an issue, assuming that we have access to intermediate domains, which shift gradually from the source to the target domain. In previous works, it was assumed that the number of samples in the intermediate domains was sufficiently large; hence, self-training was possible without the need for labeled data. If the number of accessible intermediate domains is restricted, the distances between domains become large, and self-training will fail. Practically, the cost of samples in intermediate domains will vary, and it is natural to consider that the closer an intermediate domain is to the target domain, the higher the cost of obtaining samples from the intermediate domain is. To solve the trade-off between cost and accuracy, we propose a framework that combines multifidelity and active domain adaptation. The effectiveness of the proposed method is evaluated by experiments with real-world datasets.
    A Deep Reinforcement Learning Approach to Supply Chain Inventory Management. (arXiv:2204.09603v2 [cs.LG] UPDATED)
    This paper leverages recent developments in reinforcement learning and deep learning to solve the supply chain inventory management (SCIM) problem, a complex sequential decision-making problem consisting of determining the optimal quantity of products to produce and ship to different warehouses over a given time horizon. A mathematical formulation of the stochastic two-echelon supply chain environment is given, which allows an arbitrary number of warehouses and product types to be managed. Additionally, an open-source library that interfaces with deep reinforcement learning (DRL) algorithms is developed and made publicly available for solving the SCIM problem. Performances achieved by state-of-the-art DRL algorithms are compared through a rich set of numerical experiments on synthetically generated data. The experimental plan is designed and performed, including different structures, topologies, demands, capacities, and costs of the supply chain. Results show that the PPO algorithm adapts very well to different characteristics of the environment. The VPG algorithm almost always converges to a local maximum, even if it typically achieves an acceptable performance level. Finally, A3C is the fastest algorithm, but just like VPG, it never achieves the best performance when compared to PPO. In conclusion, numerical experiments show that DRL performs consistently better than standard reorder policies, such as the static (s, Q)-policy. Thus, it can be considered a practical and effective option for solving real-world instances of the stochastic two-echelon SCIM problem.
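    For readers who want a concrete starting point, below is a deliberately simplified two-echelon environment with one product type and Poisson demand. The dynamics, costs, and interface are illustrative assumptions of this sketch, not the paper's open-source library.

```python
import numpy as np

class TwoEchelonEnv:
    """Toy two-echelon SCIM environment: a factory ships one product type
    to several warehouses facing stochastic demand. Costs are illustrative."""
    def __init__(self, n_warehouses=3, capacity=100, hold_cost=1.0,
                 lost_sale_cost=5.0, seed=0):
        self.n = n_warehouses
        self.capacity = capacity
        self.hold_cost = hold_cost
        self.lost_sale_cost = lost_sale_cost
        self.rng = np.random.default_rng(seed)
        self.stock = np.zeros(self.n)

    def reset(self):
        self.stock = np.full(self.n, self.capacity / 2)
        return self.stock.copy()

    def step(self, shipments):
        # Action: quantity shipped from the factory to each warehouse.
        self.stock = np.minimum(self.stock + shipments, self.capacity)
        demand = self.rng.poisson(10, size=self.n)
        sold = np.minimum(self.stock, demand)
        lost = demand - sold
        self.stock -= sold
        # Reward: negative holding cost plus lost-sales penalty.
        reward = -(self.hold_cost * self.stock.sum()
                   + self.lost_sale_cost * lost.sum())
        return self.stock.copy(), reward

env = TwoEchelonEnv()
state = env.reset()
state, reward = env.step(np.array([20.0, 10.0, 5.0]))  # one decision epoch
```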
    DuETA: Traffic Congestion Propagation Pattern Modeling via Efficient Graph Learning for ETA Prediction at Baidu Maps. (arXiv:2208.06979v1 [cs.LG])
    Estimated time of arrival (ETA) prediction, also known as travel time estimation, is a fundamental task for a wide range of intelligent transportation applications, such as navigation, route planning, and ride-hailing services. To accurately predict the travel time of a route, it is essential to take into account both contextual and predictive factors, such as spatial-temporal interaction, driving behavior, and traffic congestion propagation inference. The ETA prediction models previously deployed at Baidu Maps have addressed the factors of spatial-temporal interaction (ConSTGAT) and driving behavior (SSML). In this work, we focus on modeling traffic congestion propagation patterns to improve ETA performance. Traffic congestion propagation pattern modeling is challenging, and it requires accounting for impact regions over time and cumulative effect of delay variations over time caused by traffic events on the road network. In this paper, we present a practical industrial-grade ETA prediction framework named DuETA. Specifically, we construct a congestion-sensitive graph based on the correlations of traffic patterns, and we develop a route-aware graph transformer to directly learn the long-distance correlations of the road segments. This design enables DuETA to capture the interactions between the road segment pairs that are spatially distant but highly correlated with traffic conditions. Extensive experiments are conducted on large-scale, real-world datasets collected from Baidu Maps. Experimental results show that ETA prediction can significantly benefit from the learned traffic congestion propagation patterns. In addition, DuETA has already been deployed in production at Baidu Maps, serving billions of requests every day. This demonstrates that DuETA is an industrial-grade and robust solution for large-scale ETA prediction services.
    MM-GNN: Mix-Moment Graph Neural Network towards Modeling Neighborhood Feature Distribution. (arXiv:2208.07012v1 [cs.LG])
    Graph Neural Networks (GNNs) have shown expressive performance on graph representation learning by aggregating information from neighbors. Recently, some studies have discussed the importance of modeling neighborhood distributions on the graph. However, most existing GNNs aggregate neighbors' features through a single statistic (e.g., mean, max, sum), which loses information about the neighborhood feature distribution and therefore degrades model performance. In this paper, inspired by the method of moments in statistical theory, we propose to model neighborhood feature distributions with multi-order moments. We design a novel GNN model, namely the Mix-Moment Graph Neural Network (MM-GNN), which includes a Multi-order Moment Embedding (MME) module and an Element-wise Attention-based Moment Adaptor module. MM-GNN first calculates the multi-order moments of the neighbors for each node as signatures, and then uses the Element-wise Attention-based Moment Adaptor to assign larger weights to important moments for each node and update node representations. We conduct extensive experiments on 15 real-world graphs (including social networks, citation networks, web-page networks, etc.) to evaluate our model, and the results demonstrate the superiority of MM-GNN over existing state-of-the-art models.
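    The moment-signature idea can be sketched in a few lines: aggregate each neighbourhood by its first few central moments and concatenate them. This minimal sketch leaves the attention-based moment adaptor aside; the function name and toy graph are illustrative.

```python
import numpy as np

def moment_aggregate(features, neighbors, orders=(1, 2, 3)):
    """Aggregate each node's neighbourhood by its first few central moments.

    features:  (n, d) node feature matrix
    neighbors: list of index arrays, neighbors[i] = neighbours of node i
    Returns an (n, d * len(orders)) matrix of concatenated moment signatures.
    """
    out = []
    for nbrs in neighbors:
        x = features[nbrs]                    # (deg, d) neighbour features
        mu = x.mean(axis=0)
        moments = [mu]
        for p in orders[1:]:
            moments.append(((x - mu) ** p).mean(axis=0))  # p-th central moment
        out.append(np.concatenate(moments))
    return np.stack(out)

# Toy graph: 3 nodes with 2-dimensional features.
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
nbrs = [np.array([1, 2]), np.array([0, 2]), np.array([0, 1])]
signatures = moment_aggregate(feats, nbrs)   # shape (3, 6)
```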
    Learning Controllable 3D Level Generators. (arXiv:2206.13623v3 [cs.AI] UPDATED)
    Procedural Content Generation via Reinforcement Learning (PCGRL) foregoes the need for large human-authored data-sets and allows agents to train explicitly on functional constraints, using computable, user-defined measures of quality instead of target output. We explore the application of PCGRL to 3D domains, in which content-generation tasks naturally have greater complexity and potential pertinence to real-world applications. Here, we introduce several PCGRL tasks for the 3D domain, Minecraft (Mojang Studios, 2009). These tasks will challenge RL-based generators using affordances often found in 3D environments, such as jumping, multiple dimensional movement, and gravity. We train an agent to optimize each of these tasks to explore the capabilities of previous research in PCGRL. This agent is able to generate relatively complex and diverse levels, and generalize to random initial states and control targets. Controllability tests in the presented tasks demonstrate their utility to analyze success and failure for 3D generators.
    Overcoming Oversmoothness in Graph Convolutional Networks via Hybrid Scattering Networks. (arXiv:2201.08932v2 [stat.ML] UPDATED)
    Geometric deep learning has made great strides towards generalizing the design of structure-aware neural networks from traditional domains to non-Euclidean ones, giving rise to graph neural networks (GNN) that can be applied to graph-structured data arising in, e.g., social networks, biochemistry, and material science. Graph convolutional networks (GCNs) in particular, inspired by their Euclidean counterparts, have been successful in processing graph data by extracting structure-aware features. However, current GNN models are often constrained by various phenomena that limit their expressive power and ability to generalize to more complex graph datasets. Most models essentially rely on low-pass filtering of graph signals via local averaging operations, leading to oversmoothing. Moreover, to avoid severe oversmoothing, most popular GCN-style networks tend to be shallow, with narrow receptive fields, leading to underreaching. Here, we propose a hybrid GNN framework that combines traditional GCN filters with band-pass filters defined via geometric scattering. We further introduce an attention framework that allows the model to locally attend over combined information from different filters at the node level. Our theoretical results establish the complementary benefits of the scattering filters to leverage structural information from the graph, while our experiments show the benefits of our method on various learning tasks.
    Evaluating Dense Passage Retrieval using Transformers. (arXiv:2208.06959v1 [cs.IR])
    Although representational retrieval models based on Transformers have been able to make major advances in the past few years, and despite the widely accepted conventions and best-practices for testing such models, a $\textit{standardized}$ evaluation framework for testing them has not been developed. In this work, we formalize the best practices and conventions followed by researchers in the literature, paving the path for more standardized evaluations - and therefore more fair comparisons between the models. Our framework (1) embeds the documents and queries; (2) for each query-document pair, computes the relevance score based on the dot product of the document and query embedding; (3) uses the $\texttt{dev}$ set of the MSMARCO dataset to evaluate the models; (4) uses the $\texttt{trec_eval}$ script to calculate MRR@100, which is the primary metric used to evaluate the models. Most importantly, we showcase the use of this framework by experimenting on some of the most well-known dense retrieval models.
    Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models. (arXiv:2208.06677v1 [cs.LG])
    Adaptive gradient algorithms borrow the moving average idea of heavy ball acceleration to estimate accurate first- and second-order moments of the gradient for accelerating convergence. However, Nesterov acceleration, which converges faster than heavy ball acceleration in theory and also in many empirical cases, is much less investigated under the adaptive gradient setting. In this work, we propose the ADAptive Nesterov momentum algorithm, Adan for short, to effectively speed up the training of deep neural networks. Adan first reformulates the vanilla Nesterov acceleration to develop a new Nesterov momentum estimation (NME) method, which avoids the extra computation and memory overhead of computing the gradient at the extrapolation point. Then Adan adopts NME to estimate the first- and second-order moments of the gradient in adaptive gradient algorithms for convergence acceleration. Besides, we prove that Adan finds an $\epsilon$-approximate first-order stationary point within $O(\epsilon^{-3.5})$ stochastic gradient complexity on nonconvex stochastic problems (e.g., deep learning problems), matching the best-known lower bound. Extensive experimental results show that Adan surpasses the corresponding SoTA optimizers on both vision transformers (ViTs) and CNNs, and sets new SoTAs for many popular networks, e.g., ResNet, ConvNext, ViT, Swin, MAE, LSTM, Transformer-XL, and BERT. More surprisingly, Adan can use half of the training cost (epochs) of SoTA optimizers to achieve higher or comparable performance on ViT, ResNet, etc., and also shows great tolerance to a large range of minibatch sizes, e.g., from 1k to 32k. We hope Adan can contribute to the development of deep learning by reducing training cost and relieving the engineering burden of trying different optimizers on various architectures. Code will be released at https://github.com/sail-sg/Adan.
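    A simplified, unofficial sketch of an Adan-style step is given below, following our reading of the update equations: moving averages of the gradient, the gradient difference, and a combined second moment. The default coefficients, and the omission of weight decay and restarts, are assumptions of this sketch; the released code at the linked repository is authoritative.

```python
import numpy as np

def adan_step(theta, grad, prev_grad, state, lr=1e-3,
              betas=(0.02, 0.08, 0.01), eps=1e-8):
    """One simplified Adan-style update (unofficial sketch).
    state holds moving averages: m (gradient), v (gradient difference),
    n (second moment of the combined gradient)."""
    b1, b2, b3 = betas
    diff = grad - prev_grad
    state['m'] = (1 - b1) * state['m'] + b1 * grad
    state['v'] = (1 - b2) * state['v'] + b2 * diff
    g_hat = grad + (1 - b2) * diff
    state['n'] = (1 - b3) * state['n'] + b3 * g_hat ** 2
    update = (state['m'] + (1 - b2) * state['v']) / (np.sqrt(state['n']) + eps)
    return theta - lr * update

# Toy usage on the quadratic f(x) = ||x||^2 / 2, whose gradient is x.
theta = np.ones(3)
state = {'m': np.zeros(3), 'v': np.zeros(3), 'n': np.zeros(3)}
prev_grad = np.zeros(3)
for _ in range(100):
    grad = theta
    theta = adan_step(theta, grad, prev_grad, state, lr=0.1)
    prev_grad = grad
```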
    Cloud-Based Real-Time Molecular Screening Platform with MolFormer. (arXiv:2208.06665v1 [cs.LG])
    With the prospect of automating a number of chemical tasks with high fidelity, chemical language processing models are emerging at a rapid speed. Here, we present a cloud-based real-time platform that allows users to virtually screen molecules of interest. For this purpose, molecular embeddings inferred from a recently proposed large chemical language model, named MolFormer, are leveraged. The platform currently supports three tasks: nearest neighbor retrieval, chemical space visualization, and property prediction. Based on the functionalities of this platform and results obtained, we believe that such a platform can play a pivotal role in automating chemistry and chemical engineering research, as well as assist in drug discovery and material design tasks. A demo of our platform is provided at \url{www.ibm.biz/molecular_demo}.
    Applying Regularized Schr\"odinger-Bridge-Based Stochastic Process in Generative Modeling. (arXiv:2208.07131v1 [cs.LG])
    Compared to the existing function-based models in deep generative modeling, the recently proposed diffusion models have achieved outstanding performance with a stochastic-process-based approach. But a long sampling time is required for this approach due to many timesteps for discretization. Schr\"odinger bridge (SB)-based models attempt to tackle this problem by training bidirectional stochastic processes between distributions. However, they still have a slow sampling speed compared to generative models such as generative adversarial networks. And due to the training of the bidirectional stochastic processes, they require a relatively long training time. Therefore, this study tried to reduce the number of timesteps and training time required and proposed regularization terms to the existing SB models to make the bidirectional stochastic processes consistent and stable with a reduced number of timesteps. Each regularization term was integrated into a single term to enable more efficient training in computation time and memory usage. Applying this regularized stochastic process to various generation tasks, the desired translations between different distributions were obtained, and accordingly, the possibility of generative modeling based on a stochastic process with faster sampling speed could be confirmed. The code is available at https://github.com/KiUngSong/RSB.
    Grasping Core Rules of Time Series through Pure Models. (arXiv:2208.07105v1 [cs.LG])
    Time series analysis underwent the transition from statistics to deep learning, as did many other machine learning fields. Although accuracy appears to increase with each model update on a number of publicly available datasets, the improvements typically come from scaling the model up several times in exchange for only a slight gain in accuracy. Through this experiment, we point out a different line of thinking: time series, especially long-term forecasting, may differ from other fields. It is not necessary to use extensive and complex models to grasp all aspects of a time series; rather, pure models can grasp the core rules of how a time series changes. With this simple but effective idea, we created PureTS, a network with three pure linear layers that achieved state-of-the-art results in 80% of the long-sequence prediction tasks while being nearly the lightest model with the fastest running speed. On this basis, we discuss the potential of pure linear layers in terms of both phenomena and essence. The ability to capture the core dynamics contributes to the high precision of long-distance prediction, and reasonable fluctuation prevents the forecast from distorting the curve in multi-step prediction, as mainstream deep learning models do; we summarize this as a pure linear neural network that avoids over-fluctuating. Finally, we suggest fundamental design standards for lightweight long-step time series tasks: the input and output should have the same dimension where possible, and the structure should avoid fragmentation and complex operations.
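    The idea of pure linear models for long-horizon forecasting can be illustrated with a one-layer stand-in: a single ridge-regularised linear map from a lookback window to the whole forecast horizon. This is a sketch of the idea, not PureTS itself, which uses three linear layers.

```python
import numpy as np

def fit_linear_forecaster(series, lookback, horizon, ridge=1e-3):
    """Fit one linear map from a lookback window to a forecast horizon,
    in the spirit of pure linear layers for long-term forecasting."""
    X, Y = [], []
    for t in range(len(series) - lookback - horizon + 1):
        X.append(series[t:t + lookback])
        Y.append(series[t + lookback:t + lookback + horizon])
    X, Y = np.array(X), np.array(Y)
    # Ridge-regularised least squares: W = (X^T X + lambda I)^-1 X^T Y
    W = np.linalg.solve(X.T @ X + ridge * np.eye(lookback), X.T @ Y)
    return W

# Toy usage: forecast 24 steps of a noisy sine from a 96-step window.
t = np.arange(2000)
series = np.sin(0.05 * t) + 0.1 * np.random.default_rng(0).normal(size=2000)
W = fit_linear_forecaster(series, lookback=96, horizon=24)
forecast = series[-96:] @ W   # shape (24,)
```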
    Double Auctions with Two-sided Bandit Feedback. (arXiv:2208.06536v1 [cs.LG])
    Double Auctions enable decentralized transfer of goods between multiple buyers and sellers, thus underpinning the functioning of many online marketplaces. Buyers and sellers compete in these markets through bidding, but often do not know their own valuation a-priori. As the allocation and pricing happen through bids, the profitability of participants, and hence the sustainability of such markets, depends crucially on learning respective valuations through repeated interactions. We initiate the study of Double Auction markets under bandit feedback on both the buyers' and sellers' side. We show that with confidence-bound-based bidding and `Average Pricing' there is efficient price discovery among the participants. In particular, the buyers and sellers exchanging goods attain $O(\sqrt{T})$ regret in $T$ rounds. The buyers and sellers who do not benefit from exchange in turn only experience $O(\log{T}/ \Delta)$ regret in $T$ rounds, where $\Delta$ is the minimum price gap. We complement our upper bound by showing that even with a known fixed price of the good -- a simpler learning problem than the Double Auction -- $\Omega(\sqrt{T})$ regret is unavoidable in certain markets.
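    To illustrate confidence-bound bidding on the buyer side, the schematic below bids an upper confidence bound on an unknown valuation learned from trade feedback. The class name, the fixed ask, and the feedback model are hypothetical; the paper's mechanism additionally involves `Average Pricing' and two-sided learning.

```python
import numpy as np

class UCBBuyer:
    """Buyer that bids an upper confidence bound on its unknown valuation,
    estimated from samples revealed on successful trades (schematic only)."""
    def __init__(self):
        self.n, self.mean = 0, 0.0

    def bid(self, t):
        if self.n == 0:
            return float("inf")            # bid high once to force a sample
        return self.mean + np.sqrt(2.0 * np.log(t + 1) / self.n)

    def observe(self, valuation_sample):
        self.n += 1
        self.mean += (valuation_sample - self.mean) / self.n

rng = np.random.default_rng(0)
buyer = UCBBuyer()
for t in range(1000):
    if buyer.bid(t) >= 0.5:                # trade executes against a fixed ask
        buyer.observe(rng.normal(0.7, 0.1))   # noisy valuation feedback
```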
    Teacher Guided Training: An Efficient Framework for Knowledge Transfer. (arXiv:2208.06825v1 [cs.LG])
    The remarkable performance gains realized by large pretrained models, e.g., GPT-3, hinge on the massive amounts of data they are exposed to during training. Analogously, distilling such large models to compact models for efficient deployment also necessitates a large amount of (labeled or unlabeled) training data. In this paper, we propose the teacher-guided training (TGT) framework for training a high-quality compact model that leverages the knowledge acquired by pretrained generative models, while obviating the need to go through a large volume of data. TGT exploits the fact that the teacher has acquired a good representation of the underlying data domain, which typically corresponds to a much lower dimensional manifold than the input space. Furthermore, we can use the teacher to explore input space more efficiently through sampling or gradient-based methods; thus, making TGT especially attractive for limited data or long-tail settings. We formally capture this benefit of proposed data-domain exploration in our generalization bounds. We find that TGT can improve accuracy on several image classification benchmarks as well as a range of text classification and retrieval tasks.
    Overcoming the Long Horizon Barrier for Sample-Efficient Reinforcement Learning with Latent Low-Rank Structure. (arXiv:2206.03569v2 [cs.LG] UPDATED)
    The practicality of reinforcement learning algorithms has been limited due to poor scaling with respect to the problem size, as the sample complexity of learning an $\epsilon$-optimal policy is $\tilde{\Omega}\left(|S||A|H^3 / \epsilon^2\right)$ over worst case instances of an MDP with state space $S$, action space $A$, and horizon $H$. We consider a class of MDPs that exhibit low rank structure, where the latent features are unknown. We argue that a natural combination of value iteration and low-rank matrix estimation results in an estimation error that grows doubly exponentially in the horizon $H$. We then provide a new algorithm along with statistical guarantees that efficiently exploits low rank structure given access to a generative model, achieving a sample complexity of $\tilde{O}\left(d^5(|S|+|A|)\mathrm{poly}(H)/\epsilon^2\right)$ for a rank $d$ setting, which is minimax optimal with respect to the scaling of $|S|, |A|$, and $\epsilon$. In contrast to literature on linear and low-rank MDPs, we do not require a known feature mapping, our algorithm is computationally simple, and our results hold for long time horizons. Our results provide insights on the minimal low-rank structural assumptions required on the MDP with respect to the transition kernel versus the optimal action-value function.
    Robust Contrastive Active Learning with Feature-guided Query Strategies. (arXiv:2109.06873v2 [cs.LG] UPDATED)
    We introduce supervised contrastive active learning (SCAL) and propose efficient query strategies in active learning based on feature similarity (featuresim) and principal component analysis based feature-reconstruction error (fre) to select informative data samples with diverse feature representations. We demonstrate that our proposed method achieves state-of-the-art accuracy and model calibration and reduces sampling bias in an active learning setup for balanced and imbalanced datasets on image classification tasks. We also evaluate the robustness of the model to distributional shift derived from different query strategies in an active learning setting. Using extensive experiments, we show that our proposed approach outperforms high-performing compute-intensive methods by a large margin, resulting in 9.9% lower mean corruption error, 7.2% lower expected calibration error under dataset shift, and 8.9% higher AUROC for out-of-distribution detection.
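    The fre query strategy lends itself to a direct sketch: fit PCA to the features of labeled data and request labels for the unlabeled points with the largest reconstruction error, i.e., those least well explained by the current feature subspace. This minimal sketch assumes precomputed feature vectors; the function name and the number of components are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

def fre_query(labeled_feats, unlabeled_feats, budget, n_components=32):
    """PCA feature-reconstruction-error (fre) query strategy: pick the
    unlabeled samples whose features reconstruct worst from the principal
    subspace of the labeled pool."""
    pca = PCA(n_components=n_components).fit(labeled_feats)
    recon = pca.inverse_transform(pca.transform(unlabeled_feats))
    errors = np.linalg.norm(unlabeled_feats - recon, axis=1)
    return np.argsort(errors)[-budget:]     # indices of samples to annotate

rng = np.random.default_rng(0)
labeled = rng.normal(size=(500, 128))       # stand-in feature vectors
unlabeled = rng.normal(size=(5000, 128))
picked = fre_query(labeled, unlabeled, budget=64)
```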
    Expert Aggregation for Financial Forecasting. (arXiv:2111.15365v3 [q-fin.ST] UPDATED)
    Machine learning algorithms dedicated to financial time series forecasting have gained a lot of interest over the last few years. One difficulty lies in the choice between several algorithms, as their estimation accuracy may be unstable over time. Aggregation combines a finite set of forecasting models, called experts, without making assumptions about the models and dynamically adapts to market conditions. We apply expert aggregation to the construction of long-short strategies, built from the individual stock return forecasts. The online mixture outperforms individual algorithms in terms of both portfolio performance and stability. Extensions to both expert and aggregation specializations are also proposed and improve the overall mixture on portfolio evaluation metrics.
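    One standard instance of such aggregation is exponentially weighted averaging of expert forecasts under squared loss, sketched below. The paper's specific experts, losses, and specializations are not reproduced here.

```python
import numpy as np

def exp_weighted_aggregation(expert_preds, targets, eta=1.0):
    """Online exponentially weighted aggregation of expert forecasts
    under squared loss.

    expert_preds: (T, K) predictions of K experts over T rounds
    targets:      (T,)   realized values
    Returns the (T,) aggregated forecasts."""
    T, K = expert_preds.shape
    weights = np.ones(K) / K
    mixture = np.empty(T)
    for t in range(T):
        mixture[t] = weights @ expert_preds[t]   # predict before seeing y_t
        losses = (expert_preds[t] - targets[t]) ** 2
        weights *= np.exp(-eta * losses)         # downweight lossy experts
        weights /= weights.sum()
    return mixture

# Toy usage: three experts with different noise levels tracking a target.
rng = np.random.default_rng(0)
y = rng.normal(size=500)
experts = np.stack([y + rng.normal(scale=s, size=500)
                    for s in (0.1, 0.5, 1.0)], axis=1)
agg = exp_weighted_aggregation(experts, y)
```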
    Lifelong Neural Predictive Coding: Learning Cumulatively Online without Forgetting. (arXiv:1905.10696v4 [cs.LG] UPDATED)
    In lifelong learning systems based on artificial neural networks, one of the biggest obstacles is the inability to retain old knowledge as new information is encountered. This phenomenon is known as catastrophic forgetting. In this paper, we propose a new kind of connectionist architecture, the Sequential Neural Coding Network, that is robust to forgetting when learning from streams of data points and, unlike networks of today, does not learn via the popular back-propagation of errors. Grounded in the neurocognitive theory of predictive processing, our model adapts synapses in a biologically-plausible fashion while another neural system learns to direct and control this cortex-like structure, mimicking some of the task-executive control functionality of the basal ganglia. In our experiments, we demonstrate that our self-organizing system experiences significantly less forgetting compared to standard neural models, outperforming a swath of previously proposed methods, including rehearsal/data buffer-based methods, on both standard (SplitMNIST, Split Fashion MNIST, etc.) and custom benchmarks even though it is trained in a stream-like fashion. Our work offers evidence that emulating mechanisms in real neuronal systems, e.g., local learning, lateral competition, can yield new directions and possibilities for tackling the grand challenge of lifelong machine learning.
    Towards Interpretable Sleep Stage Classification Using Cross-Modal Transformers. (arXiv:2208.06991v1 [cs.LG])
    Accurate sleep stage classification is significant for sleep health assessment. In recent years, several deep learning and machine learning based sleep staging algorithms have been developed, achieving performance on par with human annotation. Despite improved performance, a limitation of most deep-learning based algorithms is their black-box behavior, which has limited their use in clinical settings. Here, we propose Cross-Modal Transformers, a transformer-based method for sleep stage classification. Our models achieve competitive performance with state-of-the-art approaches and eliminate the black-box behavior of deep-learning models by utilizing the interpretability of the attention modules. The proposed cross-modal transformers consist of a novel cross-modal transformer encoder architecture along with a multi-scale 1-dimensional convolutional neural network for automatic representation learning. Our sleep stage classifier based on this design achieves sleep stage classification performance on par with or better than state-of-the-art approaches, along with interpretability, a fourfold reduction in the number of parameters, and reduced training time compared to the current state-of-the-art. Our code is available at https://github.com/Jathurshan0330/Cross-Modal-Transformer.
    Near Real-Time Social Distance Estimation in London. (arXiv:2012.07751v4 [cs.CY] UPDATED)
    During the COVID-19 pandemic, policy makers at the Greater London Authority, the regional governance body of London, UK, are reliant upon prompt and accurate data sources. Large well-defined heterogeneous compositions of activity throughout the city are sometimes difficult to acquire, yet are a necessity in order to learn 'busyness' and consequently make safe policy decisions. One component of our project within this space is to utilise existing infrastructure to estimate social distancing adherence by the general public. Our method enables near immediate sampling and contextualisation of activity and physical distancing on the streets of London via live traffic camera feeds. We introduce a framework for inspecting and improving upon existing methods, whilst also describing its active deployment on over 900 real-time feeds.
    A Unified Causal View of Domain Invariant Representation Learning. (arXiv:2208.06987v1 [stat.ML])
    Machine learning methods can be unreliable when deployed in domains that differ from the domains on which they were trained. To address this, we may wish to learn representations of data that are domain-invariant in the sense that we preserve data structure that is stable across domains, but throw out spuriously-varying parts. There are many representation-learning approaches of this type, including methods based on data augmentation, distributional invariances, and risk invariance. Unfortunately, when faced with any particular real-world domain shift, it is unclear which, if any, of these methods might be expected to work. The purpose of this paper is to show how the different methods relate to each other, and clarify the real-world circumstances under which each is expected to succeed. The key tool is a new notion of domain shift relying on the idea that causal relationships are invariant, but non-causal relationships (e.g., due to confounding) may vary.
    Generalization Bounds for Gradient Methods via Discrete and Continuous Prior. (arXiv:2205.13799v3 [cs.LG] UPDATED)
    Proving algorithm-dependent generalization error bounds for gradient-type optimization methods has attracted significant attention recently in learning theory. However, most existing trajectory-based analyses require either restrictive assumptions on the learning rate (e.g., fast decreasing learning rate), or continuous injected noise (such as the Gaussian noise in Langevin dynamics). In this paper, we introduce a new discrete data-dependent prior to the PAC-Bayesian framework, and prove a high probability generalization bound of order $O(\frac{1}{n}\cdot \sum_{t=1}^T(\gamma_t/\varepsilon_t)^2\left\|{\mathbf{g}_t}\right\|^2)$ for Floored GD (i.e. a version of gradient descent with precision level $\varepsilon_t$), where $n$ is the number of training samples, $\gamma_t$ is the learning rate at step $t$, and $\mathbf{g}_t$ is roughly the difference between the gradient computed using all samples and that using only prior samples. $\left\|{\mathbf{g}_t}\right\|$ is upper bounded by, and typically much smaller than, the gradient norm $\left\|{\nabla f(W_t)}\right\|$. We remark that our bound holds for nonconvex and nonsmooth scenarios. Moreover, our theoretical results provide numerically favorable upper bounds of testing errors (e.g., $0.037$ on MNIST). Using a similar technique, we can also obtain new generalization bounds for certain variants of SGD. Furthermore, we study the generalization bounds for gradient Langevin Dynamics (GLD). Using the same framework with a carefully constructed continuous prior, we show a new high probability generalization bound of order $O(\frac{1}{n} + \frac{L^2}{n^2}\sum_{t=1}^T(\gamma_t/\sigma_t)^2)$ for GLD. The new $1/n^2$ rate is due to the concentration of the difference between the gradient of training samples and that of the prior.
    Revocable Deep Reinforcement Learning with Affinity Regularization for Outlier-Robust Graph Matching. (arXiv:2012.08950v4 [cs.CV] UPDATED)
    Graph matching (GM) has been a building block in many areas including computer vision and pattern recognition. Despite the recent impressive progress, existing deep GM methods often have difficulty in handling outliers in both graphs, which are ubiquitous in practice. We propose a deep reinforcement learning (RL) based approach RGM for weighted graph matching, whose sequential node matching scheme naturally fits with the strategy for selective inlier matching against outliers. A revocable action scheme is devised to improve the agent's flexibility against the complex constrained matching task. Moreover, we propose a quadratic approximation technique to regularize the affinity matrix, in the presence of outliers. As such, the RL agent can finish inlier matching timely when the objective score stops growing, for which otherwise an additional hyperparameter i.e. the number of common inliers is needed to avoid matching outliers. In this paper, we focus on learning the back-end solver for the most general form of GM: the Lawler's QAP, whose input is the affinity matrix. Our approach can also boost other solvers using the affinity input. Experimental results on both synthetic and real-world datasets showcase its superior performance regarding both matching accuracy and robustness.
    Virgo: Scalable Unsupervised Classification of Cosmological Shock Waves. (arXiv:2208.06859v1 [astro-ph.IM])
    Cosmological shock waves are essential to understanding the formation of cosmological structures. To study them, scientists run computationally expensive high-resolution 3D hydrodynamic simulations. Interpreting the simulation results is challenging because the resulting data sets are enormous, and the shock wave surfaces are hard to separate and classify due to their complex morphologies and multiple shock fronts intersecting. We introduce a novel pipeline, Virgo, combining physical motivation, scalability, and probabilistic robustness to tackle this unsolved unsupervised classification problem. To this end, we employ kernel principal component analysis with low-rank matrix approximations to denoise data sets of shocked particles and create labeled subsets. We perform supervised classification to recover full data resolution with stochastic variational deep kernel learning. We evaluate Virgo on three state-of-the-art data sets of varying complexity and achieve good results. The proposed pipeline runs automatically, has only a few hyperparameters, and performs well on all tested data sets. Our results are promising for large-scale applications, and we highlight the future scientific work this now enables.
    Compositional Clustering for Multi-Label Few-Shot Learning. (arXiv:2109.04160v3 [cs.LG] UPDATED)
    We consider a new kind of clustering problem in which clusters need not be independent of each other, but rather can have compositional relationships with other clusters (e.g., a dataset contains images of rectangles, images of circles, and images of both). This task is motivated by recent work on compositional few-shot learning and embedding models that are optimized to distinguish the label sets, not just the individual labels, assigned to the examples. To tackle this clustering problem, we propose three new algorithms: Compositional Affinity Propagation (CAP), Compositional k-means (CKM), and Greedy Compositional Reassignment (GCR). These new methods can both partition examples into coherent groups and infer the compositional structure among the groups automatically. We show promising results, compared to popular algorithms such as Gaussian mixtures, Fuzzy c-means, and Agglomerative Clustering, on the OmniGlot and LibriSpeech datasets that are widely used in few-shot learning research. Our work has applications to open-world multi-object image recognition and speaker diarization with simultaneous speech from multiple speakers.
    Explainable Artificial Intelligence for Assault Sentence Prediction in New Zealand. (arXiv:2208.06981v1 [cs.LG])
    The judiciary has historically been conservative in its use of Artificial Intelligence, but recent advances in machine learning have prompted scholars to reconsider such use in tasks like sentence prediction. This paper investigates by experimentation the potential use of explainable artificial intelligence for predicting imprisonment sentences in assault cases in New Zealand's courts. We propose a proof-of-concept explainable model and verify in practice that it is fit for purpose, with predicted sentences accurate to within one year. We further analyse the model to understand the most influential phrases in sentence length prediction. We conclude the paper with an evaluative discussion of the future benefits and risks of different ways of using such an AI model in New Zealand's courts.
    Distributed Robust Principal Component Analysis. (arXiv:2207.11669v2 [cs.DC] UPDATED)
    We study the robust principal component analysis (RPCA) problem in a distributed setting. The goal of RPCA is to find an underlying low-rank estimation for a raw data matrix when the data matrix is subject to the corruption of gross sparse errors. Previous studies have developed RPCA algorithms that provide stable solutions with fast convergence. However, these algorithms are typically hard to scale and cannot be implemented distributedly, due to the use of either SVD or large matrix multiplication. In this paper, we propose the first distributed robust principal component analysis algorithm based on consensus factorization, dubbed DCF-PCA. We prove the convergence of DCF-PCA and evaluate it on various problem settings.
    Succinct Differentiation of Disparate Boosting Ensemble Learning Methods for Prognostication of Polycystic Ovary Syndrome Diagnosis. (arXiv:2201.00418v2 [cs.LG] UPDATED)
    Prognosticating medical problems from clinical data by leveraging machine learning techniques with stellar precision is one of the most important real-world challenges at present. We consider the medical problem of Polycystic Ovary Syndrome (PCOS), an emerging disorder in women aged 15 to 49, and present a diagnosis approach using various boosting ensemble methods. We provide a detailed and compendious comparison of Adaptive Boosting, Gradient Boosting Machine, XGBoost and CatBoost with their respective performance metrics, highlighting hidden anomalies in the data and their effects on the results. Metrics such as the confusion matrix, precision, recall, F1 score, FPR, RoC curve and AUC are used.
    Riemannian accelerated gradient methods via extrapolation. (arXiv:2208.06619v1 [math.OC])
    In this paper, we propose a simple acceleration scheme for Riemannian gradient methods by extrapolating iterates on manifolds. We show when the iterates are generated from Riemannian gradient descent method, the accelerated scheme achieves the optimal convergence rate asymptotically and is computationally more favorable than the recently proposed Riemannian Nesterov accelerated gradient methods. Our experiments verify the practical benefit of the novel acceleration strategy.
    A Hybrid Approach on Conditional GAN for Portfolio Analysis. (arXiv:2208.07159v1 [q-fin.PM])
    Over the decades, the Markowitz framework has been used extensively in portfolio analysis, though it puts too much emphasis on analyzing market uncertainty rather than on trend prediction. Meanwhile, the generative adversarial network (GAN), conditional GAN (CGAN), and autoencoding CGAN (ACGAN) have been explored for generating financial time series and extracting features that can help portfolio analysis. The limitation of the CGAN and ACGAN frameworks lies in putting too much emphasis on generating series and finding the internal trends of the series rather than predicting future trends. In this paper, we introduce a hybrid approach to conditional GAN based on deep generative models that learns the internal trend of historical data while modeling market uncertainty and future trends. We evaluate the model on several real-world datasets from both the US and European markets, and show that the proposed HybridCGAN and HybridACGAN models lead to better portfolio allocation compared to the existing Markowitz, CGAN, and ACGAN approaches.
    Deep Reinforcement Learning Approach for Trading Automation in The Stock Market. (arXiv:2208.07165v1 [q-fin.TR])
    Deep Reinforcement Learning (DRL) algorithms can scale to previously intractable problems. The automation of profit generation in the stock market is possible using DRL, by combining the financial assets price "prediction" step and the "allocation" step of the portfolio in one unified process to produce fully autonomous systems capable of interacting with their environment to make optimal decisions through trial and error. This work presents a DRL model to generate profitable trades in the stock market, effectively overcoming the limitations of supervised learning approaches. We formulate the trading problem as a Partially Observed Markov Decision Process (POMDP) model, considering the constraints imposed by the stock market, such as liquidity and transaction costs. We then solve the formulated POMDP problem using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, reporting a 2.68 Sharpe Ratio on an unseen data set (test data). From the point of view of stock market forecasting and the intelligent decision-making mechanism, this paper demonstrates the superiority of DRL in financial markets over other types of machine learning and proves its credibility and advantages in strategic decision-making.
    Rethinking Graph Neural Networks for the Graph Coloring Problem. (arXiv:2208.06975v1 [cs.LG])
    Graph coloring, a classical and critical NP-hard problem, is the problem of assigning colors to nodes such that connected nodes receive different colors, using as few colors as possible. However, we observe that state-of-the-art GNNs are less successful in the graph coloring problem. We analyze the reasons from two perspectives. First, most GNNs fail to generalize the task from homophily to heterophily, i.e., to graphs where connected nodes are assigned different colors. Second, GNNs are bounded by the network depth, making them effectively local methods, which have been demonstrated to be non-optimal in the Maximum Independent Set (MIS) problem. In this paper, we focus on aggregation-combine GNNs (AC-GNNs), a popular class of GNNs. We first define the power of AC-GNNs in the coloring problem as the capability to assign nodes different colors; this definition differs from previous ones, which are based on the assumption of homophily. We identify node pairs that AC-GNNs fail to discriminate. Furthermore, we show that any AC-GNN is a local coloring method, and that any local coloring method is non-optimal, by exploring the limits of local methods over sparse random graphs, thereby demonstrating the non-optimality of AC-GNNs due to their local property. We then prove a positive correlation between model depth and coloring power. Moreover, we discuss the color equivariance of graphs to tackle some practical constraints such as pre-fixing constraints. Following the discussions above, we summarize a series of rules that make a GNN color-equivariant and powerful in the coloring problem. We then propose a simple AC-GNN variation satisfying these rules. We empirically validate our theoretical findings and demonstrate that our simple model substantially outperforms state-of-the-art heuristic algorithms in both quality and runtime.
    Acceleration of Subspace Learning Machine via Particle Swarm Optimization and Parallel Processing. (arXiv:2208.07023v1 [cs.LG])
    Built upon the decision tree (DT) classification and regression idea, the subspace learning machine (SLM) has been recently proposed to offer higher performance in general classification and regression tasks. Its performance improvement is reached at the expense of higher computational complexity. In this work, we investigate two ways to accelerate SLM. First, we adopt the particle swarm optimization (PSO) algorithm to speed up the search of a discriminant dimension that is expressed as a linear combination of current dimensions. The search of optimal weights in the linear combination is computationally heavy. It is accomplished by probabilistic search in original SLM. The acceleration of SLM by PSO requires 10-20 times fewer iterations. Second, we leverage parallel processing in the SLM implementation. Experimental results show that the accelerated SLM method achieves a speed up factor of 577 in training time while maintaining comparable classification/regression performance of original SLM.
    Efficient Adaptive Regret Minimization. (arXiv:2207.00646v3 [cs.LG] UPDATED)
    In online convex optimization, the player aims to minimize her regret against a fixed comparator over the entire repeated game. Algorithms that minimize standard regret may converge to a fixed decision, which is undesirable in changing or dynamic environments. This motivates the stronger metric of adaptive regret, or the maximum regret over any continuous sub-interval in time. Existing adaptive regret algorithms suffer from a computational penalty - typically on the order of a multiplicative factor that grows logarithmically in the number of game iterations. In this paper we show how to reduce this computational penalty to be doubly logarithmic in the number of game iterations, and with minimal degradation to the optimal attainable adaptive regret bounds.
    HyP$^2$ Loss: Beyond Hypersphere Metric Space for Multi-label Image Retrieval. (arXiv:2208.06866v1 [cs.CV])
    Image retrieval has become an increasingly appealing technique with broad multimedia application prospects, where deep hashing serves as the dominant branch towards low storage and efficient retrieval. In this paper, we carried out an in-depth investigation of metric learning in deep hashing for establishing a powerful metric space in multi-label scenarios, where the pair loss suffers from high computational overhead and convergence difficulty, while the proxy loss is theoretically incapable of expressing profound label dependencies and exhibits conflicts in the constructed hypersphere space. To address these problems, we propose a novel metric learning framework with Hybrid Proxy-Pair Loss (HyP$^2$ Loss) that constructs an expressive metric space with efficient training complexity w.r.t. the whole dataset. The proposed HyP$^2$ Loss focuses on optimizing the hypersphere space by learnable proxies and excavating data-to-data correlations of irrelevant pairs, which integrates the sufficient data correspondence of pair-based methods and the high efficiency of proxy-based methods. Extensive experiments on four standard multi-label benchmarks justify that the proposed method outperforms the state-of-the-art, is robust among different hash bits and achieves significant performance gains with a faster, more stable convergence speed. Our code is available at https://github.com/JerryXu0129/HyP2-Loss.
    An Edge-Cloud Integrated Framework for Flexible and Dynamic Stream Analytics. (arXiv:2205.04622v3 [cs.DC] UPDATED)
    With the popularity of the Internet of Things (IoT), edge computing and cloud computing, more and more stream analytics applications are being developed, including real-time trend prediction and object detection on top of IoT sensing data. One popular type of stream analytics is time series or sequence data prediction and forecasting based on recurrent neural network (RNN) deep learning models. Different from traditional analytics that assumes data are available ahead of time and will not change, stream analytics deals with data that are being generated continuously and whose trend/distribution could change (a.k.a. concept drift), which will cause prediction/forecasting accuracy to drop over time. Another challenge is finding the best resource provisioning for stream analytics to achieve good overall latency. In this paper, we study how to best leverage edge and cloud resources to achieve better accuracy and latency for stream analytics using a type of RNN model called long short-term memory (LSTM). We propose a novel edge-cloud integrated framework for hybrid stream analytics that supports low-latency inference on the edge and high-capacity training on the cloud. To achieve flexible deployment, we study different approaches to deploying our hybrid learning framework, including edge-centric, cloud-centric and edge-cloud integrated. Further, our hybrid learning framework can dynamically combine inference results from an LSTM model pre-trained on historical data and another LSTM model re-trained periodically on the most recent data. Using real-world and simulated stream datasets, our experiments show the proposed edge-cloud deployment is the best among all three deployment types in terms of latency. For accuracy, the experiments show our dynamic learning approach performs the best among all learning approaches for all three concept drift scenarios.
    Conformalized Online Learning: Online Calibration Without a Holdout Set. (arXiv:2205.09095v3 [cs.LG] UPDATED)
    We develop a framework for constructing uncertainty sets with a valid coverage guarantee in an online setting, in which the underlying data distribution can drastically -- and even adversarially -- shift over time. The technique we propose is highly flexible as it can be integrated with any online learning algorithm, requiring minimal implementation effort and computational cost. A key advantage of our method over existing alternatives -- which also build on conformal inference -- is that we do not need to split the data into training and holdout calibration sets. This allows us to fit the predictive model in a fully online manner, utilizing the most recent observation for constructing calibrated uncertainty sets. Consequently, and in contrast with existing techniques, (i) the sets we build can quickly adapt to new changes in the distribution; and (ii) our procedure does not require refitting the model at each time step. Using synthetic and real-world benchmark data sets, we demonstrate the validity of our theory and the improved performance of our proposal over existing techniques. To demonstrate the greater flexibility of the proposed method, we show how to construct valid intervals for a multiple-output regression problem that previous sequential calibration methods cannot handle due to impractical computational and memory requirements.
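    A minimal flavour of online calibration can be given by a quantile-tracking loop that widens the interval after a miss and shrinks it after a cover, so empirical miscoverage drifts towards the target level. This sketch is in the spirit of adaptive conformal methods generally, not the paper's exact procedure, which also fits the predictive model fully online without a holdout set.

```python
import numpy as np

def online_intervals(preds, targets, alpha=0.1, gamma=0.05):
    """Track an interval half-width online: widen after a miss, shrink
    after a cover (quantile-tracking sketch)."""
    q = 1.0                                  # current interval half-width
    intervals, errs = [], []
    for yhat, y in zip(preds, targets):
        lo, hi = yhat - q, yhat + q
        intervals.append((lo, hi))
        err = float(not (lo <= y <= hi))     # 1 on a miss, 0 on a cover
        errs.append(err)
        q += gamma * (err - alpha)           # widen on miss, shrink on cover
    return intervals, 1.0 - np.mean(errs)

# Toy usage: imperfect predictions of a noisy target.
rng = np.random.default_rng(0)
y = rng.normal(size=1000)
yhat = y + rng.normal(scale=0.5, size=1000)
_, coverage = online_intervals(yhat, y)      # coverage close to 0.9
```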
    Active Learning with Label Comparisons. (arXiv:2204.04670v2 [cs.LG] UPDATED)
    Supervised learning typically relies on manual annotation of the true labels. When there are many potential classes, searching for the best one can be prohibitive for a human annotator. On the other hand, comparing two candidate labels is often much easier. We focus on this type of pairwise supervision and ask how it can be used effectively in learning, and in particular in active learning. We obtain several insightful results in this context. In principle, finding the best of $k$ labels can be done with $k-1$ active queries. We show that there is a natural class where this approach is sub-optimal, and that there is a more comparison-efficient active learning scheme. A key element in our analysis is the "label neighborhood graph" of the true distribution, which has an edge between two classes if they share a decision boundary. We also show that in the PAC setting, pairwise comparisons cannot provide improved sample complexity in the worst case. We complement our theoretical results with experiments, clearly demonstrating the effect of the neighborhood graph on sample complexity.
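    The $k-1$ baseline mentioned above is simply a sequential tournament: keep a running champion and compare it with each remaining candidate, as in this sketch (the compare oracle stands in for the human annotator):

    ```python
    def best_label(candidates, compare):
        """Find the best of k candidate labels with exactly k-1 pairwise
        comparisons. `compare(a, b)` returns the better of the two labels
        (here, an oracle standing in for the human annotator)."""
        best = candidates[0]
        for challenger in candidates[1:]:
            best = compare(best, challenger)
        return best

    # Usage: 4 labels -> 3 comparisons.
    labels = ["cat", "dog", "fox", "wolf"]
    preference = {"cat": 2, "dog": 3, "fox": 1, "wolf": 0}
    winner = best_label(labels, lambda a, b: a if preference[a] > preference[b] else b)
    assert winner == "dog"
    ```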
    Covert Message Passing over Public Internet Platforms Using Model-Based Format-Transforming Encryption. (arXiv:2110.07009v2 [cs.CR] UPDATED)
    We introduce a new type of format-transforming encryption where the format of ciphertexts is implicitly encoded within a machine-learned generative model. Around this primitive, we build a system for covert messaging over large, public internet platforms (e.g., Twitter). Loosely, our system composes an authenticated encryption scheme, with a method for encoding random ciphertext bits into samples from the generative model's family of seed-indexed token-distributions. By fixing a deployment scenario, we are forced to consider system-level and algorithmic solutions to real challenges -- such as receiver-side parsing ambiguities, and the low information-carrying capacity of actual token-distributions -- that were elided in prior work. We use GPT-2 as our generative model so that our system cryptographically transforms plaintext bitstrings into natural-language covertexts suitable for posting to public platforms. We consider adversaries with full view of the internet platform's content, whose goal is to surface posts that are using our system for covert messaging. We carry out a suite of experiments to provide heuristic evidence of security and to explore tradeoffs between operational efficiency and detectability.
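    Very loosely, the encoding step can be pictured as inverse-CDF sampling driven by ciphertext bits instead of true randomness; this toy sketch ignores the receiver-side parsing ambiguities the paper tackles, and all names are illustrative:

    ```python
    import numpy as np

    def bits_to_token(bits, probs, precision=16):
        """Select a token by reading ciphertext bits as a binary fraction
        u in [0, 1) and inverting the CDF of the next-token distribution
        `probs`. Encoding direction only -- illustration, not the paper's
        scheme."""
        u = sum(b << (precision - 1 - i) for i, b in enumerate(bits[:precision]))
        u /= float(1 << precision)
        cdf = np.cumsum(probs)
        return int(np.searchsorted(cdf, u, side="right"))
    ```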
    Learning Linear Non-Gaussian Polytree Models. (arXiv:2208.06701v1 [stat.ML])
    In the context of graphical causal discovery, we adapt the versatile framework of linear non-Gaussian acyclic models (LiNGAMs) to propose new algorithms to efficiently learn graphs that are polytrees. Our approach combines the Chow--Liu algorithm, which first learns the undirected tree structure, with novel schemes to orient the edges. The orientation schemes assess algebraic relations among moments of the data-generating distribution and are computationally inexpensive. We establish high-dimensional consistency results for our approach and compare different algorithmic versions in numerical experiments.
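    The Chow--Liu step itself is standard: estimate pairwise mutual information and take a maximum-weight spanning tree. A minimal sketch for discrete data with scikit-learn and NetworkX (the paper's moment-based orientation schemes are its contribution and are not shown):

    ```python
    import networkx as nx
    from sklearn.metrics import mutual_info_score

    def chow_liu_skeleton(X):
        """Undirected Chow-Liu tree: maximum spanning tree under pairwise
        mutual information. X is an (n_samples, n_vars) discrete array."""
        n_vars = X.shape[1]
        g = nx.Graph()
        for i in range(n_vars):
            for j in range(i + 1, n_vars):
                g.add_edge(i, j, weight=mutual_info_score(X[:, i], X[:, j]))
        return nx.maximum_spanning_tree(g)
    ```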
    MaskBlock: Transferable Adversarial Examples with Bayes Approach. (arXiv:2208.06538v1 [cs.LG])
    The transferability of adversarial examples (AEs) across diverse models is of critical importance for black-box adversarial attacks, where attackers cannot access information about the black-box models. However, crafted AEs often exhibit poor transferability. In this paper, by regarding the transferability of AEs as a generalization ability of the model, we reveal that vanilla black-box attacks craft AEs by solving a maximum likelihood estimation (MLE) problem. For MLE, the result is likely a model-specific local optimum when the available data is limited, which restricts the transferability of AEs. By contrast, we re-formulate crafting transferable AEs as a maximum a posteriori (MAP) estimation problem, which is an effective approach to boost the generalization of results with limited available data. Because Bayes posterior inference is commonly intractable, we develop a simple yet effective method called MaskBlock to approximate it. Moreover, we show that the formulated framework generalizes various attack methods. Extensive experiments illustrate that MaskBlock can significantly improve the transferability of crafted adversarial examples, by up to about 20%.
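    One plausible reading of the masking idea, sketched below under stated assumptions (not the paper's exact algorithm), is an iterative attack whose gradient at each step is taken on an input with a random block occluded, discouraging model-specific perturbations:

    ```python
    import torch

    def maskblock_attack(model, x, y, eps=8/255, alpha=2/255, steps=10, block=16):
        """Iterative FGSM-style attack that masks a random block of the input
        at each gradient step -- a sketch of the masking idea only."""
        delta = torch.zeros_like(x, requires_grad=True)
        loss_fn = torch.nn.CrossEntropyLoss()
        for _ in range(steps):
            xm = (x + delta).clone()
            _, _, h, w = xm.shape
            i = torch.randint(0, h - block + 1, (1,)).item()
            j = torch.randint(0, w - block + 1, (1,)).item()
            xm[:, :, i:i + block, j:j + block] = 0.0  # occlude a random block
            loss = loss_fn(model(xm), y)
            loss.backward()
            with torch.no_grad():
                delta += alpha * delta.grad.sign()  # ascend the loss
                delta.clamp_(-eps, eps)             # stay in the L-inf ball
            delta.grad.zero_()
        return (x + delta).detach()
    ```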
    Locating disparities in machine learning. (arXiv:2208.06680v1 [cs.LG])
    Machine learning has repeatedly been shown to produce predictions with disparate outcomes, in which subgroups of the population (e.g., defined by age, gender, or other sensitive attributes) are systematically disadvantaged. Previous literature has focused on detecting such disparities through statistical procedures where the sensitive attribute is specified a priori. However, this limits applicability in real-world settings, where datasets are high-dimensional and, on top of that, sensitive attributes may be unknown. As a remedy, we propose a data-driven framework called Automatic Location of Disparities (ALD) which aims at locating disparities in machine learning. ALD meets several demands from machine learning practice: it (1) is applicable to arbitrary machine learning classifiers; (2) operates on different definitions of disparities (e.g., statistical parity or equalized odds); (3) deals with both categorical and continuous predictors; (4) is suitable for high-dimensional settings; and (5) even identifies disparities due to intersectionality, where disparities arise from complex and multi-way interactions (e.g., age above 60 and female). ALD produces interpretable fairness reports as output. We demonstrate the effectiveness of ALD on both synthetic and real-world datasets. As a result, ALD helps practitioners and researchers of algorithmic fairness to detect disparities in machine learning algorithms, so that disparate -- or even unfair -- outcomes can be mitigated. Moreover, ALD supports practitioners in conducting algorithmic audits and protecting individuals from discrimination.
    Comparison of Forecasting Methods of House Electricity Consumption for Honda Smart Home. (arXiv:2208.07217v1 [cs.LG])
    The electricity consumption of buildings composes a major part of a city's energy consumption. Electricity consumption forecasting enables the development of home energy management systems, resulting in the future design of more sustainable houses and a decrease in total energy consumption. Energy performance in buildings is influenced by many factors, like ambient temperature, humidity, and a variety of electrical devices; therefore, multivariate prediction methods are preferred over univariate ones. The Honda Smart Home US data set was selected to compare three methods, Artificial Neural Networks, Support Vector Regression, and Fuzzy Rule-Based Systems for Regression, in terms of the forecasting errors MAE and RMSE, by constructing many models for each method on a multivariate data set over different time horizons. The comparison shows that SVR outperforms the alternatives.
    An Adam-adjusting-antennae BAS Algorithm for Refining Latent Factors. (arXiv:2208.06603v1 [cs.LG])
    Extracting the latent information in high-dimensional and incomplete matrices is an important and challenging issue. The Latent Factor Analysis (LFA) model can well handle high-dimensional matrix analysis. Recently, Particle Swarm Optimization (PSO)-incorporated LFA models have been proposed to tune the hyper-parameters adaptively with high efficiency. However, the incorporation of PSO causes a premature convergence problem. To address this issue, we propose a sequential Adam-adjusting-antennae BAS (A2BAS) optimization algorithm, which refines the latent factors obtained by the PSO-incorporated LFA model. The A2BAS algorithm consists of two sub-algorithms. First, we design an improved BAS algorithm which adjusts the beetles' antennae and step size with Adam; second, we apply the improved BAS algorithm to optimize all the row and column latent factors sequentially. With experimental results on two real high-dimensional matrices, we demonstrate that our algorithm can effectively solve the premature convergence issue.
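    For intuition, beetle antennae search (BAS) probes the objective at two antenna tips and steps along the better direction; adjusting that step with Adam-style moments might look roughly like the sketch below (illustrative; the exact A2BAS update rules are the paper's):

    ```python
    import numpy as np

    def a2bas_step(f, x, state, d=0.1, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
        """One beetle-antennae-search step (minimizing f) with Adam-style
        moment estimates on the antenna-based gradient surrogate.
        `state` holds the Adam moments (m, v, t)."""
        b = np.random.randn(*x.shape)
        b /= np.linalg.norm(b)                        # random antenna direction
        g = np.sign(f(x + d * b) - f(x - d * b)) * b  # gradient surrogate
        m, v, t = state
        t += 1
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        x_new = x - lr * m_hat / (np.sqrt(v_hat) + eps)  # Adam-scaled move
        return x_new, (m, v, t)
    ```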
    Confidence-Guided Learning Process for Continuous Classification of Time Series. (arXiv:2208.06883v1 [cs.LG])
    In the real world, the class of a time series is usually labeled at the final time, but many applications require classifying time series at every time point; e.g., the outcome of a critical patient is only determined at the end, but the patient should be diagnosed at all times for timely treatment. Thus, we propose a new concept: Continuous Classification of Time Series (CCTS). It requires the model to learn data in different time stages. But the time series evolves dynamically, leading to different data distributions, and when a model learns multiple distributions, it tends to forget or overfit. We suggest that a meaningful learning schedule is possible, based on an interesting observation: measured by confidence, the process of a model learning multiple distributions is similar to the process of a human learning multiple pieces of knowledge. Thus, we propose a novel Confidence-guided method for CCTS (C3TS). It imitates the alternating human confidence described by the Dunning-Kruger Effect: we define objective confidence to arrange the data, and self-confidence to control the learning duration. Experiments on four real-world datasets show that C3TS is more accurate than all baselines for CCTS.
    PAC Generalization via Invariant Representations. (arXiv:2205.15196v3 [cs.LG] UPDATED)
    One method for obtaining generalizable solutions to machine learning tasks when presented with diverse training environments is to find \textit{invariant representations} of the data. These are representations of the covariates such that the best model on top of the representation is invariant across training environments. In the context of linear Structural Equation Models (SEMs), invariant representations might allow us to learn models with out-of-distribution guarantees, i.e., models that are robust to interventions in the SEM. To address the invariant representation problem in a {\em finite sample} setting, we consider the notion of $\epsilon$-approximate invariance. We study the following question: If a representation is approximately invariant with respect to a given number of training interventions, will it continue to be approximately invariant on a larger collection of unseen SEMs? This larger collection of SEMs is generated through a parameterized family of interventions. Inspired by PAC learning, we obtain finite-sample out-of-distribution generalization guarantees for approximate invariance that hold \textit{probabilistically} over a family of linear SEMs without faithfulness assumptions. Our results show bounds that do not scale in ambient dimension when intervention sites are restricted to lie in a constant size subset of in-degree bounded nodes. We also show how to extend our results to a linear indirect observation model that incorporates latent variables.
    Agreement or Disagreement in Noise-tolerant Mutual Learning?. (arXiv:2203.15317v2 [cs.CV] UPDATED)
    Deep learning has made many remarkable achievements in many fields but suffers from noisy labels in datasets. State-of-the-art noisy-label learning methods such as Co-teaching and Co-teaching+ confront noisy labels via mutual information exchange between dual networks. However, the dual networks always tend to converge, which weakens the dual-network mechanism's ability to resist noisy labels. In this paper, we propose a noise-tolerant framework named MLC in an end-to-end manner. It adjusts the dual networks with divergent regularization to ensure the effectiveness of the mechanism. In addition, we correct the label distribution according to the agreement between the dual networks. The proposed method can utilize the noisy data to improve the accuracy, generalization, and robustness of the network. We test the proposed method on the simulated noisy datasets MNIST and CIFAR-10, and the real-world noisy dataset Clothing1M. The experimental results show that our method outperforms the previous state-of-the-art methods. Besides, our method is network-agnostic, and thus applicable to many tasks. Our code can be found at https://github.com/JiarunLiu/MLC.
    Accelerated and instance-optimal policy evaluation with linear function approximation. (arXiv:2112.13109v2 [stat.ML] UPDATED)
    We study the problem of policy evaluation with linear function approximation and present efficient and practical algorithms that come with strong optimality guarantees. We begin by proving lower bounds that establish baselines on both the deterministic error and stochastic error in this problem. In particular, we prove an oracle complexity lower bound on the deterministic error in an instance-dependent norm associated with the stationary distribution of the transition kernel, and use the local asymptotic minimax machinery to prove an instance-dependent lower bound on the stochastic error in the i.i.d. observation model. Existing algorithms fail to match at least one of these lower bounds: To illustrate, we analyze a variance-reduced variant of temporal difference learning, showing in particular that it fails to achieve the oracle complexity lower bound. To remedy this issue, we develop an accelerated, variance-reduced fast temporal difference algorithm (VRFTD) that simultaneously matches both lower bounds and attains a strong notion of instance-optimality. Finally, we extend the VRFTD algorithm to the setting with Markovian observations, and provide instance-dependent convergence results. Our theoretical guarantees of optimality are corroborated by numerical experiments.
    Frouros: A Python library for drift detection in Machine Learning problems. (arXiv:2208.06868v1 [cs.LG])
    Frouros is a Python library capable of detecting drift in machine learning problems. It provides a combination of classical and more recent algorithms for drift detection: both supervised and unsupervised, as well as some capable of acting in a semi-supervised manner. We have designed it with the objective of being easily integrated with the scikit-learn library, implementing the same application programming interface. The library is developed following a set of best development and continuous integration practices to ensure ease of maintenance and extensibility. The source code is available at https://github.com/IFCA/frouros.
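    For context, the classical supervised drift detectors implemented by such libraries are simple; the canonical DDM rule (Gama et al., 2004) tracks the streaming error rate and flags drift when it exceeds its historical minimum by a few standard deviations. A plain-Python sketch, deliberately not the Frouros API:

    ```python
    import math

    class DDM:
        """Drift Detection Method (Gama et al., 2004): warn when
        p + s >= p_min + 2*s_min, signal drift when p + s >= p_min + 3*s_min.
        Plain sketch; not the Frouros API."""
        def __init__(self):
            self.n, self.errors = 0, 0
            self.p_min, self.s_min = float("inf"), float("inf")

        def update(self, error):  # error: 1 if the model misclassified, else 0
            self.n += 1
            self.errors += error
            if self.n < 30:  # warm-up before testing (a common default)
                return "stable"
            p = self.errors / self.n
            s = math.sqrt(p * (1 - p) / self.n)
            if p + s < self.p_min + self.s_min:
                self.p_min, self.s_min = p, s
            if p + s >= self.p_min + 3 * self.s_min:
                return "drift"
            if p + s >= self.p_min + 2 * self.s_min:
                return "warning"
            return "stable"
    ```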
    When Does Differentially Private Learning Not Suffer in High Dimensions?. (arXiv:2207.00160v3 [cs.LG] UPDATED)
    Large pretrained models can be privately fine-tuned to achieve performance approaching that of non-private models. A common theme in these results is the surprising observation that high-dimensional models can achieve favorable privacy-utility trade-offs. This seemingly contradicts known results on the model-size dependence of differentially private convex learning and raises the following research question: When does the performance of differentially private learning not degrade with increasing model size? We identify that the magnitudes of gradients projected onto subspaces are a key factor that determines performance. To precisely characterize this for private convex learning, we introduce a condition on the objective that we term \emph{restricted Lipschitz continuity} and derive improved bounds for the excess empirical and population risks that are dimension-independent under additional conditions. We empirically show that in private fine-tuning of large language models, gradients obtained during fine-tuning are mostly controlled by a few principal components. This behavior is similar to conditions under which we obtain dimension-independent bounds in convex settings. Our theoretical and empirical results together provide a possible explanation for recent successes in large-scale private fine-tuning. Code to reproduce our results can be found at \url{https://github.com/lxuechen/private-transformers/tree/main/examples/classification/spectral_analysis}.
    Orthogonal Gated Recurrent Unit with Neumann-Cayley Transformation. (arXiv:2208.06496v1 [cs.LG])
    In recent years, using orthogonal matrices has been shown to be a promising approach to improving the training, stability, and convergence of Recurrent Neural Networks (RNNs), particularly for controlling gradients. While Gated Recurrent Unit (GRU) and Long Short Term Memory (LSTM) architectures address the vanishing gradient problem by using a variety of gates and memory cells, they are still prone to the exploding gradient problem. In this work, we analyze the gradients in GRU and propose the use of orthogonal matrices to prevent exploding gradients and enhance long-term memory. We study where to use orthogonal matrices and propose a Neumann series-based Scaled Cayley transformation for training orthogonal matrices in GRU, which we call Neumann-Cayley Orthogonal GRU, or simply NC-GRU. We present detailed experiments of our model on several synthetic and real-world tasks, which show that NC-GRU significantly outperforms GRU as well as several other RNNs.
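    The underlying construction is standard: the Cayley transform maps a skew-symmetric matrix $A$ to an orthogonal matrix $W = (I - A)(I + A)^{-1}$, and the inverse can be approximated by the truncated Neumann series $(I + A)^{-1} \approx \sum_{k=0}^{K} (-A)^k$ when $\|A\| < 1$. A numpy sketch (the paper's scaling and GRU integration are not shown):

    ```python
    import numpy as np

    def neumann_cayley(A, terms=10):
        """Approximate the Cayley transform W = (I - A)(I + A)^{-1} of a
        skew-symmetric A using a truncated Neumann series for the inverse.
        Requires the spectral norm of A to be below 1 for convergence."""
        assert np.allclose(A, -A.T), "A must be skew-symmetric"
        n = A.shape[0]
        inv_approx = np.eye(n)
        power = np.eye(n)
        for _ in range(terms):
            power = power @ (-A)
            inv_approx = inv_approx + power
        return (np.eye(n) - A) @ inv_approx

    # Sanity check: W should be (near-)orthogonal.
    rng = np.random.default_rng(0)
    M = rng.normal(size=(4, 4)) * 0.05
    A = M - M.T  # skew-symmetric
    W = neumann_cayley(A)
    assert np.allclose(W @ W.T, np.eye(4), atol=1e-4)
    ```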
    A Sparse Expansion For Deep Gaussian Processes. (arXiv:2112.05888v2 [stat.ML] UPDATED)
    In this work, we use Deep Gaussian Processes (DGPs) as statistical surrogates for stochastic processes with complex distributions. Conventional inferential methods for DGP models can suffer from high computational complexity, as they require large-scale operations with kernel matrices for training and inference. In this work, we propose an efficient scheme for accurate inference and training based on a range of Gaussian Processes, called the Tensor Markov Gaussian Processes (TMGP). We construct an induced approximation of TMGP referred to as the hierarchical expansion. Next, we develop a deep TMGP (DTMGP) model as the composition of multiple hierarchical expansions of TMGPs. The proposed DTMGP model has the following properties: (1) the outputs of each activation function are deterministic while the weights are chosen independently from a standard Gaussian distribution; (2) in training or prediction, only polylog(M) (out of M) activation functions have non-zero outputs, which significantly boosts the computational efficiency. Our numerical experiments on synthetic models and real datasets show the superior computational efficiency of DTMGP over existing DGP models.
    PAC Reinforcement Learning for Predictive State Representations. (arXiv:2207.05738v3 [cs.LG] UPDATED)
    In this paper, we study online Reinforcement Learning (RL) in partially observable dynamical systems. We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models such as Partially Observable Markov Decision Processes (POMDP). PSR represents the states using a set of predictions of future observations and is defined entirely using observable quantities. We develop a novel model-based algorithm for PSRs that can learn a near optimal policy in sample complexity scaling polynomially with respect to all the relevant parameters of the systems. Our algorithm naturally works with function approximation to extend to systems with potentially large state and observation spaces. We show that given a realizable model class, the sample complexity of learning the near optimal policy only scales polynomially with respect to the statistical complexity of the model class, without any explicit polynomial dependence on the size of the state and observation spaces. Notably, ours is the first work to show polynomial sample complexities that compete with the globally optimal policy in PSRs. Finally, we demonstrate how our general theorem can be directly used to derive sample complexity bounds for special models including $m$-step weakly revealing and $m$-step decodable tabular POMDPs, POMDPs with low-rank latent transition, and POMDPs with linear emission and latent transition.
    $\beta$-Divergence-Based Latent Factorization of Tensors model for QoS prediction. (arXiv:2208.06778v1 [cs.LG])
    A nonnegative latent factorization of tensors (NLFT) model can well model the temporal pattern hidden in nonnegative quality-of-service (QoS) data for predicting the unobserved ones with high accuracy. However, existing NLFT models' objective function is based on Euclidean distance, which is only a special case of $\beta$-divergence. Hence, can we build a generalized NLFT model by adopting $\beta$-divergence to achieve prediction accuracy gains? To tackle this issue, this paper proposes a $\beta$-divergence-based NLFT model ($\beta$-NLFT). Its ideas are two-fold: 1) building a learning objective with $\beta$-divergence to achieve higher prediction accuracy, and 2) implementing self-adaptation of hyper-parameters to improve practicability. Empirical studies on two dynamic QoS datasets demonstrate that, compared with state-of-the-art models, the proposed $\beta$-NLFT model achieves higher prediction accuracy for unobserved QoS data.
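    For reference, the $\beta$-divergence between nonnegative scalars $x$ and $y$ is commonly defined as follows, recovering Euclidean distance ($\beta = 2$), KL divergence ($\beta \to 1$) and Itakura--Saito divergence ($\beta \to 0$) as special cases:

    ```latex
    d_\beta(x \,\|\, y) =
    \begin{cases}
    \dfrac{1}{\beta(\beta-1)}\left(x^\beta + (\beta-1)\,y^\beta - \beta\, x\, y^{\beta-1}\right), & \beta \in \mathbb{R}\setminus\{0,1\},\\[1ex]
    x \log \dfrac{x}{y} - x + y, & \beta = 1 \ \text{(KL)},\\[1ex]
    \dfrac{x}{y} - \log \dfrac{x}{y} - 1, & \beta = 0 \ \text{(Itakura--Saito)}.
    \end{cases}
    ```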
    Sharp Frequency Bounds for Sample-Based Queries. (arXiv:2208.06753v1 [cs.LG])
    A data sketch algorithm scans a big data set, collecting a small amount of data -- the sketch, which can be used to statistically infer properties of the big data set. Some data sketch algorithms take a fixed-size random sample of a big data set, and use that sample to infer frequencies of items that meet various criteria in the big data set. This paper shows how to statistically infer probably approximately correct (PAC) bounds for those frequencies, efficiently, and precisely enough that the frequency bounds are either sharp or off by only one, which is the best possible result without exact computation.
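    As background, an exact two-sided bound of this flavor can be computed from the Clopper--Pearson interval; the sketch below is a generic illustration, not the paper's sharper construction:

    ```python
    from scipy.stats import beta

    def frequency_bounds(k, n, delta=0.05):
        """Exact (Clopper-Pearson) confidence bounds on an item frequency p,
        given k occurrences in a sample of size n: with probability >= 1-delta
        the interval covers p. Generic illustration only."""
        lower = 0.0 if k == 0 else beta.ppf(delta / 2, k, n - k + 1)
        upper = 1.0 if k == n else beta.ppf(1 - delta / 2, k + 1, n - k)
        return lower, upper

    print(frequency_bounds(30, 1000))  # bounds around the empirical rate 0.03
    ```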
    USB: A Unified Semi-supervised Learning Benchmark. (arXiv:2208.07204v1 [cs.LG])
    Semi-supervised learning (SSL) improves model generalization by leveraging massive unlabeled data to augment limited labeled samples. However, currently, popular SSL evaluation protocols are often constrained to computer vision (CV) tasks. In addition, previous work typically trains deep neural networks from scratch, which is time-consuming and environmentally unfriendly. To address the above issues, we construct a Unified SSL Benchmark (USB) by selecting 15 diverse, challenging, and comprehensive tasks from CV, natural language processing (NLP), and audio processing (Audio), on which we systematically evaluate dominant SSL methods, and also open-source a modular and extensible codebase for fair evaluation on these SSL methods. We further provide pre-trained versions of the state-of-the-art neural models for CV tasks to make the cost affordable for further tuning. USB enables the evaluation of a single SSL algorithm on more tasks from multiple domains but with less cost. Specifically, on a single NVIDIA V100, only 37 GPU days are required to evaluate FixMatch on 15 tasks in USB while 335 GPU days (279 GPU days on 4 CV datasets except for ImageNet) are needed on 5 CV tasks with the typical protocol.
    Direct Advantage Estimation. (arXiv:2109.06093v2 [cs.LG] UPDATED)
    The predominant approach in reinforcement learning is to assign credit to actions based on the expected return. However, we show that the return may depend on the policy in a way which could lead to excessive variance in value estimation and slow down learning. Instead, we show that the advantage function can be interpreted as causal effects and shares similar properties with causal representations. Based on this insight, we propose Direct Advantage Estimation (DAE), a novel method that can model the advantage function and estimate it directly from on-policy data while simultaneously minimizing the variance of the return without requiring the (action-)value function. We also relate our method to Temporal Difference methods by showing how value functions can be seamlessly integrated into DAE. The proposed method is easy to implement and can be readily adapted by modern actor-critic methods. We evaluate DAE empirically on three discrete control domains and show that it can outperform generalized advantage estimation (GAE), a strong baseline for advantage estimation, on a majority of the environments when applied to policy optimization.
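    The quantities involved are standard: the advantage is the action value minus the state value, and it is centered under the behavior policy, a constraint that direct advantage modeling can exploit:

    ```latex
    A^\pi(s, a) = Q^\pi(s, a) - V^\pi(s), \qquad
    \sum_{a} \pi(a \mid s)\, A^\pi(s, a) = 0 \quad \text{for all } s.
    ```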
    Incorporating Weighted Broad Learning System for Accurate Occupational Pneumoconiosis Staging. (arXiv:2208.06607v1 [cs.LG])
    Occupational pneumoconiosis (OP) staging is a vital task concerning the lung health of a subject. The staging result of a patient depends on the staging standard and the patient's chest X-ray; it is essentially an image classification task. However, the distribution of OP data is commonly imbalanced, which largely reduces the effectiveness of classification models built under the assumption that data follow a balanced distribution, and causes inaccurate staging results. To achieve accurate OP staging, we propose in this work an OP staging model that is able to handle imbalanced data. The proposed model adopts the gray level co-occurrence matrix (GLCM) to extract texture features of chest X-rays and implements classification with a weighted broad learning system (WBLS). Empirical studies on six data cases provided by a hospital indicate that the proposed model performs better OP staging than state-of-the-art classifiers on imbalanced data.
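    The GLCM texture stage is available off the shelf; a minimal sketch with scikit-image (distances, angles and properties are illustrative choices, and the WBLS classifier is not shown):

    ```python
    import numpy as np
    from skimage.feature import graycomatrix, graycoprops

    def glcm_features(image):
        """Extract classic GLCM texture features from an 8-bit grayscale
        chest X-ray. Distances/angles/properties are illustrative."""
        glcm = graycomatrix(image, distances=[1, 2], angles=[0, np.pi / 2],
                            levels=256, symmetric=True, normed=True)
        props = ["contrast", "dissimilarity", "homogeneity", "energy",
                 "correlation"]
        return np.hstack([graycoprops(glcm, p).ravel() for p in props])

    # Usage with a dummy image:
    img = (np.random.rand(64, 64) * 255).astype(np.uint8)
    print(glcm_features(img).shape)  # 5 props x 2 distances x 2 angles = (20,)
    ```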
    DeepScalper: A Risk-Aware Reinforcement Learning Framework to Capture Fleeting Intraday Trading Opportunities. (arXiv:2201.09058v2 [q-fin.TR] UPDATED)
    Reinforcement learning (RL) techniques have shown great success in many challenging quantitative trading tasks, such as portfolio management and algorithmic trading. In particular, intraday trading is one of the most profitable and risky tasks because of the intraday behaviors of the financial market that reflect billions of rapidly fluctuating capitals. However, a vast majority of existing RL methods focus on relatively low-frequency trading scenarios (e.g., day-level) and fail to capture the fleeting intraday investment opportunities due to two major challenges: 1) how to effectively train profitable RL agents for intraday investment decision-making, which involves a high-dimensional, fine-grained action space; 2) how to learn meaningful multi-modality market representations to understand the intraday behaviors of the financial market at tick-level. Motivated by the efficient workflow of professional human intraday traders, we propose DeepScalper, a deep reinforcement learning framework for intraday trading to tackle the above challenges. Specifically, DeepScalper includes four components: 1) a dueling Q-network with action branching to deal with the large action space of intraday trading for efficient RL optimization; 2) a novel reward function with a hindsight bonus to encourage RL agents to make trading decisions with a long-term horizon of the entire trading day; 3) an encoder-decoder architecture to learn multi-modality temporal market embeddings, which incorporates both macro-level and micro-level market information; 4) a risk-aware auxiliary task to strike a balance between maximizing profit and minimizing risk. Through extensive experiments on real-world market data spanning over three years on six financial futures, we demonstrate that DeepScalper significantly outperforms many state-of-the-art baselines in terms of four financial criteria.
    Privacy-Preserving Decentralized Inference with Graph Neural Networks in Wireless Networks. (arXiv:2208.06963v1 [cs.IT])
    As an efficient neural network model for graph data, graph neural networks (GNNs) recently find successful applications for various wireless optimization problems. Given that the inference stage of GNNs can be naturally implemented in a decentralized manner, GNN is a potential enabler for decentralized control/management in the next-generation wireless communications. Privacy leakage, however, may occur due to the information exchanges among neighbors during decentralized inference with GNNs. To deal with this issue, in this paper, we analyze and enhance the privacy of decentralized inference with GNNs in wireless networks. Specifically, we adopt local differential privacy as the metric, and design novel privacy-preserving signals as well as privacy-guaranteed training algorithms to achieve privacy-preserving inference. We also define the SNR-privacy trade-off function to analyze the performance upper bound of decentralized inference with GNNs in wireless networks. To further enhance the communication and computation efficiency, we adopt the over-the-air computation technique and theoretically demonstrate its advantage in privacy preservation. Through extensive simulations on the synthetic graph data, we validate our theoretical analysis, verify the effectiveness of proposed privacy-preserving wireless signaling and privacy-guaranteed training algorithm, and offer some guidance on practical implementation.
    Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games. (arXiv:2107.13090v2 [math.OC] UPDATED)
    We consider a general-sum N-player linear-quadratic game with stochastic dynamics over a finite horizon and prove the global convergence of the natural policy gradient method to the Nash equilibrium. In order to prove the convergence of the method, we require a certain amount of noise in the system. We give a condition, essentially a lower bound on the covariance of the noise in terms of the model parameters, in order to guarantee convergence. We illustrate our results with numerical experiments to show that even in situations where the policy gradient method may not converge in the deterministic setting, the addition of noise leads to convergence.
    Class Prior Estimation under Covariate Shift: No Problem?. (arXiv:2206.02449v2 [stat.ML] UPDATED)
    We show that in the context of classification the property of source and target distributions to be related by covariate shift may be lost if the information content captured in the covariates is reduced, for instance by dropping components or mapping into a lower-dimensional or finite space. As a consequence, under covariate shift, simple approaches to class prior estimation in the style of classify and count, with or without adjustment, are infeasible. We prove that transformations of the covariates that preserve the covariate shift property are necessarily sufficient in the statistical sense for the full set of covariates. A probing algorithm is proposed as an alternative approach to class prior estimation under covariate shift.
    Self-Supervised Transformers for fMRI representation. (arXiv:2112.05761v2 [eess.IV] UPDATED)
    We present TFF, which is a Transformer framework for the analysis of functional Magnetic Resonance Imaging (fMRI) data. TFF employs a two-phase training approach. First, self-supervised training is applied to a collection of fMRI scans, where the model is trained to reconstruct 3D volume data. Second, the pre-trained model is fine-tuned on specific tasks, utilizing ground truth labels. Our results show state-of-the-art performance on a variety of fMRI tasks, including age and gender prediction, as well as schizophrenia recognition. Our code for the training, network architecture, and results is attached as supplementary material.
    Reduced Implication-bias Logic Loss for Neuro-Symbolic Learning. (arXiv:2208.06838v1 [cs.AI])
    Integrating logical reasoning and machine learning by approximating logical inference with differentiable operators is a widely used technique in Neuro-Symbolic systems. However, some differentiable operators can introduce significant bias during backpropagation and degrade the performance of Neuro-Symbolic learning. In this paper, we reveal that this bias, named \textit{Implication Bias}, is common in loss functions derived from fuzzy logic operators. Furthermore, we propose a simple yet effective method to transform the biased loss functions into a \textit{Reduced Implication-bias Logic Loss (RILL)} to address the above problem. Empirical studies show that RILL achieves significant improvements compared with the biased logic loss functions, especially when the knowledge base is incomplete, and remains more robust than the compared methods when labelled data is insufficient.
    Towards Theoretical Understandings of Robust Markov Decision Processes: Sample Complexity and Asymptotics. (arXiv:2105.03863v3 [stat.ML] UPDATED)
    In this paper, we study the non-asymptotic and asymptotic performances of the optimal robust policy and value function of robust Markov Decision Processes (MDPs), where the optimal robust policy and value function are solved only from a generative model. While prior work focusing on non-asymptotic performances of robust MDPs is restricted to the setting of the KL uncertainty set and the $(s,a)$-rectangular assumption, we improve their results and also consider other uncertainty sets, including $L_1$ and $\chi^2$ balls. Our results show that when we assume $(s,a)$-rectangularity on uncertainty sets, the sample complexity is about $\widetilde{O}\left(\frac{|\mathcal{S}|^2|\mathcal{A}|}{\varepsilon^2\rho^2(1-\gamma)^4}\right)$. In addition, we extend our results from the $(s,a)$-rectangular assumption to the $s$-rectangular assumption. In this scenario, the sample complexity varies with the choice of uncertainty sets and is generally larger than in the case under the $(s,a)$-rectangular assumption. Moreover, we also show that the optimal robust value function is asymptotically normal with the typical rate $\sqrt{n}$ under the $(s,a)$- and $s$-rectangular assumptions, from both theoretical and empirical perspectives.
    Accelerating hydrodynamic simulations of urban drainage systems with physics-guided machine learning. (arXiv:2206.01538v2 [cs.LG] UPDATED)
    We propose and demonstrate a new approach for fast and accurate surrogate modelling of urban drainage system hydraulics based on physics-guided machine learning. The surrogates are trained against a limited set of simulation results from a hydrodynamic (HiFi) model. Our approach reduces simulation times by one to two orders of magnitude compared to a HiFi model. It is thus slower than e.g. conceptual hydrological models, but it enables simulations of water levels, flows and surcharges in all nodes and links of a drainage network and thus largely preserves the level of detail provided by HiFi models. Comparing time series simulated by the surrogate and the HiFi model, R2 values in the order of 0.9 are achieved. Surrogate training times are currently in the order of one hour. However, they can likely be reduced through the application of transfer learning and graph neural networks. Our surrogate approach will be useful for interactive workshops in initial design phases of urban drainage systems, as well as for real time applications. In addition, our model formulation is generic and future research should investigate its application for simulating other water systems.
    AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N. (arXiv:2208.07004v1 [cs.LG])
    Comprehensive global cooperation is essential to limit global temperature increases while continuing economic development, e.g., reducing severe inequality or achieving long-term economic growth. Achieving long-term cooperation on climate change mitigation with n strategic agents poses a complex game-theoretic problem. For example, agents may negotiate and reach climate agreements, but there is no central authority to enforce adherence to those agreements. Hence, it is critical to design negotiation and agreement frameworks that foster cooperation, allow all agents to meet their individual policy objectives, and incentivize long-term adherence. This is an interdisciplinary challenge that calls for collaboration between researchers in machine learning, economics, climate science, law, policy, ethics, and other fields. In particular, we argue that machine learning is a critical tool to address the complexity of this domain. To facilitate this research, here we introduce RICE-N, a multi-region integrated assessment model that simulates the global climate and economy, and which can be used to design and evaluate the strategic outcomes for different negotiation and agreement frameworks. We also describe how to use multi-agent reinforcement learning to train rational agents using RICE-N. This framework underpins AI for Global Climate Cooperation, a working group collaboration and competition on climate negotiation and agreement design. Here, we invite the scientific community to design and evaluate their solutions using RICE-N, machine learning, economic intuition, and other domain knowledge. More information can be found on www.ai4climatecoop.org.
    Combining deep learning and crowdsourcing geo-images to predict housing quality in rural China. (arXiv:2208.06997v1 [cs.LG])
    Housing quality is an essential proxy for regional wealth, security and health. Understanding the distribution of housing quality is crucial for unveiling rural development status and informing policy proposals. However, present rural housing quality data highly depend on a top-down, time-consuming survey at the national or provincial level and fail to unpack the housing quality at the village level. To fill the gap between accurately depicting rural housing quality conditions and deficient data, we collect massive rural images and invite users to assess their housing quality at scale. Furthermore, a deep learning framework is proposed to automatically and efficiently predict housing quality based on crowd-sourced rural images.
    Towards Spatio-Temporal Cross-Platform Graph Embedding Fusion for Urban Traffic Flow Prediction. (arXiv:2208.06947v1 [cs.LG])
    In this paper, we propose STC-GEF, a novel Spatio-Temporal Cross-platform Graph Embedding Fusion approach for urban traffic flow prediction. We design a spatial embedding module based on graph convolutional networks (GCN) to extract the complex spatial features within traffic flow data. Furthermore, to capture the temporal dependencies between the traffic flow data from various time intervals, we design a temporal embedding module based on recurrent neural networks. Based on the observation that trip data from different transportation platforms (e.g., taxis, Uber, and Lyft) can be correlated, we design an effective fusion mechanism that combines the trip data from different transportation platforms and further uses them for cross-platform traffic flow prediction (e.g., integrating taxi and ride-sharing platform data for taxi traffic flow prediction). We conduct extensive experimental studies based on real-world trip data of yellow taxis and ride-sharing (Lyft) from New York City (NYC), and validate the accuracy and effectiveness of STC-GEF in fusing different transportation platform data and predicting traffic flows.
    Syntax-driven Data Augmentation for Named Entity Recognition. (arXiv:2208.06957v1 [cs.CL])
    In low resource settings, data augmentation strategies are commonly leveraged to improve performance. Numerous approaches have attempted document-level augmentation (e.g., text classification), but few studies have explored token-level augmentation. Performed naively, data augmentation can produce semantically incongruent and ungrammatical examples. In this work, we compare simple masked language model replacement and an augmentation method using constituency tree mutations to improve the performance of named entity recognition in low-resource settings with the aim of preserving linguistic cohesion of the augmented sentences.
    Modeling Network-level Traffic Flow Transitions on Sparse Data. (arXiv:2208.06646v1 [cs.LG])
    Modeling how network-level traffic flow changes in the urban environment is useful for decision-making in transportation, public safety and urban planning. The traffic flow system can be viewed as a dynamic process that transits between states (e.g., traffic volumes on each road segment) over time. In the real-world traffic system with traffic operation actions like traffic signal control or reversible lane changing, the system's state is influenced by both the historical states and the actions of traffic operations. In this paper, we consider the problem of modeling network-level traffic flow under a real-world setting, where the available data is sparse (i.e., only part of the traffic system is observed). We present DTIGNN, an approach that can predict network-level traffic flows from sparse data. DTIGNN models the traffic system as a dynamic graph influenced by traffic signals, learns the transition models grounded by fundamental transition equations from transportation, and predicts future traffic states with imputation in the process. Through comprehensive experiments, we demonstrate that our method outperforms state-of-the-art methods and can better support decision-making in transportation.
    Deep-Learning-Aided Path Planning and Map Construction for Expediting Indoor Mapping. (arXiv:2011.02043v2 [cs.LG] UPDATED)
    The problem of autonomous indoor mapping is addressed. The goal is to minimize the time to achieve a predefined percentage of exposure with some desired level of certainty. The use of a pre-trained generative deep neural network, acting as a map predictor, in both the path planning and the map construction is proposed in order to expedite the mapping process. This method is examined in combination with several frontier-based path planners for two distinct floorplan datasets. Simulations are run for several configurations of the integrated map predictor, the results of which reveal that by utilizing the prediction a significant reduction in mapping time is possible. When the prediction is integrated in both path planning and map construction processes it is shown that the mapping time may in some cases be cut by over 50%.
    Asset Allocation: From Markowitz to Deep Reinforcement Learning. (arXiv:2208.07158v1 [q-fin.PM])
    Asset allocation is an investment strategy that aims to balance risk and reward by constantly redistributing the portfolio's assets according to certain goals, risk tolerance, and investment horizon. Unfortunately, there is no simple formula that can find the right allocation for every individual. As a result, investors may use different asset allocation strategies to try to fulfil their financial objectives. In this work, we conduct an extensive benchmark study to determine the efficacy and reliability of a number of optimization techniques. In particular, we focus on traditional approaches based on Modern Portfolio Theory, and on machine-learning approaches based on deep reinforcement learning. We assess the models' performance under different market tendencies, i.e., both bullish and bearish markets. For reproducibility, we provide the implementation code in this repository.
    On the Limitations of Continual Learning for Malware Classification. (arXiv:2208.06568v1 [cs.CR])
    Malicious software (malware) classification offers a unique challenge for continual learning (CL) regimes due to the volume of new samples received on a daily basis and the evolution of malware to exploit new vulnerabilities. On a typical day, antivirus vendors receive hundreds of thousands of unique pieces of software, both malicious and benign, and over the course of the lifetime of a malware classifier, more than a billion samples can easily accumulate. Given the scale of the problem, sequential training using continual learning techniques could provide substantial benefits in reducing training and storage overhead. To date, however, there has been no exploration of CL applied to malware classification tasks. In this paper, we study 11 CL techniques applied to three malware tasks covering common incremental learning scenarios, including task, class, and domain incremental learning (IL). Specifically, using two realistic, large-scale malware datasets, we evaluate the performance of the CL methods on both binary malware classification (Domain-IL) and multi-class malware family classification (Task-IL and Class-IL) tasks. To our surprise, continual learning methods significantly underperformed naive Joint replay of the training data in nearly all settings -- in some cases reducing accuracy by more than 70 percentage points. A simple approach of selectively replaying 20% of the stored data achieves better performance, with 50% of the training time compared to Joint replay. Finally, we discuss potential reasons for the unexpectedly poor performance of the CL techniques, with the hope that it spurs further research on developing techniques that are more effective in the malware classification domain.
    Magnetic Resonance Spectroscopy Deep Learning Denoising Using Few In Vivo Data. (arXiv:2101.11442v3 [physics.med-ph] UPDATED)
    Magnetic Resonance Spectroscopy (MRS) is a noninvasive tool to reveal metabolic information. One challenge of 1H-MRS is the low signal-to-noise ratio (SNR). To improve the SNR, a typical approach is to perform Signal Averaging (SA) with M repeated samples. The data acquisition time, however, is increased by M times accordingly, and a complete clinical MRS scan takes approximately 10 minutes at a common setting of M=128. Recently, deep learning has been introduced to improve the SNR, but most methods use simulated data as the training set. This may hinder MRS applications, since potential differences, such as acquisition system imperfections and physiological and psychological conditions, may exist between the simulated and in vivo data. Here, we propose a new scheme that purely uses repeated samples of realistic data. A deep learning model, Refusion Long Short-Term Memory (ReLSTM), is designed to learn the mapping from the low-SNR time-domain data (24 SA) to the high-SNR one (128 SA). Experiments on the in vivo brain spectra of 7 healthy subjects, 2 brain tumor patients and 1 cerebral infarction patient show that, using only 20% of the repeated samples, the spectra denoised by ReLSTM provide estimated metabolite concentrations comparable to 128 SA. Compared with the state-of-the-art low-rank denoising method, ReLSTM achieves lower relative errors and Cram\'er-Rao lower bounds in quantifying some important biomarkers. In summary, ReLSTM can perform high-fidelity denoising of spectra under fast acquisition (24 SA), which would be valuable for MRS clinical studies.
    On a Mechanism Framework of Autoencoders. (arXiv:2208.06995v1 [cs.LG])
    This paper proposes a theoretical framework on the mechanism of autoencoders. For the encoder part, under the main use of dimensionality reduction, we investigate its two fundamental properties: bijective maps and data disentangling. General construction methods for an encoder that satisfies either or both of these properties are given. For the decoder part, as a consequence of the encoder constructions, we present a new basic principle of the solution, without using affine transforms. The generalization mechanism of autoencoders is modeled. The results for ReLU autoencoders are generalized to some non-ReLU cases, particularly for the sigmoid-unit autoencoder. Based on the theoretical framework above, we explain some experimental results of variational autoencoders, denoising autoencoders, and linear-unit autoencoders, with emphasis on interpreting the lower-dimensional representation of data via encoders; the mechanism of image restoration through autoencoders then follows naturally from these explanations. Compared to PCA and decision trees, the advantages of (generalized) autoencoders on dimensionality reduction and classification are demonstrated, respectively. Convolutional neural networks and randomly weighted neural networks are also interpreted by this framework.
    BED: A Real-Time Object Detection System for Edge Devices. (arXiv:2202.07503v3 [cs.CV] UPDATED)
    Deploying deep neural networks (DNNs) on edge devices provides efficient and effective solutions for real-world tasks. Edge devices have been used for collecting a large volume of data efficiently in different domains. DNNs have been an effective tool for data processing and analysis. However, designing DNNs on edge devices is challenging due to the limited computational resources and memory. To tackle this challenge, we demonstrate the Object Detection System for Edge Devices (BED) on the MAX78000 DNN accelerator. It integrates on-device DNN inference with a camera and an LCD display for image acquisition and detection exhibition, respectively. BED is a concise, effective and detailed solution, including model training, quantization, synthesis and deployment. The entire repository is open-sourced on GitHub, including a Graphical User Interface (GUI) for on-chip debugging. Experiment results indicate that BED can produce accurate detection with a 300-KB tiny DNN model, which takes only 91.9 ms of inference time and 1.845 mJ of energy. The real-time detection is available on YouTube.
    GUARD: Graph Universal Adversarial Defense. (arXiv:2204.09803v3 [cs.LG] UPDATED)
    Graph convolutional networks (GCNs) have been shown to be vulnerable to small adversarial perturbations, which has become a severe threat and largely limits their applications in security-critical scenarios. To mitigate such a threat, considerable research efforts have been devoted to increasing the robustness of GCNs against adversarial attacks. However, current approaches for defense are typically designed for the whole graph and consider the global performance, posing challenges in protecting important local nodes from stronger adversarial targeted attacks. In this work, we present a simple yet effective method, named Graph Universal Adversarial Defense (GUARD). Unlike previous works, GUARD protects each individual node from attacks with a universal defensive patch, which is generated once and can be applied to any node (node-agnostic) in a graph. Extensive experiments on four benchmark datasets demonstrate that our method significantly improves robustness for several established GCNs against multiple adversarial attacks and outperforms state-of-the-art defense methods by large margins. Our code is publicly available at https://github.com/EdisonLeeeee/GUARD.
    Enhancing Graph Contrastive Learning with Node Similarity. (arXiv:2208.06743v1 [cs.LG])
    Graph Neural Networks (GNNs) have achieved great success in learning graph representations and thus facilitating various graph-related tasks. However, most GNN methods adopt a supervised learning setting, which is not always feasible in real-world applications due to the difficulty of obtaining labeled data. Hence, graph self-supervised learning has been attracting increasing attention. Graph contrastive learning (GCL) is a representative framework for self-supervised learning. In general, GCL learns node representations by contrasting semantically similar nodes (positive samples) and dissimilar nodes (negative samples) with anchor nodes. Without access to labels, positive samples are typically generated by data augmentation, and negative samples are uniformly sampled from the entire graph, which leads to a sub-optimal objective. Specifically, data augmentation naturally limits the number of positive samples involved in the process (typically only one positive sample is adopted). On the other hand, the random sampling process would inevitably select false-negative samples (samples sharing the same semantics with the anchor). These issues limit the learning capability of GCL. In this work, we propose an enhanced objective that addresses the aforementioned issues. We first introduce an unachievable ideal objective that contains all positive samples and no false-negative samples. This ideal objective is then transformed into a probabilistic form based on the distributions for sampling positive and negative samples. We then model these distributions with node similarity and derive the enhanced objective. Comprehensive experiments on various datasets demonstrate the effectiveness of the proposed enhanced objective under different settings.
    A View Independent Classification Framework for Yoga Postures. (arXiv:2206.13577v2 [cs.CV] UPDATED)
    Yoga is a globally acclaimed and widely recommended practice for healthy living. Maintaining correct posture while performing a Yogasana is of utmost importance. In this work, we employ transfer learning from Human Pose Estimation models for extracting 136 key-points spread all over the body to train a Random Forest classifier which is used for estimation of the Yogasanas. The results are evaluated on an in-house collected extensive yoga video database of 51 subjects recorded from 4 different camera angles. We propose a 3-step scheme for evaluating the generalizability of a Yoga classifier by testing it on 1) unseen frames, 2) unseen subjects, and 3) unseen camera angles. We argue that for most applications, validation accuracies on unseen subjects and unseen camera angles would be most important. We empirically analyze, over three public datasets, the advantage of transfer learning and the possibility of target leakage. We further demonstrate that the classification accuracies critically depend on the cross validation method employed and can often be misleading. To promote further research, we have made the key-points dataset and code publicly available.
    RG-Flow: A hierarchical and explainable flow model based on renormalization group and sparse prior. (arXiv:2010.00029v5 [cs.LG] UPDATED)
    Flow-based generative models have become an important class of unsupervised learning approaches. In this work, we incorporate the key ideas of renormalization group (RG) and sparse prior distribution to design a hierarchical flow-based generative model, RG-Flow, which can separate information at different scales of images and extract disentangled representations at each scale. We demonstrate our method on synthetic multi-scale image datasets and the CelebA dataset, showing that the disentangled representations enable semantic manipulation and style mixing of the images at different scales. To visualize the latent representations, we introduce receptive fields for flow-based models and show that the receptive fields of RG-Flow are similar to those of convolutional neural networks. In addition, we replace the widely adopted isotropic Gaussian prior distribution by the sparse Laplacian distribution to further enhance the disentanglement of representations. From a theoretical perspective, our proposed method has $O(\log L)$ complexity for inpainting of an image with edge length $L$, compared to previous generative models with $O(L^2)$ complexity.
    BinBert: Binary Code Understanding with a Fine-tunable and Execution-aware Transformer. (arXiv:2208.06692v1 [cs.CR])
    A recent trend in binary code analysis promotes the use of neural solutions based on instruction embedding models. An instruction embedding model is a neural network that transforms sequences of assembly instructions into embedding vectors. If the embedding network is trained such that the translation from code to vectors partially preserves the semantics, the network effectively represents an assembly code model. In this paper we present BinBert, a novel assembly code model. BinBert is built on a transformer pre-trained on a huge dataset of both assembly instruction sequences and symbolic execution information. BinBert can be applied to assembly instruction sequences and is fine-tunable, i.e., it can be re-trained as part of a neural architecture on task-specific data. Through fine-tuning, BinBert learns how to apply the general knowledge acquired with pre-training to the specific task. We evaluated BinBert on a multi-task benchmark that we specifically designed to test the understanding of assembly code. The benchmark is composed of several tasks, some taken from the literature, and a few novel tasks that we designed, with a mix of intrinsic and downstream tasks. Our results show that BinBert outperforms state-of-the-art models for binary instruction embedding, raising the bar for binary code understanding.
    The FEDHC Bayesian network learning algorithm. (arXiv:2012.00113v6 [stat.ML] UPDATED)
    The paper proposes a new hybrid Bayesian network learning algorithm, termed Forward Early Dropping Hill Climbing (FEDHC), devised to work with either continuous or categorical variables. The paper also shows that the only implementation of MMHC in the statistical software \textit{R} is prohibitively expensive, and a new implementation is offered. Furthermore, specifically for the case of continuous data, a version of FEDHC that is robust to outliers, and that can be adopted by other BN learning algorithms, is proposed. FEDHC is tested via Monte Carlo simulations that distinctly show it is computationally efficient and produces Bayesian networks of similar or higher accuracy than MMHC and PCHC. Finally, an application of the FEDHC, PCHC and MMHC algorithms to real data from the field of economics is demonstrated using the statistical software \textit{R}.
    Forecasting Question Answering over Temporal Knowledge Graphs. (arXiv:2208.06501v1 [cs.AI])
    Question answering over temporal knowledge graphs (TKGQA) has recently attracted increasing interest. TKGQA requires temporal reasoning techniques to extract the relevant information from temporal knowledge bases. The only existing TKGQA dataset, i.e., CronQuestions, consists of temporal questions based on facts from a fixed time period, where a temporal knowledge graph (TKG) spanning the same period can be fully used for answer inference, allowing the TKGQA models to use even future knowledge to answer questions about past facts. In real-world scenarios, however, it is also common that, given the knowledge up to now, we want TKGQA systems to answer questions about the future. As humans constantly seek plans for the future, building TKGQA systems for answering such forecasting questions is important. Nevertheless, this setting has remained unexplored in previous research. In this paper, we propose a novel task: forecasting question answering over temporal knowledge graphs. We also propose a large-scale TKGQA benchmark dataset, i.e., ForecastTKGQuestions, for this task. It includes three types of questions, i.e., entity prediction, yes-no, and fact reasoning questions. For every forecasting question in our dataset, QA models can only access the TKG information before the timestamp annotated in the given question for answer inference. We find that the state-of-the-art TKGQA methods perform poorly on forecasting questions, and they are unable to answer yes-no questions and fact reasoning questions. To this end, we propose ForecastTKGQA, a TKGQA model that employs a TKG forecasting module for future inference, to answer all three types of questions. Experimental results show that ForecastTKGQA outperforms recent TKGQA methods on the entity prediction questions, and it also shows great effectiveness in answering the other two types of questions.
    On the Estimation of Derivatives Using Plug-in KRR Estimators. (arXiv:2006.01350v3 [stat.ML] UPDATED)
    We study the problem of estimating the derivatives of a regression function, which has a wide range of applications as a key nonparametric functional of unknown functions. Standard analysis may be tailored to specific derivative orders, and parameter tuning remains a daunting challenge particularly for high-order derivatives. In this article, we propose a simple plug-in kernel ridge regression (KRR) estimator in nonparametric regression with random design that is broadly applicable for multi-dimensional support and arbitrary mixed-partial derivatives. We provide a non-asymptotic analysis to study the behavior of the proposed estimator in a unified manner that encompasses the regression function and its derivatives, leading to two error bounds for a general class of kernels under the strong $L_\infty$ norm. In a concrete example specialized to kernels with polynomially decaying eigenvalues, the proposed estimator recovers the minimax optimal rate up to a logarithmic factor for estimating derivatives of functions in H\"older and Sobolev classes. Interestingly, the proposed estimator achieves the optimal rate of convergence with the same choice of tuning parameter for any order of derivatives. Hence, the proposed estimator enjoys a \textit{plug-in property} for derivatives in that it automatically adapts to the order of derivatives to be estimated, enabling easy tuning in practice. Our simulation studies show favorable finite sample performance of the proposed method relative to several existing methods and corroborate the theoretical findings on its minimax optimality.
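    The plug-in idea admits a compact sketch: fit KRR once, then differentiate the kernel expansion analytically, so the same dual weights serve both the function and its derivative. This is a 1-D illustration with an RBF kernel and hand-picked hyperparameters, not the paper's tuning procedure.

        import numpy as np

        def rbf(x, xc, ell):  # k(x, xc) for 1-D inputs, as a (len(x), len(xc)) matrix
            return np.exp(-(x[:, None] - xc[None, :])**2 / (2 * ell**2))

        rng = np.random.default_rng(0)
        X = rng.uniform(0, 2*np.pi, 200)
        y = np.sin(X) + 0.1 * rng.standard_normal(200)
        ell, lam = 0.5, 1e-3

        K = rbf(X, X, ell)
        alpha = np.linalg.solve(K + lam*len(X)*np.eye(len(X)), y)  # KRR dual weights

        xs = np.linspace(0.5, 5.5, 5)
        f_hat  = rbf(xs, X, ell) @ alpha                                          # f(x)
        df_hat = (-(xs[:, None] - X[None, :]) / ell**2 * rbf(xs, X, ell)) @ alpha  # f'(x)
        print(np.round(df_hat - np.cos(xs), 2))  # errors should be small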
    Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness. (arXiv:2208.06648v1 [cs.AI])
    Biases have marked medical history, leading to unequal care affecting marginalised groups. The patterns of missingness in observational data often reflect these group discrepancies, but the algorithmic fairness implications of group-specific missingness are not well understood. Despite its potential impact, imputation is too often a forgotten preprocessing step. At best, practitioners guide imputation choice by optimising overall performance, ignoring how this preprocessing can reinforce inequities. Our work questions this choice by studying how imputation affects downstream algorithmic fairness. First, we provide a structured view of the relationship between clinical presence mechanisms and group-specific missingness patterns. Then, through simulations and real-world experiments, we demonstrate that the imputation choice influences marginalised group performance and that no imputation strategy consistently reduces disparities. Importantly, our results show that current practices may endanger health equity as similarly performing imputation strategies at the population level can affect marginalised groups in different ways. Finally, we propose recommendations for mitigating inequity stemming from a neglected step of the machine learning pipeline.
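    The core experimental pattern is easy to reproduce in miniature: inject group-specific missingness, impute with different strategies, and compare downstream performance per group rather than overall. A hedged toy version with synthetic data and simple imputers, not the paper's clinical setup:

        import numpy as np
        from sklearn.impute import SimpleImputer, KNNImputer
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import accuracy_score

        rng = np.random.default_rng(0)
        n = 2000
        group = rng.integers(0, 2, n)                 # protected group indicator
        X = rng.standard_normal((n, 5))
        y = (X[:, 0] + X[:, 1] > 0).astype(int)
        # Group-specific missingness: feature 0 missing more often in group 1.
        missing = rng.random(n) < np.where(group == 1, 0.6, 0.1)
        X_miss = X.copy()
        X_miss[missing, 0] = np.nan

        for imputer in (SimpleImputer(strategy="mean"), KNNImputer(n_neighbors=5)):
            X_imp = imputer.fit_transform(X_miss)
            pred = LogisticRegression().fit(X_imp, y).predict(X_imp)
            accs = [accuracy_score(y[group == g], pred[group == g]) for g in (0, 1)]
            print(type(imputer).__name__, "per-group accuracy:", np.round(accs, 3))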
    Demo: RhythmEdge: Enabling Contactless Heart Rate Estimation on the Edge. (arXiv:2208.06572v1 [cs.LG])
    In this demo paper, we design and prototype RhythmEdge, a low-cost, deep-learning-based contactless system for regular HR monitoring applications. RhythmEdge improves over existing approaches through its contactless operation, real-time/offline capability, and inexpensive, readily available sensing components and computing devices. Our RhythmEdge system is portable and easily deployable for reliable HR estimation in moderately controlled indoor or outdoor environments. RhythmEdge measures HR by detecting changes in blood volume from facial videos (remote photoplethysmography; rPPG) and provides instant assessment using off-the-shelf, commercially available, resource-constrained edge platforms and video cameras. We demonstrate the scalability, flexibility, and compatibility of RhythmEdge by deploying it on three resource-constrained platforms of differing architectures (NVIDIA Jetson Nano, Google Coral Development Board, Raspberry Pi) and three heterogeneous cameras of differing sensitivity, resolution, and properties (web camera, action camera, and DSLR). RhythmEdge further stores longitudinal cardiovascular information and provides instant notification to the users. We thoroughly test the prototype's stability, latency, and feasibility on the three edge computing platforms by profiling their runtime, memory, and power usage.
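    For intuition about the underlying signal, the classical (non-deep) rPPG baseline fits in a few lines: average the green channel over the face region and read the heart rate off the dominant frequency in the physiological band. RhythmEdge itself uses a deep model; this is only the textbook baseline it builds on.

        import numpy as np

        def hr_from_green_channel(frames, fps):
            """Classical rPPG baseline: FFT peak of the mean green-channel signal.
            frames: array (T, H, W, 3); returns HR in beats per minute."""
            g = frames[..., 1].mean(axis=(1, 2))        # spatial mean of green channel
            g = g - g.mean()
            freqs = np.fft.rfftfreq(len(g), d=1.0/fps)
            power = np.abs(np.fft.rfft(g))**2
            band = (freqs >= 0.7) & (freqs <= 4.0)      # 42-240 bpm physiological band
            return 60.0 * freqs[band][np.argmax(power[band])]

        # Synthetic check: a 1.2 Hz (72 bpm) pulse modulating the green channel at 30 fps.
        t = np.arange(300) / 30.0
        frames = np.ones((300, 8, 8, 3)) + 0.01*np.sin(2*np.pi*1.2*t)[:, None, None, None]
        print(hr_from_green_channel(frames, fps=30))    # ~72.0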
    Learning Contact Dynamics using Physically Structured Neural Networks. (arXiv:2102.11206v2 [cs.LG] UPDATED)
    Learning physically structured representations of dynamical systems that include contact between different objects is an important problem for learning-based approaches in robotics. Black-box neural networks can learn to approximately represent discontinuous dynamics, but they typically require large quantities of data and often suffer from pathological behaviour when forecasting for longer time horizons. In this work, we use connections between deep neural networks and differential equations to design a family of deep network architectures for representing contact dynamics between objects. We show that these networks can learn discontinuous contact events in a data-efficient manner from noisy observations in settings that are traditionally difficult for black-box approaches and recent physics-inspired neural networks. Our results indicate that an idealised form of touch feedback -- which is heavily relied upon by biological systems -- is a key component of making this learning problem tractable. Together with the inductive biases introduced through the network architectures, our techniques enable accurate learning of contact dynamics from observations.
    DisenHCN: Disentangled Hypergraph Convolutional Networks for Spatiotemporal Activity Prediction. (arXiv:2208.06794v1 [cs.LG])
    Spatiotemporal activity prediction, aiming to predict user activities at a specific location and time, is crucial for applications like urban planning and mobile advertising. Existing solutions based on tensor decomposition or graph embedding suffer from the following two major limitations: 1) they ignore the fine-grained similarities of user preferences; 2) user modeling is entangled. In this work, we propose a hypergraph neural network model called DisenHCN to bridge the above gaps. In particular, we first unify the fine-grained user similarity and the complex matching between user preferences and spatiotemporal activity into a heterogeneous hypergraph. We then disentangle the user representations into different aspects (location-aware, time-aware, and activity-aware) and aggregate each aspect's features on the constructed hypergraph, capturing high-order relations from different aspects and disentangling the impact of each aspect on the final prediction. Extensive experiments show that our DisenHCN outperforms the state-of-the-art methods by 14.23% to 18.10% on four real-world datasets. Further studies also convincingly verify the rationality of each component in our DisenHCN.
    SNGuess: A method for the selection of young extragalactic transients. (arXiv:2208.06534v1 [astro-ph.IM])
    With a rapidly rising number of transients detected in astronomy, classification methods based on machine learning are increasingly being employed. Their goal is typically to obtain a definitive classification of transients, and good performance usually requires a large set of observations. However, well-designed, targeted models can reach their classification goals with fewer computing resources. This paper presents SNGuess, a model designed to find young extragalactic nearby transients with high purity. SNGuess works with a set of features that can be efficiently calculated from astronomical alert data. Some of these features are static and associated with the alert metadata, while others must be calculated from the photometric observations contained in the alert. Most of the features are simple enough to be obtained or calculated already at the early stages of a transient's lifetime after its detection. We calculate these features for a set of labeled public alert data obtained over a time span of 15 months from the Zwicky Transient Facility (ZTF). The core model of SNGuess consists of an ensemble of decision trees, which are trained via gradient boosting. Approximately 88% of the candidates suggested by SNGuess from a set of ZTF alerts spanning April 2020 to August 2021 were found to be true relevant supernovae (SNe). For alerts with bright detections, this number ranges between 92% and 98%. Since April 2020, transients identified by SNGuess as potential young SNe in the ZTF alert stream have been published to the Transient Name Server (TNS) under the AMPEL_ZTF_NEW group identifier. SNGuess scores for any transient observed by ZTF can be accessed via a web service. The source code of SNGuess is publicly available.
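    A hedged sketch of the core model described above: a gradient-boosted ensemble of decision trees over per-alert features. The feature names and labels below are invented stand-ins for the paper's alert-derived features.

        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(1)
        n = 5000
        # Illustrative alert features (the paper's actual feature set differs):
        # days since first detection, latest magnitude, rise rate, host distance.
        X = np.column_stack([rng.exponential(5, n), rng.normal(19, 1, n),
                             rng.normal(0.1, 0.05, n), rng.exponential(2, n)])
        y = ((X[:, 0] < 5) & (X[:, 2] > 0.1)).astype(int)  # toy "young transient" label

        Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
        model = GradientBoostingClassifier().fit(Xtr, ytr)
        print("held-out accuracy:", model.score(Xte, yte))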
    Topological Data Analysis of Neural Network Layer Representations. (arXiv:2208.06438v1 [cs.LG])
    This paper is a cursory study of how topological features are preserved within the internal representations of neural network layers. Using techniques from topological data analysis, namely persistent homology, we computed the topological features of a simple feedforward neural network's layer representations of a modified torus with a Klein bottle-like twist. The network appeared to approximate homeomorphisms in early layers, before significantly changing the topology of the data in deeper layers. The resulting noise hampered the ability of persistent homology to compute these features; however, similar topological features seemed to persist longer in a network with a bijective activation function.
    Models of Music Cognition and Composition. (arXiv:2208.06878v1 [cs.SD])
    Much like most cognition research, music cognition is an interdisciplinary field that attempts to apply the methods of cognitive science (neurological, computational and experimental) to understand the perception of music and the process of its composition. In this paper, we first motivate why music is relevant to cognitive scientists and give an overview of the approaches to computational modelling of music cognition. We then review literature on the various models of music perception, including non-computational models, computational non-cognitive models and computational cognitive models. Lastly, we review literature on modelling creative behaviour and on computer systems capable of composing music. Since many technical terms from music theory are used, we have appended a list of relevant terms and their definitions at the end.
    Revisiting Adversarial Attacks on Graph Neural Networks for Graph Classification. (arXiv:2208.06651v1 [cs.SI])
    Graph neural networks (GNNs) have achieved tremendous success in the task of graph classification and diverse downstream real-world applications. Despite their success, existing approaches are either limited to structure attacks or restricted to local information. This calls for a more general attack framework on graph classification, which faces significant challenges due to the complexity of generating local-node-level adversarial examples using the global-graph-level information. To address this "global-to-local" problem, we present a general framework CAMA to generate adversarial examples by manipulating graph structure and node features in a hierarchical style. Specifically, we make use of Graph Class Activation Mapping and its variant to produce node-level importance corresponding to the graph classification task. Then through a heuristic design of algorithms, we can perform both feature and structure attacks under unnoticeable perturbation budgets with the help of both node-level and subgraph-level importance. Experiments towards attacking four state-of-the-art graph classification models on six real-world benchmarks verify the flexibility and effectiveness of our framework.
    An Empirical Comparison of Explainable Artificial Intelligence Methods for Clinical Data: A Case Study on Traumatic Brain Injury. (arXiv:2208.06717v1 [cs.AI])
    A longstanding challenge surrounding deep learning algorithms is unpacking and understanding how they make their decisions. Explainable Artificial Intelligence (XAI) offers methods to provide explanations of internal functions of algorithms and reasons behind their decisions in ways that are interpretable and understandable to human users. Numerous XAI approaches have been developed thus far, and a comparative analysis of these strategies seems necessary to discern their relevance to clinical prediction models. To this end, we first implemented two prediction models for short- and long-term outcomes of traumatic brain injury (TBI) utilizing structured tabular as well as time-series physiologic data, respectively. Six different interpretation techniques were used to describe both prediction models at the local and global levels. We then performed a critical analysis of the merits and drawbacks of each strategy, highlighting the implications for researchers who are interested in applying these methodologies. The implemented methods were compared to one another in terms of several XAI characteristics such as understandability, fidelity, and stability. Our findings show that SHAP is the most stable with the highest fidelity but falls short of understandability. Anchors, on the other hand, is the most understandable approach, but it is only applicable to tabular data and not time series data.
    Smart caching in a Data Lake for High Energy Physics analysis. (arXiv:2208.06437v1 [cs.DC])
    The continuous growth of data production in almost all scientific areas raises new problems in data access and management, especially in a scenario where the end-users, as well as the resources that they can access, are worldwide distributed. This work is focused on the data caching management in a Data Lake infrastructure in the context of the High Energy Physics field. We are proposing an autonomous method, based on Reinforcement Learning techniques, to improve the user experience and to contain the maintenance costs of the infrastructure.
    TabText: a Systematic Approach to Aggregate Knowledge Across Tabular Data Structures. (arXiv:2206.10381v2 [cs.LG] UPDATED)
    Processing and analyzing tabular data in a productive and efficient way is essential for building successful applications of machine learning in fields such as healthcare. However, the lack of a unified framework for representing and standardizing tabular information poses a significant challenge to researchers and professionals alike. In this work, we present TabText, a methodology that leverages the unstructured data format of language to encode tabular data from different table structures and time periods efficiently and accurately. We show using two healthcare datasets and four prediction tasks that features extracted via TabText outperform those extracted with traditional processing methods by 2-5%. Furthermore, we analyze the sensitivity of our framework against different choices for sentence representations of missing values, meta information and language descriptiveness, and provide insights into winning strategies that improve performance.
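    The central encoding step can be sketched in a few lines: each tabular record becomes a sentence that a language model can embed, with missing values verbalized rather than dropped. The column names and phrasing below are illustrative; the paper's sensitivity analysis studies exactly such choices.

        def row_to_text(row, missing_token="unknown"):
            """Serialize one tabular record into a sentence a language model can embed.
            A minimal sketch; the paper compares several such representations."""
            parts = []
            for col, val in row.items():
                parts.append(f"{col} is {missing_token if val is None else val}")
            return ", ".join(parts) + "."

        patient = {"age": 67, "sex": "female", "heart rate": None, "diagnosis": "sepsis"}
        print(row_to_text(patient))
        # -> "age is 67, sex is female, heart rate is unknown, diagnosis is sepsis."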
    IRL with Partial Observations using the Principle of Uncertain Maximum Entropy. (arXiv:2208.06988v1 [cs.LG])
    The principle of maximum entropy is a broadly applicable technique for computing a distribution with the least amount of information possible while constrained to match empirically estimated feature expectations. However, in many real-world applications that use noisy sensors, computing the feature expectations may be challenging due to partial observation of the relevant model variables. For example, a robot performing apprenticeship learning may lose sight of the agent it is learning from due to environmental occlusion. We show that in generalizing the principle of maximum entropy to these types of scenarios we unavoidably introduce a dependency of the empirical feature expectations on the learned model. We introduce the principle of uncertain maximum entropy and present an expectation-maximization based solution generalized from the principle of latent maximum entropy. Finally, we experimentally demonstrate the improved robustness to noisy data offered by our technique in a maximum causal entropy inverse reinforcement learning domain.
    Learning to Infer Counterfactuals: Meta-Learning for Estimating Multiple Imbalanced Treatment Effects. (arXiv:2208.06748v1 [cs.LG])
    We regularly consider answering counterfactual questions in practice, such as "Would people with diabetes have taken a turn for the better had they chosen another medication?". Observational studies are growing in significance for answering such questions due to their widespread accumulation and comparatively easier acquisition than Randomized Control Trials (RCTs). Recently, some works have introduced representation learning and domain adaptation into counterfactual inference. However, most current works focus on the setting of binary treatments, and none of them considers that sample sizes across treatments may be imbalanced; in particular, data in some treatment groups can be relatively limited due to inherent user preference. In this paper, we design a new algorithmic framework for counterfactual inference, Meta-learning for Estimating Individual Treatment Effects (MetaITE), to fill the above research gaps, especially considering multiple imbalanced treatments. Specifically, we regard data episodes among treatment groups in counterfactual inference as meta-learning tasks. We train a meta-learner on a set of source treatment groups with sufficient samples and update the model by gradient descent with limited samples in the target treatment. Moreover, we introduce two complementary losses. One is the supervised loss on multiple source treatments; the other aligns latent distributions among various treatment groups to reduce the discrepancy. We perform experiments on two real-world datasets to evaluate inference accuracy and generalization ability. Experimental results demonstrate that MetaITE matches or outperforms state-of-the-art methods.
    May the force be with you. (arXiv:2208.06676v1 [cs.LG])
    Modern methods in dimensionality reduction are dominated by nonlinear attraction-repulsion force-based methods (this includes t-SNE, UMAP, ForceAtlas2, LargeVis, and many more). The purpose of this paper is to demonstrate that all such methods, by design, come with an additional feature that is automatically computed along the way, namely the vector field associated with these forces. We show how this vector field provides additional high-quality information and propose a general refinement strategy based on ideas from Morse theory. The efficiency of these ideas is illustrated specifically using t-SNE on synthetic and real-life data sets.
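    A schematic of the vector field in question: at each embedded point, sum the attractive forces along neighbor edges and the repulsive forces from all other points. The actual force laws differ across t-SNE, UMAP, and ForceAtlas2; this sketch only illustrates the shared attraction-repulsion structure.

        import numpy as np

        def force_field(Y, edges, a=1.0, r=1.0):
            """Net attraction-repulsion vector at each embedded point (schematic)."""
            F = np.zeros_like(Y)
            for i, j in edges:                    # attraction along neighbor edges
                d = Y[j] - Y[i]
                F[i] += a * d
                F[j] -= a * d
            for i in range(len(Y)):               # repulsion between all pairs
                d = Y[i] - Y
                dist2 = (d**2).sum(1) + 1e-9      # self-term contributes zero
                F[i] += r * (d / dist2[:, None]).sum(0)
            return F

        Y = np.random.default_rng(0).standard_normal((50, 2))
        edges = [(i, i + 1) for i in range(49)]
        print(force_field(Y, edges).shape)        # (50, 2) -- one vector per point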
    ReCo: A Dataset for Residential Community Layout Planning. (arXiv:2206.04678v2 [cs.LG] UPDATED)
    Layout planning is centrally important in the field of architecture and urban design. Among the various basic units carrying urban functions, the residential community plays a vital part in supporting human life. Therefore, the layout planning of residential communities has always been of concern, and has attracted particular attention since the advent of deep learning that facilitates automated layout generation and spatial pattern recognition. However, the research community generally suffers from a lack of residential community layout benchmarks and high-quality datasets, which hampers future exploration of data-driven methods for residential community layout planning. The lack of datasets is largely due to the difficulties of large-scale real-world residential data acquisition and long-term expert screening. In order to address these issues and advance a benchmark dataset for various intelligent spatial design and analysis applications in the development of smart cities, we introduce the Residential Community Layout Planning (ReCo) Dataset, which is the first and largest open-source vector dataset related to real-world communities to date. The ReCo Dataset is presented in multiple data formats with 37,646 residential community layout plans, covering 598,728 residential buildings with height information. ReCo can be conveniently adapted for residential community layout related urban design tasks, e.g., generative layout design, morphological pattern recognition and spatial evaluation. To validate the utility of ReCo in automated residential community layout planning, two Generative Adversarial Network (GAN) based generative models are further applied to the dataset. We expect the ReCo Dataset to inspire more creative and practical work in intelligent design and beyond. The ReCo Dataset is published at: https://www.kaggle.com/fdudsde/reco-dataset.  ( 3 min )
    Fast Vocabulary Projection Method via Clustering for Multilingual Machine Translation on GPU. (arXiv:2208.06874v1 [cs.CL])
    Multilingual Neural Machine Translation has shown great success using transformer models. Deploying these models is challenging because they usually require large vocabulary (vocab) sizes for various languages. This limits the speed of predicting the output tokens in the last vocab projection layer. To alleviate these challenges, this paper proposes a fast vocabulary projection method via clustering which can be used for multilingual transformers on GPUs. First, in an offline step, we split the vocab search space into disjoint clusters given the hidden context vector of the decoder output, which results in much smaller vocab columns for vocab projection. Second, at inference time, the proposed method predicts the clusters and candidate active tokens for the hidden context vectors at the vocab projection. The paper also includes an analysis of different ways of building these clusters in multilingual settings. Our results show end-to-end speed gains in float16 GPU inference of up to 25% while maintaining the BLEU score and only slightly increasing memory cost. The proposed method speeds up the vocab projection step itself by up to 2.6x. We also conduct an extensive human evaluation to verify that the proposed method preserves the quality of the translations from the original model.
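    A simplified sketch of the two-step idea: cluster the output-embedding rows offline, then at inference score the cluster centroids first and run the expensive projection only over tokens in the best-scoring clusters. The paper builds clusters from decoder hidden-context vectors; plain k-means over embedding rows is used here as a stand-in.

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        V, H, C = 20000, 256, 64      # vocab size, hidden size, number of clusters
        W = rng.standard_normal((V, H)).astype(np.float32)  # output embedding matrix

        km = KMeans(n_clusters=C, n_init=2, random_state=0).fit(W)  # offline step

        def fast_logits(h, top_clusters=4):
            # Score centroids first, then project only within the best clusters.
            cluster_scores = km.cluster_centers_ @ h
            keep = np.argsort(cluster_scores)[-top_clusters:]
            idx = np.flatnonzero(np.isin(km.labels_, keep))  # candidate token ids
            return idx, W[idx] @ h                           # partial vocab projection

        h = rng.standard_normal(H).astype(np.float32)
        idx, scores = fast_logits(h)
        print("approximate argmax token id:", idx[np.argmax(scores)])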
    Multinomial Logistic Regression Algorithms via Quadratic Gradient. (arXiv:2208.06828v1 [cs.LG])
    Multinomial logistic regression, also known by other names such as multiclass logistic regression and softmax regression, is a fundamental classification method that generalizes binary logistic regression to multiclass problems. A recent work proposed a faster gradient, called the $\texttt{quadratic gradient}$, that can accelerate binary logistic regression training, and presented an enhanced Nesterov's accelerated gradient (NAG) method for binary logistic regression. In this paper, we extend this work to multiclass logistic regression and propose an enhanced Adaptive Gradient Algorithm (Adagrad) that can accelerate the original Adagrad method. We test the enhanced NAG method and the enhanced Adagrad method on several multiclass-problem datasets. Experimental results show that both enhanced methods converge faster than their original counterparts.  ( 2 min )
    Reverse Engineering the Neural Tangent Kernel. (arXiv:2106.03186v4 [cs.LG] UPDATED)
    The development of methods to guide the design of neural networks is an important open challenge for deep learning theory. As a paradigm for principled neural architecture design, we propose the translation of high-performing kernels, which are better-understood and amenable to first-principles design, into equivalent network architectures, which have superior efficiency, flexibility, and feature learning. To this end, we constructively prove that, with just an appropriate choice of activation function, any positive-semidefinite dot-product kernel can be realized as either the NNGP or neural tangent kernel of a fully-connected neural network with only one hidden layer. We verify our construction numerically and demonstrate its utility as a design tool for finite fully-connected networks in several experiments.  ( 2 min )
    Equivariant Finite Normalizing Flows. (arXiv:2110.08649v2 [cs.LG] UPDATED)
    Generative modeling seeks to uncover the underlying factors that give rise to observed data, which can often be modeled as the natural symmetries that manifest themselves through invariances and equivariances to certain transformation laws. However, current approaches to representing these symmetries are couched in the formalism of continuous normalizing flows that require the construction of equivariant vector fields -- inhibiting their simple application to conventional higher-dimensional generative modelling domains like natural images. In this paper, we focus on building equivariant normalizing flows using discrete layers. We first theoretically prove the existence of an equivariant map for compact groups whose actions are on compact spaces. We further introduce three new equivariant flows: $G$-Residual Flows, $G$-Coupling Flows, and $G$-Inverse Autoregressive Flows that elevate classical Residual, Coupling, and Inverse Autoregressive Flows with equivariant maps to a prescribed group $G$. Our construction of $G$-Residual Flows is also universal, in the sense that we prove a $G$-equivariant diffeomorphism can be exactly mapped by a $G$-residual flow. Finally, we complement our theoretical insights with demonstrative experiments -- for the first time -- on image datasets like CIFAR-10 and show that $G$-Equivariant Finite Normalizing Flows lead to increased data efficiency, faster convergence, and improved likelihood estimates.  ( 3 min )
    Deep Neural Network Approximation For H\"older Functions. (arXiv:2201.03747v2 [cs.LG] UPDATED)
    In this work, we explore the approximation capability of deep Rectified Quadratic Unit neural networks for H\"older-regular functions, with respect to the uniform norm. We find that the obtained theoretical approximation rates heavily depend on the selected activation function in the neural network.
    Sharp asymptotics on the compression of two-layer neural networks. (arXiv:2205.08199v3 [cs.IT] UPDATED)
    In this paper, we study the compression of a target two-layer neural network with N nodes into a compressed network with M<N nodes. More precisely, we consider the setting in which the weights of the target network are i.i.d. sub-Gaussian, and we minimize the population L_2 loss between the outputs of the target and of the compressed network, under the assumption of Gaussian inputs. By using tools from high-dimensional probability, we show that this non-convex problem can be simplified when the target network is sufficiently over-parameterized, and provide the error rate of this approximation as a function of the input dimension and N. In this mean-field limit, the simplified objective, as well as the optimal weights of the compressed network, does not depend on the realization of the target network, but only on expected scaling factors. Furthermore, for networks with ReLU activation, we conjecture that the optimum of the simplified optimization problem is achieved by taking weights on the Equiangular Tight Frame (ETF), while the scaling of the weights and the orientation of the ETF depend on the parameters of the target network. Numerical evidence is provided to support this conjecture.  ( 3 min )
    Quantum Boosting using Domain-Partitioning Hypotheses. (arXiv:2110.12793v3 [quant-ph] UPDATED)
    Boosting is an ensemble learning method that converts a weak learner into a strong learner in the PAC learning framework. Freund and Schapire designed the G\"odel Prize-winning algorithm AdaBoost, which can boost weak learners that output binary hypotheses. Recently, Arunachalam and Maity presented the first quantum boosting algorithm with similar theoretical guarantees. Their algorithm, which we refer to as QAdaBoost henceforth, is a quantum adaptation of AdaBoost and only works for the binary hypothesis case. QAdaBoost is quadratically faster than AdaBoost in terms of the VC-dimension of the hypothesis class of the weak learner but polynomially worse in the bias of the weak learner. Izdebski et al. posed an open question of whether we can boost quantum weak learners that output non-binary hypotheses. In this work, we address this open question by developing the QRealBoost algorithm, motivated by the classical RealBoost algorithm. The main technical challenge was to provide provable guarantees for convergence, generalization bounds, and quantum speedup, given that quantum subroutines are noisy and probabilistic. We prove that QRealBoost retains the quadratic speedup of QAdaBoost over AdaBoost and further achieves a polynomial speedup over QAdaBoost in terms of both the bias of the learner and the time taken by the learner to learn the target concept class. Finally, we perform empirical evaluations of QRealBoost and report encouraging observations on quantum simulators by benchmarking the convergence performance of QRealBoost against QAdaBoost, AdaBoost, and RealBoost on a subset of the MNIST dataset and the Breast Cancer Wisconsin dataset.  ( 3 min )
    NURD: Negative-Unlabeled Learning for Online Datacenter Straggler Prediction. (arXiv:2203.08339v2 [cs.LG] UPDATED)
    Datacenters execute large computational jobs, which are composed of smaller tasks. A job completes when all its tasks finish, so stragglers -- rare, yet extremely slow tasks -- are a major impediment to datacenter performance. Accurately predicting stragglers would enable proactive intervention, allowing datacenter operators to mitigate stragglers before they delay a job. While much prior work applies machine learning to predict computer system performance, these approaches rely on complete labels -- i.e., sufficient examples of all possible behaviors, including straggling and non-straggling -- or strong assumptions about the underlying latency distributions -- e.g., whether Gaussian or not. Within a running job, however, none of this information is available until stragglers have revealed themselves when they have already delayed the job. To predict stragglers accurately and early without labeled positive examples or assumptions on latency distributions, this paper presents NURD, a novel Negative-Unlabeled learning approach with Reweighting and Distribution-compensation that only trains on negative and unlabeled streaming data. The key idea is to train a predictor using finished tasks of non-stragglers to predict latency for unlabeled running tasks, and then reweight each unlabeled task's prediction based on a weighting function of its feature space. We evaluate NURD on two production traces from Google and Alibaba, and find that compared to the best baseline approach, NURD produces 2--11 percentage point increases in the F1 score in terms of prediction accuracy, and 2.0--8.8 percentage point improvements in job completion time.
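    The negative-unlabeled core reduces to a short sketch: train a latency predictor only on finished (non-straggling) tasks, apply it to running tasks, and flag likely stragglers early. NURD additionally reweights each prediction with a feature-space weighting function, which is omitted here; names and the threshold are illustrative.

        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor

        rng = np.random.default_rng(0)
        # Finished non-straggling tasks: features -> observed latency (the only labels).
        X_done = rng.standard_normal((3000, 6))
        lat_done = np.exp(X_done[:, 0]) + 1.0
        reg = GradientBoostingRegressor().fit(X_done, lat_done)

        # Running (unlabeled) tasks: predict latency, then flag likely stragglers early.
        X_run = rng.standard_normal((500, 6))
        pred = reg.predict(X_run)
        threshold = np.quantile(lat_done, 0.95)   # e.g., top-5% latency cut-off
        print("flagged as potential stragglers:", int((pred > threshold).sum()))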
    GEDI: A Graph-based End-to-end Data Imputation Framework. (arXiv:2208.06573v1 [cs.LG])
    Data imputation is an effective way to handle missing data, which is common in practical applications. In this study, we propose and test a novel data imputation process that achieves two important goals: (1) preserve the row-wise similarities among observations and column-wise contextual relationships among features in the feature matrix, and (2) tailor the imputation process to the specific downstream label prediction task. The proposed imputation process uses a Transformer network and graph structure learning to iteratively refine the contextual relationships among features and similarities among observations. Moreover, it uses a meta-learning framework to select features that are influential to the downstream prediction task of interest. We conduct experiments on real-world large data sets, and show that the proposed imputation process consistently improves imputation and label prediction performance over a variety of benchmark methods.  ( 2 min )
    Sequence-based deep learning antibody design for in silico antibody affinity maturation. (arXiv:2103.03724v2 [q-bio.BM] UPDATED)
    Antibody therapeutics have been extensively studied in drug discovery and development within the past decades. One increasingly popular focus in the antibody discovery pipeline is the optimization step for therapeutic leads. Both traditional methods and in silico approaches aim to generate candidates with high binding affinity against specific target antigens. Traditional in vitro approaches use hybridoma or phage display for candidate selection, and surface plasmon resonance (SPR) for evaluation, while in silico computational approaches aim to reduce the high cost and improve efficiency by incorporating mathematical algorithms and computational processing power in the design process. In the present study, we investigated different graph-based designs for depicting antibody-antigen interactions in terms of antibody affinity prediction using deep learning techniques. While other in silico computations require experimentally determined crystal structures, our study took interest in the capability of sequence-based models for in silico antibody maturation. Our preliminary studies achieved satisfactory prediction accuracy on binding affinities compared to conventional approaches and other deep learning approaches. To further study antibody-antigen binding specificity, and to simulate the optimization process in a real-world scenario, we introduced a pairwise prediction strategy. We performed analysis based on both baseline and pairwise prediction results. The resulting accuracy and efficiency demonstrate the feasibility of sequence-based methods to be adopted as a scalable industry practice.  ( 3 min )
    An Analytic Framework for Robust Training of Artificial Neural Networks. (arXiv:2205.13502v2 [cs.LG] UPDATED)
    The reliability of a learning model is key to the successful deployment of machine learning in various industries. Creating a robust model, particularly one unaffected by adversarial attacks, requires a comprehensive understanding of the adversarial examples phenomenon. However, it is difficult to describe the phenomenon due to the complicated nature of the problems in machine learning. Consequently, many studies investigate the phenomenon by proposing a simplified model of how adversarial examples occur and validate it by predicting some aspect of the phenomenon. While these studies cover many different characteristics of adversarial examples, they have not reached a holistic approach to the geometric and analytic modeling of the phenomenon. This paper proposes a formal framework to study the phenomenon in learning theory and makes use of complex analysis and holomorphicity to offer a robust learning rule for artificial neural networks. With the help of complex analysis, we can effortlessly move between geometric and analytic perspectives of the phenomenon and offer further insight by revealing its connection with harmonic functions. Using our model, we can explain some of the most intriguing characteristics of adversarial examples, including their transferability, and pave the way for novel approaches to mitigate the effects of the phenomenon.  ( 3 min )
    Embedding Principle in Depth for the Loss Landscape Analysis of Deep Neural Networks. (arXiv:2205.13283v2 [cs.LG] UPDATED)
    Understanding the relation between deep and shallow neural networks is extremely important for the theoretical study of deep learning. In this work, we discover an embedding principle in depth: the loss landscape of an NN "contains" all critical points of the loss landscapes of shallower NNs. The key tool for our discovery is the critical lifting operator proposed in this work, which maps any critical point of a network to critical manifolds of any deeper network while preserving the outputs. This principle provides new insights into many widely observed behaviors of DNNs. Regarding the easy training of deep networks, we show that a local minimum of an NN can be lifted to strict saddle points of a deeper NN. Regarding the acceleration effect of batch normalization, we demonstrate that batch normalization helps avoid the critical manifolds lifted from shallower NNs by suppressing layer linearization. We also prove that increasing training data shrinks the lifted critical manifolds, which can result in acceleration of training as demonstrated in experiments. Overall, our discovery of the embedding principle in depth uncovers the depth-wise hierarchical structure of the deep learning loss landscape, which serves as a solid foundation for further study of the role of depth for DNNs.  ( 3 min )
    TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency. (arXiv:2208.06773v1 [cs.CV])
    YouTube users looking for instructions for a specific task may spend a long time browsing content trying to find the right video that matches their needs. Creating a visual summary (abridged version of a video) provides viewers with a quick overview and massively reduces search time. In this work, we focus on summarizing instructional videos, an under-explored area of video summarization. In comparison to generic videos, instructional videos can be parsed into semantically meaningful segments that correspond to important steps of the demonstrated task. Existing video summarization datasets rely on manual frame-level annotations, making them subjective and limited in size. To overcome this, we first automatically generate pseudo summaries for a corpus of instructional videos by exploiting two key assumptions: (i) relevant steps are likely to appear in multiple videos of the same task (Task Relevance), and (ii) they are more likely to be described by the demonstrator verbally (Cross-Modal Saliency). We propose an instructional video summarization network that combines a context-aware temporal video encoder and a segment scoring transformer. Using pseudo summaries as weak supervision, our network constructs a visual summary for an instructional video given only video and transcribed speech. To evaluate our model, we collect a high-quality test set, WikiHow Summaries, by scraping WikiHow articles that contain video demonstrations and visual depictions of steps allowing us to obtain the ground-truth summaries. We outperform several baselines and a state-of-the-art video summarization model on this new benchmark.  ( 3 min )
    CANF-VC: Conditional Augmented Normalizing Flows for Video Compression. (arXiv:2207.05315v3 [cs.CV] UPDATED)
    This paper presents an end-to-end learning-based video compression system, termed CANF-VC, based on conditional augmented normalizing flows (CANF). Most learned video compression systems adopt the same hybrid-based coding architecture as the traditional codecs. Recent research on conditional coding has shown the sub-optimality of the hybrid-based coding and opens up opportunities for deep generative models to take a key role in creating new coding frameworks. CANF-VC represents a new attempt that leverages the conditional ANF to learn a video generative model for conditional inter-frame coding. We choose ANF because it is a special type of generative model, which includes variational autoencoder as a special case and is able to achieve better expressiveness. CANF-VC also extends the idea of conditional coding to motion coding, forming a purely conditional coding framework. Extensive experimental results on commonly used datasets confirm the superiority of CANF-VC to the state-of-the-art methods. The source code of CANF-VC is available at https://github.com/NYCU-MAPL/CANF-VC.  ( 2 min )
    Determining HEDP Foams' Quality with Multi-View Deep Learning Classification. (arXiv:2208.07196v1 [cs.CV])
    High energy density physics (HEDP) experiments commonly involve a dynamic wave-front propagating inside a low-density foam, which affects the foam's density and hence its transparency. A common problem in foam production is the creation of defective foams. Accurate information on their dimensions and homogeneity is required to classify the foams' quality. Therefore, those parameters are characterized using a 3D-measuring laser confocal microscope. For each foam, five images are taken: two 2D images representing the top and bottom surface foam planes and three images of side cross-sections from 3D scannings. An expert has to do the complicated, harsh, and exhausting work of manually classifying the foam's quality through the image set and only then determine whether the foam can be used in experiments or not. Currently, quality has two binary levels of normal vs. defective. At the same time, experts are commonly required to classify a sub-class of normal-defective, i.e., foams that are defective but might be sufficient for the needed experiment. This sub-class is problematic due to inconclusive judgment that is primarily intuitive. In this work, we present a novel state-of-the-art multi-view deep learning classification model that mimics the physicist's perspective by automatically determining the foams' quality classification and thus aids the expert. Our model achieved 86\% accuracy on the upper and lower surface foam planes and 82\% on the entire set, suggesting interesting heuristics for the problem. A significant added value of this work is the ability to regress the foam quality instead of making a binary deduction, and even to explain the decision visually. The source code used in this work, as well as other relevant sources, are available at: https://github.com/Scientific-Computing-Lab-NRCN/Multi-View-Foams.git  ( 3 min )
    On the Effect of Dropping Layers of Pre-trained Transformer Models. (arXiv:2004.03844v3 [cs.CL] UPDATED)
    Transformer-based NLP models are trained using hundreds of millions or even billions of parameters, limiting their applicability in computationally constrained environments. While the number of parameters generally correlates with performance, it is not clear whether the entire network is required for a downstream task. Motivated by recent work on pruning and distilling pre-trained models, we explore strategies to drop layers in pre-trained models and observe the effect of pruning on downstream GLUE tasks. We were able to prune BERT, RoBERTa and XLNet models by up to 40%, while maintaining up to 98% of their original performance. Additionally, we show that our pruned models are on par with those built using knowledge distillation, both in terms of size and performance. Our experiments yield interesting observations, such as: (i) the lower layers are most critical to maintaining downstream task performance, (ii) some tasks such as paraphrase detection and sentence similarity are more robust to the dropping of layers, and (iii) models trained using different objective functions exhibit different learning patterns with respect to layer dropping.  ( 3 min )
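    One of the strategies studied, top-layer dropping, is essentially a two-line operation with the Hugging Face API; the pruned model is then fine-tuned on the downstream task as usual. A sketch with BERT-base, keeping the lower 7 of 12 layers (roughly the 40% pruning level reported); the exact layer counts per model are the paper's design space, not fixed here.

        import torch
        from transformers import AutoModel, AutoTokenizer

        tok = AutoTokenizer.from_pretrained("bert-base-uncased")
        model = AutoModel.from_pretrained("bert-base-uncased")

        # Keep only the lower 7 of 12 encoder layers; the paper finds lower layers
        # most critical for downstream performance.
        model.encoder.layer = torch.nn.ModuleList(model.encoder.layer[:7])
        model.config.num_hidden_layers = 7

        out = model(**tok("the pruned model still encodes text", return_tensors="pt"))
        print(out.last_hidden_state.shape)  # (1, seq_len, 768); fine-tune on GLUE next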
    How Does Data Freshness Affect Real-time Supervised Learning?. (arXiv:2208.06948v1 [cs.NI])
    In this paper, we analyze the impact of data freshness on real-time supervised learning, where a neural network is trained to infer a time-varying target (e.g., the position of the vehicle in front) based on features (e.g., video frames) observed at a sensing node (e.g., camera or lidar). One might expect that the performance of real-time supervised learning degrades monotonically as the feature becomes stale. Using an information-theoretic analysis, we show that this is true if the feature and target data sequence can be closely approximated as a Markov chain; it is not true if the data sequence is far from Markovian. Hence, the prediction error of real-time supervised learning is a function of the Age of Information (AoI), where the function could be non-monotonic. Several experiments are conducted to illustrate the monotonic and non-monotonic behaviors of the prediction error. To minimize the inference error in real-time, we propose a new "selection-from-buffer" model for sending the features, which is more general than the "generate-at-will" model used in earlier studies. By using Gittins and Whittle indices, low-complexity scheduling strategies are developed to minimize the inference error, where a new connection between the Gittins index theory and Age of Information (AoI) minimization is discovered. These scheduling results hold (i) for minimizing general AoI functions (monotonic or non-monotonic) and (ii) for general feature transmission time distributions. Data-driven evaluations are presented to illustrate the benefits of the proposed scheduling algorithms.  ( 3 min )
    Combating Label Distribution Shift for Active Domain Adaptation. (arXiv:2208.06604v1 [cs.LG])
    We consider the problem of active domain adaptation (ADA) to unlabeled target data, of which a subset is actively selected and labeled given a budget constraint. Inspired by recent analysis of a critical issue arising from the label distribution mismatch between source and target in domain adaptation, we devise a method that addresses the issue for the first time in ADA. At its heart lies a novel sampling strategy, which seeks target data that best approximate the entire target distribution while being representative, diverse, and uncertain. The sampled target data are then used not only for supervised learning but also for matching the label distributions of the source and target domains, leading to remarkable performance improvements. On four public benchmarks, our method substantially outperforms existing methods in every adaptation scenario.  ( 2 min )
    DendroMap: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps. (arXiv:2205.06935v2 [cs.HC] UPDATED)
    In this paper, we present DendroMap, a novel approach to interactively exploring large-scale image datasets for machine learning (ML). ML practitioners often explore image datasets by generating a grid of images or projecting high-dimensional representations of images into 2-D using dimensionality reduction techniques (e.g., t-SNE). However, neither approach effectively scales to large datasets because images are ineffectively organized and interactions are insufficiently supported. To address these challenges, we develop DendroMap by adapting Treemaps, a well-known visualization technique. DendroMap effectively organizes images by extracting hierarchical cluster structures from high-dimensional representations of images. It enables users to make sense of the overall distributions of datasets and interactively zoom into specific areas of interests at multiple levels of abstraction. Our case studies with widely-used image datasets for deep learning demonstrate that users can discover insights about datasets and trained models by examining the diversity of images, identifying underperforming subgroups, and analyzing classification errors. We conducted a user study that evaluates the effectiveness of DendroMap in grouping and searching tasks by comparing it with a gridified version of t-SNE and found that participants preferred DendroMap. DendroMap is available at https://div-lab.github.io/dendromap/.  ( 3 min )
    RuDi: Explaining Behavior Sequence Models by Automatic Statistics Generation and Rule Distillation. (arXiv:2208.07211v1 [cs.LG])
    Risk scoring systems have been widely deployed in many applications; they assign risk scores to users according to their behavior sequences. Though many deep learning methods with sophisticated designs have achieved promising results, their black-box nature hinders their application due to fairness, explainability, and compliance considerations. Rule-based systems are considered reliable in these sensitive scenarios. However, building a rule system is labor-intensive. Experts need to find informative statistics from user behavior sequences, design rules based on the statistics, and assign weights to each rule. In this paper, we bridge the gap between effective but black-box models and transparent rule models. We propose a two-stage method, RuDi, that distills the knowledge of black-box teacher models into rule-based student models. We design a Monte Carlo tree search-based statistics generation method that can provide a set of informative statistics in the first stage. These statistics are then composed into logical rules with our proposed neural logical networks by mimicking the outputs of the teacher models. We evaluate RuDi on three real-world public datasets and an industrial dataset to demonstrate its effectiveness.  ( 2 min )
    ARIEL: Adversarial Graph Contrastive Learning. (arXiv:2208.06956v1 [cs.LG])
    Contrastive learning is an effective unsupervised method in graph representation learning, and the key component of contrastive learning lies in the construction of positive and negative samples. Previous methods usually utilize the proximity of nodes in the graph as the principle. Recently, data-augmentation-based contrastive learning has shown great power in the visual domain, and some works have extended this method from images to graphs. However, unlike data augmentation on images, data augmentation on graphs is far less intuitive and much harder to provide high-quality contrastive samples, which leaves much room for improvement. In this work, by introducing an adversarial graph view for data augmentation, we propose a simple but effective method, Adversarial Graph Contrastive Learning (ARIEL), to extract informative contrastive samples within reasonable constraints. We develop a new technique called information regularization for stable training and use subgraph sampling for scalability. We generalize our method from node-level contrastive learning to the graph level by treating each graph instance as a supernode. ARIEL consistently outperforms the current graph contrastive learning methods on both node-level and graph-level classification tasks on real-world datasets. We further demonstrate that ARIEL is more robust in the face of adversarial attacks.  ( 2 min )
    Predicting skull fractures via CNN with classification algorithms. (arXiv:2208.06756v1 [cs.CV])
    Computed Tomography (CT) images have become quite important for diagnosing diseases. A CT scan slice contains a vast amount of data that may not be properly examined with the requisite precision and speed using normal visual inspection. A computer-assisted skull fracture classification expert system is needed to assist physicians. Convolutional Neural Networks (CNNs) are the most extensively used deep learning models for image categorization, since they most often outperform other models in accuracy and results. Several CNN architectures were therefore developed, tested, and compared. ResNet50, used for feature extraction and combined with a gradient-boosted decision tree as the classifier for categorizing skull fractures from brain CT scans into three fracture categories, had the best overall F1-score of 96%, a Hamming score of 95%, a balanced accuracy score of 94%, and a ROC AUC of 96%.  ( 2 min )
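    The reported pipeline, ResNet50 features feeding a gradient-boosted decision tree, can be sketched directly. The random tensors below stand in for preprocessed CT slices, and hyperparameters are library defaults rather than the paper's.

        import numpy as np
        import torch
        from torchvision import models
        from sklearn.ensemble import GradientBoostingClassifier

        # Frozen ResNet50 as a feature extractor (classification head removed).
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        backbone.fc = torch.nn.Identity()
        backbone.eval()

        def extract(batch):                     # batch: (N, 3, 224, 224) CT slices
            with torch.no_grad():
                return backbone(batch).numpy()  # (N, 2048) feature vectors

        X = extract(torch.randn(8, 3, 224, 224))       # stand-in for real scans
        y = np.random.randint(0, 3, 8)                 # three fracture categories
        clf = GradientBoostingClassifier().fit(X, y)   # GBDT on deep features
        print(clf.predict(X[:2]))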
    InvisibiliTee: Angle-agnostic Cloaking from Person-Tracking Systems with a Tee. (arXiv:2208.06962v1 [cs.CV])
    Following a survey of privacy concerns induced by person-tracking systems, we propose InvisibiliTee, a black-box adversarial attack method on state-of-the-art human detection models. The method learns printable adversarial patterns for T-shirts that cloak wearers in the physical world in front of person-tracking systems. We design an angle-agnostic learning scheme which utilizes segmentation of the fashion dataset and a geometric warping process, so the adversarial patterns generated are effective in fooling person detectors from all camera angles and for unseen black-box detection models. Empirical results in both digital and physical environments show that with the InvisibiliTee on, person-tracking systems' ability to detect the wearer drops significantly.  ( 2 min )
    Machine Learning Based Radiomics for Glial Tumor Classification and Comparison with Volumetric Analysis. (arXiv:2208.06739v1 [eess.IV])
    Purpose: The purpose of this study is to noninvasively classify glial tumors into grade II, III and IV categories by applying machine learning to multi-modal MRI features, in comparison with volumetric analysis. Methods: We retrospectively studied 57 glioma patients with pre- and post-contrast T1-weighted, T2-weighted and FLAIR images, and ADC maps acquired on a 3T MRI. The tumors were segmented into enhancing and nonenhancing portions, tumor necrosis, cyst and edema using semiautomated segmentation with the ITK-SNAP open-source tool. We measured the total tumor volume, the enhancing-nonenhancing tumor, edema and necrosis volumes, and their ratios to the total tumor volume. Training of a support vector machine (SVM) classifier and an artificial neural network (ANN) was performed with labeled data designed to answer the question of interest. Specificity, sensitivity, and AUC of the predictions were computed by means of ROC analysis. Differences in continuous measures between groups were assessed using Kruskal-Wallis tests, with post hoc Dunn correction for multiple comparisons. Results: When we compared the volume ratios between groups, there was a statistically significant difference between grade IV and grade II-III glial tumors. Edema and tumor necrosis volume ratios for grade IV glial tumors were higher than those of grades II and III. Volumetric ratio analysis could not successfully distinguish grade II and III tumors. However, SVM and ANN correctly classified each group with accuracies up to 98% and 96%. Conclusion: Application of machine learning methods to MRI features can be used to classify brain tumors noninvasively and more readily in clinical settings.  ( 3 min )
    Inference for BART with Multinomial Outcomes. (arXiv:2101.06823v2 [stat.ME] UPDATED)
    The multinomial probit Bayesian additive regression trees (MPBART) framework was proposed by Kindo et al. (KD), approximating the latent utilities in the multinomial probit (MNP) model with BART (Chipman et al. 2010). Compared to multinomial logistic models, MNP does not assume independent alternatives and the correlation structure among alternatives can be specified through multivariate Gaussian distributed latent utilities. We introduce two new algorithms for fitting the MPBART and show that the theoretical mixing rates of our proposals are equal or superior to the existing algorithm in KD. Through simulations, we explore the robustness of the methods to the choice of reference level, imbalance in outcome frequencies, and the specifications of prior hyperparameters for the utility error term. The work is motivated by the application of generating posterior predictive distributions for mortality and engagement in care among HIV-positive patients based on electronic health records (EHRs) from the Academic Model Providing Access to Healthcare (AMPATH) in Kenya. In both the application and simulations, we observe better performance using our proposals as compared to KD in terms of MCMC convergence rate and posterior predictive accuracy.  ( 2 min )
    Three-Player Game Training Dynamics. (arXiv:2208.06531v1 [cs.LG])
    This work explores three-player game training dynamics: under what conditions three-player games converge and which equilibria they converge on. In contrast to prior work, we examine a three-player game architecture in which all players explicitly interact with each other. Prior work analyzes games in which two of three agents interact with only one other player, constituting dual two-player games. We explore three-player game training dynamics using an extended version of a simplified bilinear smooth game, called a simplified trilinear smooth game. We find that trilinear games do not converge on the Nash equilibrium in most cases, rather converging on a fixed point which is optimal for two players, but not for the third. Further, we explore how the order of the updates influences convergence. In addition to alternating and simultaneous updates, we explore a new update order--maximizer-first--which is only possible in a three-player game. We find that three-player games can converge on a Nash equilibrium using maximizer-first updates. Finally, we experiment with differing momentum values for each player in a trilinear smooth game under all three update orders and show that maximizer-first updates achieve better results across a larger set of player-specific momentum value triads than other update orders.  ( 2 min )
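    To make the update orders concrete, here is a toy trilinear smooth game g(x, y, z) = x*y*z with one minimizer (x) and two maximizers (y, z); the payoff and player roles are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def step(x, y, z, order, lr=0.01):
    if order == "simultaneous":
        return x - lr * y * z, y + lr * x * z, z + lr * x * y
    if order == "alternating":        # x moves first, then y, then z
        x = x - lr * y * z
        y = y + lr * x * z
        z = z + lr * x * y
        return x, y, z
    if order == "maximizer-first":    # maximizers y, z move before minimizer x
        y = y + lr * x * z
        z = z + lr * x * y
        x = x - lr * y * z
        return x, y, z

for order in ("simultaneous", "alternating", "maximizer-first"):
    x, y, z = 0.5, 0.5, 0.5
    for _ in range(500):
        x, y, z = step(x, y, z, order)
    print(f"{order:16s} final iterate: ({x:+.3f}, {y:+.3f}, {z:+.3f})")
```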
  • Open

    RG-Flow: A hierarchical and explainable flow model based on renormalization group and sparse prior. (arXiv:2010.00029v5 [cs.LG] UPDATED)
    Flow-based generative models have become an important class of unsupervised learning approaches. In this work, we incorporate the key ideas of renormalization group (RG) and sparse prior distribution to design a hierarchical flow-based generative model, RG-Flow, which can separate information at different scales of images and extract disentangled representations at each scale. We demonstrate our method on synthetic multi-scale image datasets and the CelebA dataset, showing that the disentangled representations enable semantic manipulation and style mixing of the images at different scales. To visualize the latent representations, we introduce receptive fields for flow-based models and show that the receptive fields of RG-Flow are similar to those of convolutional neural networks. In addition, we replace the widely adopted isotropic Gaussian prior distribution by the sparse Laplacian distribution to further enhance the disentanglement of representations. From a theoretical perspective, our proposed method has $O(\log L)$ complexity for inpainting of an image with edge length $L$, compared to previous generative models with $O(L^2)$ complexity.
    Predictive Data Calibration for Linear Correlation Significance Testing. (arXiv:2208.07081v1 [stat.ME])
    Inferring linear relationships lies at the heart of many empirical investigations. A measure of linear dependence should correctly evaluate the strength of the relationship as well as qualify whether it is meaningful for the population. Pearson's correlation coefficient (PCC), the \textit{de-facto} measure for bivariate relationships, is known to lack in both regards. The estimated strength $r$ may be wrong due to limited sample size and non-normality of the data. In the context of statistical significance testing, erroneous interpretation of a $p$-value as posterior probability leads to Type I errors -- a general issue with significance testing that extends to PCC. Such errors are exacerbated when testing multiple hypotheses simultaneously. To tackle these issues, we propose a machine-learning-based predictive data calibration method which essentially conditions the data samples on the expected linear relationship. Calculating PCC using calibrated data yields a calibrated $p$-value that can be interpreted as posterior probability together with a calibrated $r$ estimate, a desired outcome not provided by other methods. Furthermore, the ensuing independent interpretation of each test might eliminate the need for multiple testing correction. We provide empirical evidence favouring the proposed method using several simulations and application to real-world data.
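    For reference, the baseline the paper critiques is the direct use of Pearson's $r$ and its $p$-value; a quick synthetic illustration of how sample size affects the estimate (the paper's calibration method itself is not reproduced here):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
for n in (10, 50, 500):
    x = rng.normal(size=n)
    y = 0.2 * x + rng.normal(size=n)   # true linear signal of modest strength
    r, p = pearsonr(x, y)
    print(f"n={n:4d}  r={r:+.3f}  p={p:.3g}")
```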
    Accelerated and instance-optimal policy evaluation with linear function approximation. (arXiv:2112.13109v2 [stat.ML] UPDATED)
    We study the problem of policy evaluation with linear function approximation and present efficient and practical algorithms that come with strong optimality guarantees. We begin by proving lower bounds that establish baselines on both the deterministic error and stochastic error in this problem. In particular, we prove an oracle complexity lower bound on the deterministic error in an instance-dependent norm associated with the stationary distribution of the transition kernel, and use the local asymptotic minimax machinery to prove an instance-dependent lower bound on the stochastic error in the i.i.d. observation model. Existing algorithms fail to match at least one of these lower bounds: To illustrate, we analyze a variance-reduced variant of temporal difference learning, showing in particular that it fails to achieve the oracle complexity lower bound. To remedy this issue, we develop an accelerated, variance-reduced fast temporal difference algorithm (VRFTD) that simultaneously matches both lower bounds and attains a strong notion of instance-optimality. Finally, we extend the VRFTD algorithm to the setting with Markovian observations, and provide instance-dependent convergence results. Our theoretical guarantees of optimality are corroborated by numerical experiments.
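    For context, a minimal sketch of the vanilla TD(0) baseline with linear function approximation that VRFTD improves upon (the MRP, features, and step sizes below are illustrative; VRFTD itself is not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, d, gamma = 10, 4, 0.9
Phi = rng.normal(size=(n_states, d))    # fixed random feature map
theta = np.zeros(d)
s = 0
for t in range(1, 50_000):
    s_next = (s + rng.choice([-1, 1])) % n_states   # random-walk transition
    r = 1.0 if s_next == 0 else 0.0                 # reward on reaching state 0
    td_err = r + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta += (1.0 / np.sqrt(t)) * td_err * Phi[s]   # decaying step size
    s = s_next
print("estimated values:", np.round(Phi @ theta, 3))
```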
    Grasping Core Rules of Time Series through Pure Models. (arXiv:2208.07105v1 [cs.LG])
    Time series underwent the transition from statistics to deep learning, as did many other machine learning fields. Although accuracy on a number of publicly available datasets appears to increase with each model update, the model scale typically grows several times over in exchange for only a slight difference in accuracy. Through this experiment, we point out a different line of thinking: time series, especially long-term forecasting, may differ from other fields. It is not necessary to use extensive and complex models to grasp all aspects of time series; instead, pure models can grasp the core rules of time series changes. With this simple but effective idea, we created PureTS, a network with three pure linear layers that achieved state-of-the-art in 80% of the long sequence prediction tasks while being nearly the lightest model and having the fastest running speed. On this basis, we discuss the potential of pure linear layers in both phenomena and essence. The ability to understand the core law contributes to the high precision of long-distance prediction, and reasonable fluctuation prevents it from distorting the curve in multi-step prediction like mainstream deep learning models, which we summarize as a pure linear neural network that avoids over-fluctuating. Finally, we suggest the fundamental design standards for lightweight long-step time series tasks: input and output should try to have the same dimension, and the structure should avoid fragmentation and complex operations.
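    A minimal PyTorch sketch in the spirit of the described architecture: three pure linear layers mapping an input window to the forecast horizon. The layer widths and the absence of intermediate nonlinearities are one reading of "three pure linear layers"; consult the paper for the actual design:

```python
import torch
import torch.nn as nn

class PureLinearForecaster(nn.Module):
    def __init__(self, input_len: int, horizon: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(      # no activations: the map stays linear
            nn.Linear(input_len, hidden),
            nn.Linear(hidden, hidden),
            nn.Linear(hidden, horizon),
        )

    def forward(self, x):              # x: (batch, input_len)
        return self.net(x)

model = PureLinearForecaster(input_len=96, horizon=96)
y_hat = model(torch.randn(8, 96))
print(y_hat.shape)                     # torch.Size([8, 96])
```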
    Applying Regularized Schr\"odinger-Bridge-Based Stochastic Process in Generative Modeling. (arXiv:2208.07131v1 [cs.LG])
    Compared to the existing function-based models in deep generative modeling, the recently proposed diffusion models have achieved outstanding performance with a stochastic-process-based approach. But this approach requires a long sampling time, due to the many timesteps needed for discretization. Schr\"odinger bridge (SB)-based models attempt to tackle this problem by training bidirectional stochastic processes between distributions. However, they still have a slow sampling speed compared to generative models such as generative adversarial networks. And because of the training of the bidirectional stochastic processes, they require a relatively long training time. This study therefore seeks to reduce the number of timesteps and the training time required, proposing regularization terms for the existing SB models that make the bidirectional stochastic processes consistent and stable with a reduced number of timesteps. The regularization terms were integrated into a single term to enable more efficient training in computation time and memory usage. Applying this regularized stochastic process to various generation tasks, the desired translations between different distributions were obtained, confirming the possibility of generative modeling based on a stochastic process with faster sampling speed. The code is available at https://github.com/KiUngSong/RSB.
    Selective Inference for Sparse Multitask Regression with Applications in Neuroimaging. (arXiv:2205.14220v2 [stat.ME] UPDATED)
    Multi-task learning is frequently used to model a set of related response variables from the same set of features, improving predictive performance and modeling accuracy relative to methods that handle each response variable separately. Despite the potential of multi-task learning to yield more powerful inference than single-task alternatives, prior work in this area has largely omitted uncertainty quantification. Our focus in this paper is a common multi-task problem in neuroimaging, where the goal is to understand the relationship between multiple cognitive task scores (or other subject-level assessments) and brain connectome data collected from imaging. We propose a framework for selective inference to address this problem, with the flexibility to: (i) jointly identify the relevant covariates for each task through a sparsity-inducing penalty, and (ii) conduct valid inference in a model based on the estimated sparsity structure. Our framework offers a new conditional procedure for inference, based on a refinement of the selection event that yields a tractable selection-adjusted likelihood. This gives an approximate system of estimating equations for maximum likelihood inference, solvable via a single convex optimization problem, and enables us to efficiently form confidence intervals with approximately the correct coverage. Applied to both simulated data and data from the Adolescent Brain Cognitive Development (ABCD) study, our selective inference methods yield tighter confidence intervals than commonly used alternatives, such as data splitting. We also demonstrate through simulations that multi-task learning with selective inference can more accurately recover true signals than single-task methods.
    Graph Neural Networks as Gradient Flows. (arXiv:2206.10991v2 [cs.LG] UPDATED)
    Dynamical systems minimizing an energy are ubiquitous in geometry and physics. We propose a novel framework for GNNs where we parametrize (and {\em learn}) an energy functional and then take the GNN equations to be the gradient flow of such energy. This approach allows us to analyse the GNN evolution from a multi-particle perspective as learning attractive and repulsive forces in feature space via the positive and negative eigenvalues of a symmetric `channel-mixing' matrix. We conduct spectral analysis of the solutions and provide a better understanding of the role of the channel-mixing in (residual) graph convolutional models and of its ability to steer the diffusion away from over-smoothing. We perform thorough ablation studies corroborating our theory and show competitive performance of simple models on homophilic and heterophilic datasets.  ( 2 min )
    Towards Theoretical Understandings of Robust Markov Decision Processes: Sample Complexity and Asymptotics. (arXiv:2105.03863v3 [stat.ML] UPDATED)
    In this paper, we study the non-asymptotic and asymptotic performances of the optimal robust policy and value function of robust Markov Decision Processes (MDPs), where the optimal robust policy and value function are solved only from a generative model. While prior work focusing on non-asymptotic performances of robust MDPs is restricted to the setting of the KL uncertainty set and the $(s,a)$-rectangular assumption, we improve their results and also consider other uncertainty sets, including $L_1$ and $\chi^2$ balls. Our results show that when we assume $(s,a)$-rectangular uncertainty sets, the sample complexity is about $\widetilde{O}\left(\frac{|\mathcal{S}|^2|\mathcal{A}|}{\varepsilon^2\rho^2(1-\gamma)^4}\right)$. In addition, we extend our results from the $(s,a)$-rectangular assumption to the $s$-rectangular assumption. In this scenario, the sample complexity varies with the choice of uncertainty sets and is generally larger than in the case under the $(s,a)$-rectangular assumption. Moreover, we also show that the optimal robust value function is asymptotically normal with the typical rate $\sqrt{n}$ under the $(s,a)$- and $s$-rectangular assumptions, from both theoretical and empirical perspectives.
    Conformalized Online Learning: Online Calibration Without a Holdout Set. (arXiv:2205.09095v3 [cs.LG] UPDATED)
    We develop a framework for constructing uncertainty sets with a valid coverage guarantee in an online setting, in which the underlying data distribution can drastically -- and even adversarially -- shift over time. The technique we propose is highly flexible as it can be integrated with any online learning algorithm, requiring minimal implementation effort and computational cost. A key advantage of our method over existing alternatives -- which also build on conformal inference -- is that we do not need to split the data into training and holdout calibration sets. This allows us to fit the predictive model in a fully online manner, utilizing the most recent observation for constructing calibrated uncertainty sets. Consequently, and in contrast with existing techniques, (i) the sets we build can quickly adapt to new changes in the distribution; and (ii) our procedure does not require refitting the model at each time step. Using synthetic and real-world benchmark data sets, we demonstrate the validity of our theory and the improved performance of our proposal over existing techniques. To demonstrate the greater flexibility of the proposed method, we show how to construct valid intervals for a multiple-output regression problem that previous sequential calibration methods cannot handle due to impractical computational and memory requirements.  ( 3 min )
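    A minimal online calibration loop in the spirit of adaptive conformal inference (Gibbs and Candes, 2021), the line of work this paper builds on: the nominal miscoverage level is adjusted after every observation, with no holdout set. This is an illustrative sketch, not the authors' exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, gamma = 0.1, 0.01               # target miscoverage and adaptation rate
alpha_t = alpha
scores, covered = [], []

for t in range(2000):
    x = rng.normal()
    y = 2.0 * x + rng.normal()         # data stream
    y_hat = 2.0 * x                    # plug in any online model's prediction
    if scores:
        level = min(1.0, max(0.0, 1.0 - alpha_t))
        q = np.quantile(scores, level)         # conformal quantile of past scores
        miss = abs(y - y_hat) > q
        covered.append(not miss)
        alpha_t += gamma * (alpha - miss)      # widen after a miss, shrink after a cover
    scores.append(abs(y - y_hat))

print("empirical coverage:", np.mean(covered))
```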
    A Unified Causal View of Domain Invariant Representation Learning. (arXiv:2208.06987v1 [stat.ML])
    Machine learning methods can be unreliable when deployed in domains that differ from the domains on which they were trained. To address this, we may wish to learn representations of data that are domain-invariant in the sense that we preserve data structure that is stable across domains, but throw out spuriously-varying parts. There are many representation-learning approaches of this type, including methods based on data augmentation, distributional invariances, and risk invariance. Unfortunately, when faced with any particular real-world domain shift, it is unclear which, if any, of these methods might be expected to work. The purpose of this paper is to show how the different methods relate to each other, and clarify the real-world circumstances under which each is expected to succeed. The key tool is a new notion of domain shift relying on the idea that causal relationships are invariant, but non-causal relationships (e.g., due to confounding) may vary.  ( 2 min )
    Diffusion Models for Video Prediction and Infilling. (arXiv:2206.07696v2 [cs.CV] UPDATED)
    Predicting and anticipating future outcomes or reasoning about missing information in a sequence are critical skills for agents to be able to make intelligent decisions. This requires strong, temporally coherent generative capabilities. Diffusion models have shown remarkable success in several generative tasks, but have not been extensively explored in the video domain. We present Random-Mask Video Diffusion (RaMViD), which extends image diffusion models to videos using 3D convolutions, and introduces a new conditioning technique during training. By varying the mask we condition on, the model is able to perform video prediction, infilling, and upsampling. Due to our simple conditioning scheme, we can utilize the same architecture as used for unconditional training, which allows us to train the model in a conditional and unconditional fashion at the same time. We evaluate the model on two benchmark datasets for video prediction, on which we achieve state-of-the-art results, and one for video generation.  ( 2 min )
    When Does Differentially Private Learning Not Suffer in High Dimensions?. (arXiv:2207.00160v3 [cs.LG] UPDATED)
    Large pretrained models can be privately fine-tuned to achieve performance approaching that of non-private models. A common theme in these results is the surprising observation that high-dimensional models can achieve favorable privacy-utility trade-offs. This seemingly contradicts known results on the model-size dependence of differentially private convex learning and raises the following research question: When does the performance of differentially private learning not degrade with increasing model size? We identify that the magnitudes of gradients projected onto subspaces is a key factor that determines performance. To precisely characterize this for private convex learning, we introduce a condition on the objective that we term \emph{restricted Lipschitz continuity} and derive improved bounds for the excess empirical and population risks that are dimension-independent under additional conditions. We empirically show that in private fine-tuning of large language models, gradients obtained during fine-tuning are mostly controlled by a few principal components. This behavior is similar to conditions under which we obtain dimension-independent bounds in convex settings. Our theoretical and empirical results together provide a possible explanation for recent successes in large-scale private fine-tuning. Code to reproduce our results can be found at \url{https://github.com/lxuechen/private-transformers/tree/main/examples/classification/spectral_analysis}.  ( 3 min )
    Scalable Gaussian-process regression and variable selection using Vecchia approximations. (arXiv:2202.12981v3 [stat.ME] UPDATED)
    Gaussian process (GP) regression is a flexible, nonparametric approach to regression that naturally quantifies uncertainty. In many applications, the number of responses and covariates are both large, and a goal is to select covariates that are related to the response. For this setting, we propose a novel, scalable algorithm, coined VGPR, which optimizes a penalized GP log-likelihood based on the Vecchia GP approximation, an ordered conditional approximation from spatial statistics that implies a sparse Cholesky factor of the precision matrix. We traverse the regularization path from strong to weak penalization, sequentially adding candidate covariates based on the gradient of the log-likelihood and deselecting irrelevant covariates via a new quadratic constrained coordinate descent algorithm. We propose Vecchia-based mini-batch subsampling, which provides unbiased gradient estimators. The resulting procedure is scalable to millions of responses and thousands of covariates. Theoretical analysis and numerical studies demonstrate the improved scalability and accuracy relative to existing methods.  ( 2 min )
    Inverse Extended Kalman Filter -- Part I: Fundamentals. (arXiv:2201.01539v2 [math.OC] UPDATED)
    Recent advances in counter-adversarial systems have garnered significant research attention to inverse filtering from a Bayesian perspective. For example, interest in estimating the adversary's Kalman filter tracked estimate with the purpose of predicting the adversary's future steps has led to recent formulations of inverse Kalman filter (I-KF). In this context of inverse filtering, we address the key challenges of non-linear process dynamics and unknown input to the forward filter by proposing an inverse extended Kalman filter (I-EKF). The purpose of this paper and the companion paper (Part II) is to develop the theory of I-EKF in detail. In this paper, we assume perfect system model information and derive I-EKF with and without an unknown input when both forward and inverse state-space models are non-linear. In the process, I-KF-with-unknown-input is also obtained. We then provide theoretical stability guarantees using both bounded non-linearity and unknown matrix approaches. Numerical experiments validate our methods for various proposed inverse filters using the recursive Cram\'{e}r-Rao lower bound as a benchmark. In the companion paper (Part II), we further generalize these formulations to highly non-linear models and propose reproducing kernel Hilbert space-based EKF to handle incomplete system model information.  ( 3 min )
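    For orientation, a minimal scalar forward EKF of the kind the inverse filter reasons about (the toy process and measurement models below are illustrative, not the paper's counter-adversarial setup):

```python
import numpy as np

def f(x):  return 0.9 * x + 0.2 * np.sin(x)   # non-linear process model
def h(x):  return x**2 / 5.0                  # non-linear measurement model
def F(x):  return 0.9 + 0.2 * np.cos(x)       # Jacobian of f
def H(x):  return 2.0 * x / 5.0               # Jacobian of h

rng = np.random.default_rng(0)
Q, R = 0.05, 0.1
x_true, x_hat, P = 1.0, 0.0, 1.0
for _ in range(50):
    x_true = f(x_true) + rng.normal(scale=np.sqrt(Q))
    z = h(x_true) + rng.normal(scale=np.sqrt(R))
    # predict
    x_pred = f(x_hat)
    P_pred = F(x_hat) * P * F(x_hat) + Q
    # update
    K = P_pred * H(x_pred) / (H(x_pred) * P_pred * H(x_pred) + R)
    x_hat = x_pred + K * (z - h(x_pred))
    P = (1 - K * H(x_pred)) * P_pred
print(f"true state: {x_true:.3f}  EKF estimate: {x_hat:.3f}")
```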
    Overcoming Oversmoothness in Graph Convolutional Networks via Hybrid Scattering Networks. (arXiv:2201.08932v2 [stat.ML] UPDATED)
    Geometric deep learning has made great strides towards generalizing the design of structure-aware neural networks from traditional domains to non-Euclidean ones, giving rise to graph neural networks (GNN) that can be applied to graph-structured data arising in, e.g., social networks, biochemistry, and material science. Graph convolutional networks (GCNs) in particular, inspired by their Euclidean counterparts, have been successful in processing graph data by extracting structure-aware features. However, current GNN models are often constrained by various phenomena that limit their expressive power and ability to generalize to more complex graph datasets. Most models essentially rely on low-pass filtering of graph signals via local averaging operations, leading to oversmoothing. Moreover, to avoid severe oversmoothing, most popular GCN-style networks tend to be shallow, with narrow receptive fields, leading to underreaching. Here, we propose a hybrid GNN framework that combines traditional GCN filters with band-pass filters defined via geometric scattering. We further introduce an attention framework that allows the model to locally attend over combined information from different filters at the node level. Our theoretical results establish the complementary benefits of the scattering filters to leverage structural information from the graph, while our experiments show the benefits of our method on various learning tasks.  ( 3 min )
    Lifelong Neural Predictive Coding: Learning Cumulatively Online without Forgetting. (arXiv:1905.10696v4 [cs.LG] UPDATED)
    In lifelong learning systems based on artificial neural networks, one of the biggest obstacles is the inability to retain old knowledge as new information is encountered. This phenomenon is known as catastrophic forgetting. In this paper, we propose a new kind of connectionist architecture, the Sequential Neural Coding Network, that is robust to forgetting when learning from streams of data points and, unlike networks of today, does not learn via the popular back-propagation of errors. Grounded in the neurocognitive theory of predictive processing, our model adapts synapses in a biologically-plausible fashion while another neural system learns to direct and control this cortex-like structure, mimicking some of the task-executive control functionality of the basal ganglia. In our experiments, we demonstrate that our self-organizing system experiences significantly less forgetting compared to standard neural models, outperforming a swath of previously proposed methods, including rehearsal/data buffer-based methods, on both standard (SplitMNIST, Split Fashion MNIST, etc.) and custom benchmarks even though it is trained in a stream-like fashion. Our work offers evidence that emulating mechanisms in real neuronal systems, e.g., local learning, lateral competition, can yield new directions and possibilities for tackling the grand challenge of lifelong machine learning.  ( 3 min )
    PAC Generalization via Invariant Representations. (arXiv:2205.15196v3 [cs.LG] UPDATED)
    One method for obtaining generalizable solutions to machine learning tasks when presented with diverse training environments is to find \textit{invariant representations} of the data. These are representations of the covariates such that the best model on top of the representation is invariant across training environments. In the context of linear Structural Equation Models (SEMs), invariant representations might allow us to learn models with out-of-distribution guarantees, i.e., models that are robust to interventions in the SEM. To address the invariant representation problem in a {\em finite sample} setting, we consider the notion of $\epsilon$-approximate invariance. We study the following question: If a representation is approximately invariant with respect to a given number of training interventions, will it continue to be approximately invariant on a larger collection of unseen SEMs? This larger collection of SEMs is generated through a parameterized family of interventions. Inspired by PAC learning, we obtain finite-sample out-of-distribution generalization guarantees for approximate invariance that holds \textit{probabilistically} over a family of linear SEMs without faithfulness assumptions. Our results show bounds that do not scale in ambient dimension when intervention sites are restricted to lie in a constant size subset of in-degree bounded nodes. We also show how to extend our results to a linear indirect observation model that incorporates latent variables.  ( 3 min )
    A Sparse Expansion For Deep Gaussian Processes. (arXiv:2112.05888v2 [stat.ML] UPDATED)
    In this work, we use Deep Gaussian Processes (DGPs) as statistical surrogates for stochastic processes with complex distributions. Conventional inferential methods for DGP models can suffer from high computational complexity as they require large-scale operations with kernel matrices for training and inference. In this work, we propose an efficient scheme for accurate inference and efficient training based on a range of Gaussian Processes, called the Tensor Markov Gaussian Processes (TMGP). We construct an induced approximation of TMGP referred to as the hierarchical expansion. Next, we develop a deep TMGP (DTMGP) model as the composition of multiple hierarchical expansions of TMGPs. The proposed DTMGP model has the following properties: (1) the outputs of each activation function are deterministic while the weights are chosen independently from a standard Gaussian distribution; (2) in training or prediction, only polylog(M) (out of M) activation functions have non-zero outputs, which significantly boosts the computational efficiency. Our numerical experiments on synthetic models and real datasets show the superior computational efficiency of DTMGP over existing DGP models.  ( 2 min )
    Class Prior Estimation under Covariate Shift: No Problem?. (arXiv:2206.02449v2 [stat.ML] UPDATED)
    We show that in the context of classification the property of source and target distributions to be related by covariate shift may be lost if the information content captured in the covariates is reduced, for instance by dropping components or mapping into a lower-dimensional or finite space. As a consequence, under covariate shift simple approaches to class prior estimation in the style of classify and count, with or without adjustment, are infeasible. We prove that transformations of the covariates that preserve the covariate shift property are necessarily sufficient in the statistical sense for the full set of covariates. As an alternative approach to class prior estimation under covariate shift, a probing algorithm is proposed.  ( 2 min )
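    The classify-and-count style estimators discussed here, in minimal form; the adjusted variant corrects the naive count using source-domain true/false positive rates (synthetic data; this sketches the baselines, not the proposed probing algorithm):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
y_src = rng.binomial(1, 0.5, n)
X_src = rng.normal(loc=y_src, size=(n,))[:, None]
y_tgt = rng.binomial(1, 0.8, n)                 # shifted class prior in target
X_tgt = rng.normal(loc=y_tgt, size=(n,))[:, None]

clf = LogisticRegression().fit(X_src, y_src)
cc = clf.predict(X_tgt).mean()                  # naive classify & count
tpr = clf.predict(X_src[y_src == 1]).mean()     # source true-positive rate
fpr = clf.predict(X_src[y_src == 0]).mean()     # source false-positive rate
acc = (cc - fpr) / (tpr - fpr)                  # adjusted classify & count
print(f"naive CC: {cc:.3f}  adjusted CC: {acc:.3f}  (true prior: 0.8)")
```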
    Cost-effective Framework for Gradual Domain Adaptation with Multifidelity. (arXiv:2202.04359v2 [stat.ML] UPDATED)
    In domain adaptation, when there is a large distance between the source and target domains, the prediction performance will degrade. Gradual domain adaptation is one of the solutions to such an issue, assuming that we have access to intermediate domains, which shift gradually from the source to the target domain. In previous works, it was assumed that the number of samples in the intermediate domains was sufficiently large; hence, self-training was possible without the need for labeled data. If the number of accessible intermediate domains is restricted, the distances between domains become large, and self-training will fail. Practically, the cost of samples in intermediate domains will vary, and it is natural to consider that the closer an intermediate domain is to the target domain, the higher the cost of obtaining samples from the intermediate domain is. To solve the trade-off between cost and accuracy, we propose a framework that combines multifidelity and active domain adaptation. The effectiveness of the proposed method is evaluated by experiments with real-world datasets.  ( 2 min )
    Learning Contact Dynamics using Physically Structured Neural Networks. (arXiv:2102.11206v2 [cs.LG] UPDATED)
    Learning physically structured representations of dynamical systems that include contact between different objects is an important problem for learning-based approaches in robotics. Black-box neural networks can learn to approximately represent discontinuous dynamics, but they typically require large quantities of data and often suffer from pathological behaviour when forecasting for longer time horizons. In this work, we use connections between deep neural networks and differential equations to design a family of deep network architectures for representing contact dynamics between objects. We show that these networks can learn discontinuous contact events in a data-efficient manner from noisy observations in settings that are traditionally difficult for black-box approaches and recent physics inspired neural networks. Our results indicate that an idealised form of touch feedback -- which is heavily relied upon by biological systems -- is a key component of making this learning problem tractable. Together with the inductive biases introduced through the network architectures, our techniques enable accurate learning of contact dynamics from observations.  ( 2 min )
    Convergence Rates for Stochastic Approximation on a Boundary. (arXiv:2208.07243v1 [stat.ML])
    We analyze the behavior of projected stochastic gradient descent, focusing on the case where the optimum is on the boundary of the constraint set and the gradient does not vanish at the optimum. Here iterates may in expectation make progress against the objective at each step. When this and an appropriate moment condition on the noise hold, we prove that the convergence rate to the optimum of constrained stochastic gradient descent differs from, and is typically faster than, that of the unconstrained stochastic gradient descent algorithm. Our results argue that the concentration around the optimum is exponentially distributed rather than normally distributed, which typically determines the limiting convergence in the unconstrained case. The methods that we develop rely on a geometric ergodicity proof. This extends a result on Markov chains by Hajek (1982) to the area of stochastic approximation algorithms. As examples, we show how the results apply to linear programming and tabular reinforcement learning.
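    A minimal illustration of the analyzed regime: projected SGD on a linear objective over a box, so the optimum sits at a corner where the gradient does not vanish (illustrative sketch only; the paper's rate analysis is not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)
c = np.array([1.0, 2.0])                      # gradient does not vanish at x* = 0
x = np.array([0.9, 0.9])
for t in range(1, 10_001):
    g = c + rng.normal(scale=0.5, size=2)     # stochastic gradient
    x = np.clip(x - g / t, 0.0, 1.0)          # gradient step + projection onto box
print("final iterate:", x)                    # hugs the boundary corner (0, 0)
```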
    Plasticity Neural Network Based on Astrocytic Influence at Critical Period, Synaptic Competition and Compensation by Current and Mnemonic Brain Plasticity and Synapse Formation. (arXiv:2203.11740v3 [cs.NE] UPDATED)
    The mechanism of our NN is well in line with the results of the latest MIT brain plasticity study, in which researchers found that as a synapse strengthens, neighboring synapses automatically weaken themselves to compensate. Regarding the importance of this mechanism, Dr. Luo's team at Stanford University has argued that competition in synapse formation is crucial for dendritic morphogenesis. Contrasting with earlier studies, we investigate in detail, through a model, the mechanism by which brain plasticity fails at the closure of the critical period. Those experimental studies combine cutting-edge imaging and genetic tools, whereas our research lays more emphasis on the modeling, derivation and simulation of a new NN. Our tests demonstrate that dendrite generation is, to a certain extent, curbed by synapse formation. Current and mnemonic brain plasticity as well as synaptic action range are also taken into account in the study. Furthermore, the frame of the new NN is based on synapse formation driven by current gradient information and by mnemonic negative and positive gradient information. The mnemonic gradient information must account for forgotten memories through an astrocytic synapse-formation memory persistence factor (covering both negative and positive memories, i.e., the optimal gradient information so far and relatively inferior gradient information). We found that the astrocytic memory persistence factor, like the phagocytosis factor, reduces the local accumulation of synapses. A PNN in which only the synaptic phagocytosis effect is considered, regardless of the gradient updates, and in which the correlation coefficient of the corresponding time interval determines whether the synaptic phagocytosis of different variables and synaptic positions is cancelled, proves simple and effective.
    Convergence of a robust deep FBSDE method for stochastic control. (arXiv:2201.06854v4 [math.OC] UPDATED)
    In this paper, we propose a deep learning based numerical scheme for strongly coupled FBSDEs, stemming from stochastic control. It is a modification of the deep BSDE method in which the initial value to the backward equation is not a free parameter, and with a new loss function being the weighted sum of the cost of the control problem and a variance term which coincides with the mean squared error in the terminal condition. We show by a numerical example that a direct extension of the classical deep BSDE method to FBSDEs fails for a simple linear-quadratic control problem, and we motivate why the new method works. Under regularity and boundedness assumptions on the exact controls of time continuous and time discrete control problems, we provide an error analysis for our method. We show empirically that the method converges for three different problems, one being the one that failed for a direct extension of the deep BSDE method.  ( 3 min )
    When do Models Generalize? A Perspective from Data-Algorithm Compatibility. (arXiv:2202.06054v2 [cs.LG] UPDATED)
    One of the major open problems in machine learning theory is to characterize generalization in the overparameterized regime, where most traditional generalization bounds become inconsistent. In many scenarios, their failure can be attributed to obscuring the crucial interplay between the training algorithm and the underlying data distribution. To address this shortcoming, we propose a concept named compatibility, which quantitatively characterizes generalization in a both data-relevant and algorithm-relevant manner. By considering the entire training trajectory and focusing on early-stopping iterates, compatibility fully exploits the algorithm information and therefore yields better generalization guarantees. We validate this by theoretically studying compatibility under the setting of overparameterized linear regression with gradient descent. Specifically, we perform a data-dependent trajectory analysis and derive a sufficient condition for compatibility under such a setting. Our theoretical results show that in the sense of compatibility, generalization holds with significantly weaker restrictions on the problem instance than the previous last iterate analysis.  ( 2 min )
    Unifying supervised learning and VAEs -- automating statistical inference in (astro-)particle physics with amortized conditional normalizing flows. (arXiv:2008.05825v3 [cs.LG] UPDATED)
    A KL-divergence objective of the joint distribution of data and labels allows us to unify supervised learning and variational autoencoders (VAEs) under one umbrella of stochastic variational inference. The unification motivates an extended supervised scheme which allows us to calculate a goodness-of-fit p-value for the neural network model. Conditional normalizing flows amortized with a neural network are crucial in this construction. We discuss how they allow us to rigorously define coverage for posteriors defined jointly on a product space, e.g. $\mathbb{R}^n \times \mathcal{S}^m$, which encompasses posteriors over directions. Finally, systematic uncertainties are naturally included in the variational viewpoint. In classical likelihood approaches or other machine learning models, the ingredients of (1) systematics, (2) coverage and (3) goodness-of-fit are typically not all available or at least one of them is strongly constrained. In contrast, the proposed extended supervised training with amortized normalizing flows accommodates all three of them for variational inference of arbitrary statistical distributions defined on product spaces like $\mathbb{R}^n \times \ldots \times \mathcal{S}^m$, with no fundamental barrier in terms of the complexity of the underlying data. It therefore has great potential for the statistical toolbox of the contemporary (astro-)particle physicist.  ( 3 min )
    Sharp asymptotics on the compression of two-layer neural networks. (arXiv:2205.08199v3 [cs.IT] UPDATED)
    In this paper, we study the compression of a target two-layer neural network with N nodes into a compressed network with M<N nodes. More precisely, we consider the setting in which the weights of the target network are i.i.d. sub-Gaussian, and we minimize the population L_2 loss between the outputs of the target and of the compressed network, under the assumption of Gaussian inputs. By using tools from high-dimensional probability, we show that this non-convex problem can be simplified when the target network is sufficiently over-parameterized, and provide the error rate of this approximation as a function of the input dimension and N. In this mean-field limit, the simplified objective, as well as the optimal weights of the compressed network, does not depend on the realization of the target network, but only on expected scaling factors. Furthermore, for networks with ReLU activation, we conjecture that the optimum of the simplified optimization problem is achieved by taking weights on the Equiangular Tight Frame (ETF), while the scaling of the weights and the orientation of the ETF depend on the parameters of the target network. Numerical evidence is provided to support this conjecture.  ( 3 min )
    The FEDHC Bayesian network learning algorithm. (arXiv:2012.00113v6 [stat.ML] UPDATED)
    The paper proposes a new hybrid Bayesian network learning algorithm, termed Forward Early Dropping Hill Climbing (FEDHC), devised to work with either continuous or categorical variables. Further, the paper shows that the only implementation of MMHC in the statistical software \textit{R} is prohibitively expensive, and a new implementation is offered. Moreover, specifically for the case of continuous data, a version of FEDHC that is robust to outliers, and that can be adopted by other BN learning algorithms, is proposed. FEDHC is tested via Monte Carlo simulations that distinctly show it is computationally efficient and produces Bayesian networks of similar, or higher, accuracy than MMHC and PCHC. Finally, an application of the FEDHC, PCHC and MMHC algorithms to real data from the field of economics is demonstrated using the statistical software \textit{R}.  ( 2 min )
    Approximate Post-Selective Inference for Regression with the Group LASSO. (arXiv:2012.15664v4 [stat.ME] UPDATED)
    After selection with the Group LASSO (or generalized variants such as the overlapping, sparse, or standardized Group LASSO), inference for the selected parameters is unreliable in the absence of adjustments for selection bias. In the penalized Gaussian regression setup, existing approaches provide adjustments for selection events that can be expressed as linear inequalities in the data variables. Such a representation, however, fails to hold for selection with the Group LASSO and substantially obstructs the scope of subsequent post-selective inference. Key questions of inferential interest -- for example, inference for the effects of selected variables on the outcome -- remain unanswered. In the present paper, we develop a consistent, post-selective, Bayesian method to address the existing gaps by deriving a likelihood adjustment factor and an approximation thereof that eliminates bias from the selection of groups. Experiments on simulated data and data from the Human Connectome Project demonstrate that our method recovers the effects of parameters within the selected groups while paying only a small price for bias adjustment.  ( 2 min )
    Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games. (arXiv:2107.13090v2 [math.OC] UPDATED)
    We consider a general-sum N-player linear-quadratic game with stochastic dynamics over a finite horizon and prove the global convergence of the natural policy gradient method to the Nash equilibrium. In order to prove the convergence of the method, we require a certain amount of noise in the system. We give a condition, essentially a lower bound on the covariance of the noise in terms of the model parameters, in order to guarantee convergence. We illustrate our results with numerical experiments to show that even in situations where the policy gradient method may not converge in the deterministic setting, the addition of noise leads to convergence.  ( 2 min )
    Riemannian accelerated gradient methods via extrapolation. (arXiv:2208.06619v1 [math.OC])
    In this paper, we propose a simple acceleration scheme for Riemannian gradient methods by extrapolating iterates on manifolds. We show that when the iterates are generated by the Riemannian gradient descent method, the accelerated scheme achieves the optimal convergence rate asymptotically and is computationally more favorable than the recently proposed Riemannian Nesterov accelerated gradient methods. Our experiments verify the practical benefit of the novel acceleration strategy.  ( 2 min )
    Towards out of distribution generalization for problems in mechanics. (arXiv:2206.14917v2 [stat.ML] UPDATED)
    There has been a massive increase in research interest towards applying data driven methods to problems in mechanics. While traditional machine learning (ML) methods have enabled many breakthroughs, they rely on the assumption that the training (observed) data and testing (unseen) data are independent and identically distributed (i.i.d). Thus, traditional ML approaches often break down when applied to real world mechanics problems with unknown test environments and data distribution shifts. In contrast, out-of-distribution (OOD) generalization assumes that the test data may shift (i.e., violate the i.i.d. assumption). To date, multiple methods have been proposed to improve the OOD generalization of ML methods. However, because of the lack of benchmark datasets for OOD regression problems, the efficiency of these OOD methods on regression problems, which dominate the mechanics field, remains unknown. To address this, we investigate the performance of OOD generalization methods for regression problems in mechanics. Specifically, we identify three OOD problems: covariate shift, mechanism shift, and sampling bias. For each problem, we create two benchmark examples that extend the Mechanical MNIST dataset collection, and we investigate the performance of popular OOD generalization methods on these mechanics-specific regression problems. Our numerical experiments show that in most cases, while the OOD generalization algorithms perform better compared to traditional ML methods on these OOD problems, there is a compelling need to develop more robust OOD generalization methods that are effective across multiple OOD scenarios. Overall, we expect that this study, as well as the associated open access benchmark datasets, will enable further development of OOD generalization methods for mechanics specific regression problems.  ( 3 min )
    Predicting from Predictions. (arXiv:2208.07331v1 [stat.ML])
    Predictions about people, such as their expected educational achievement or their credit risk, can be performative and shape the outcome that they aim to predict. Understanding the causal effect of these predictions on the eventual outcomes is crucial for foreseeing the implications of future predictive models and selecting which models to deploy. However, this causal estimation task poses unique challenges: model predictions are usually deterministic functions of input features and highly correlated with outcomes, which can make the causal effects of predictions impossible to disentangle from the direct effect of the covariates. We study this problem through the lens of causal identifiability, and despite the hardness of this problem in full generality, we highlight three natural scenarios where the causal effect of predictions on outcomes can be identified from observational data: randomization in predictions or prediction-based decisions, overparameterization of the predictive model deployed during data collection, and discrete prediction outputs. We show empirically that, under suitable identifiability conditions, standard variants of supervised learning that predict from predictions can find transferable functional relationships between features, predictions, and outcomes, allowing for conclusions about newly deployed prediction models. Our positive results fundamentally rely on model predictions being recorded during data collection, bringing forward the importance of rethinking standard data collection practices to enable progress towards a better understanding of social outcomes and performative feedback loops.  ( 2 min )
    Finite Sample Complexity of Sequential Monte Carlo Estimators on Multimodal Target Distributions. (arXiv:2208.06672v1 [stat.CO])
    We prove finite sample complexities for sequential Monte Carlo (SMC) algorithms which require only local mixing times of the associated Markov kernels. Our bounds are particularly useful when the target distribution is multimodal and global mixing of the Markov kernel is slow; in such cases our approach establishes the benefits of SMC over the corresponding Markov chain Monte Carlo (MCMC) estimator. The lack of global mixing is addressed by sequentially controlling the bias introduced by SMC resampling procedures. We apply these results to obtain complexity bounds for approximating expectations under mixtures of log-concave distributions and show that SMC provides a fully polynomial time randomized approximation scheme for some difficult multimodal problems where the corresponding Markov chain sampler is exponentially slow. Finally, we compare the bounds obtained by our approach to existing bounds for tempered Markov chains on the same problems.  ( 2 min )
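    A minimal tempered SMC sampler on a bimodal 1D target, the kind of multimodal setting where SMC can beat a single slowly-mixing MCMC chain; the target, prior, and temperature ladder are illustrative choices, not the paper's constructions:

```python
import numpy as np

rng = np.random.default_rng(0)
log_target = lambda x: np.logaddexp(-0.5 * (x - 4) ** 2, -0.5 * (x + 4) ** 2)
log_prior = lambda x: -0.5 * (x / 5.0) ** 2          # broad N(0, 5^2) start

N, betas = 2000, np.linspace(0.0, 1.0, 21)
x = rng.normal(scale=5.0, size=N)                    # particles from the prior
for b0, b1 in zip(betas[:-1], betas[1:]):
    logw = (b1 - b0) * log_target(x)                 # incremental importance weights
    w = np.exp(logw - logw.max()); w /= w.sum()
    x = x[rng.choice(N, size=N, p=w)]                # multinomial resampling
    prop = x + rng.normal(scale=1.0, size=N)         # one MH move per particle
    log_acc = (log_prior(prop) - log_prior(x)
               + b1 * (log_target(prop) - log_target(x)))
    x = np.where(np.log(rng.uniform(size=N)) < log_acc, prop, x)
print("particle mass near modes +4 / -4:", np.mean(x > 0), np.mean(x < 0))
```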
    On the Estimation of Derivatives Using Plug-in KRR Estimators. (arXiv:2006.01350v3 [stat.ML] UPDATED)
    We study the problem of estimating the derivatives of a regression function, which has a wide range of applications as a key nonparametric functional of unknown functions. Standard analysis may be tailored to specific derivative orders, and parameter tuning remains a daunting challenge particularly for high-order derivatives. In this article, we propose a simple plug-in kernel ridge regression (KRR) estimator in nonparametric regression with random design that is broadly applicable for multi-dimensional support and arbitrary mixed-partial derivatives. We provide a non-asymptotic analysis to study the behavior of the proposed estimator in a unified manner that encompasses the regression function and its derivatives, leading to two error bounds for a general class of kernels under the strong $L_\infty$ norm. In a concrete example specialized to kernels with polynomially decaying eigenvalues, the proposed estimator recovers the minimax optimal rate up to a logarithmic factor for estimating derivatives of functions in H\"older and Sobolev classes. Interestingly, the proposed estimator achieves the optimal rate of convergence with the same choice of tuning parameter for any order of derivatives. Hence, the proposed estimator enjoys a \textit{plug-in property} for derivatives in that it automatically adapts to the order of derivatives to be estimated, enabling easy tuning in practice. Our simulation studies show favorable finite sample performance of the proposed method relative to several existing methods and corroborate the theoretical findings on its minimax optimality.  ( 3 min )
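    The plug-in idea in one dimension, as a sketch: fit kernel ridge regression once, then differentiate the fitted kernel expansion analytically to estimate $f'$. The RBF bandwidth and ridge penalty below are illustrative, not the paper's tuned choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, ell, lam = 200, 0.3, 1e-3
x = np.sort(rng.uniform(-2, 2, n))
y = np.sin(2 * x) + 0.1 * rng.normal(size=n)          # f(x) = sin(2x) + noise

K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * ell**2))
alpha = np.linalg.solve(K + n * lam * np.eye(n), y)   # KRR coefficients

xs = np.linspace(-1.5, 1.5, 5)
Kx = np.exp(-(xs[:, None] - x[None, :]) ** 2 / (2 * ell**2))
dKx = -(xs[:, None] - x[None, :]) / ell**2 * Kx       # d k(s, x_i) / d s
print("estimated f':", np.round(dKx @ alpha, 2))
print("true      f':", np.round(2 * np.cos(2 * xs), 2))
```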
    Dynamic Bayesian Learning and Calibration of Spatiotemporal Mechanistic System. (arXiv:2208.06528v1 [stat.ME])
    We develop an approach for fully Bayesian learning and calibration of spatiotemporal dynamical mechanistic models based on noisy observations. Calibration is achieved by melding information from observed data with simulated computer experiments from the mechanistic system. The joint melding makes use of both Gaussian and non-Gaussian state-space methods as well as Gaussian process regression. Assuming the dynamical system is controlled by a finite collection of inputs, Gaussian process regression learns the effect of these parameters through a number of training runs, driving the stochastic innovations of the spatiotemporal state-space component. This enables efficient modeling of the dynamics over space and time. Through reduced-rank Gaussian processes and a conjugate model specification, our methodology is applicable to large-scale calibration and inverse problems. Our method is general, extensible, and capable of learning a wide range of dynamical systems with potential model misspecification. We demonstrate this flexibility through solving inverse problems arising in the analysis of ordinary and partial nonlinear differential equations and, in addition, to a black-box computer model generating spatiotemporal dynamics across a network.  ( 2 min )
    Machine learning meets false discovery rate. (arXiv:2208.06685v1 [stat.ME])
    Classical false discovery rate (FDR) controlling procedures offer strong and interpretable guarantees, but they often lack flexibility. On the other hand, recent machine learning classification algorithms, such as those based on random forests (RF) or neural networks (NN), have great practical performance but lack interpretation and theoretical guarantees. In this paper, we make these two meet by introducing a new adaptive novelty detection procedure with FDR control, called AdaDetect. It extends the scope of recent works in the multiple testing literature to the high-dimensional setting, notably the one in Yang et al. (2021). AdaDetect is shown both to control the FDR strongly and to have a power that mimics that of the oracle in a specific sense. The interest and validity of our approach are demonstrated with theoretical results, numerical experiments on several benchmark datasets, and an application to astrophysical data. In particular, while AdaDetect can be used in combination with any classifier, it is particularly efficient on real-world datasets with RF, and on images with NN.  ( 2 min )
    Inverse Extended Kalman Filter -- Part II: Highly Non-Linear and Uncertain Systems. (arXiv:2208.06683v1 [math.OC])
    Recent counter-adversarial system design problems have motivated the development of inverse Bayesian filters. For example, inverse Kalman filter (I-KF) has been recently formulated to estimate the adversary's Kalman filter tracked estimates and hence, predict the adversary's future steps. The purpose of this paper and the companion paper (Part I) is to address the inverse filtering problem in non-linear systems by proposing an inverse extended Kalman filter (I-EKF). In a companion paper (Part I), we developed the theory of I-EKF (with and without unknown inputs) and I-KF (with unknown inputs). In this paper, we develop this theory for highly non-linear models, which employ second-order, Gaussian sum, and dithered forward EKFs. In particular, we derive theoretical stability guarantees for the inverse second-order EKF using the bounded non-linearity approach. To address the limitation of the standard I-EKFs that the system model and forward filter are perfectly known to the defender, we propose reproducing kernel Hilbert space-based EKF to learn the unknown system dynamics based on its observations, which can be employed as an inverse filter to infer the adversary's estimate. Numerical experiments demonstrate the state estimation performance of the proposed filters using recursive Cram\'{e}r-Rao lower bound as a benchmark.  ( 3 min )
    Learning Linear Non-Gaussian Polytree Models. (arXiv:2208.06701v1 [stat.ML])
    In the context of graphical causal discovery, we adapt the versatile framework of linear non-Gaussian acyclic models (LiNGAMs) to propose new algorithms to efficiently learn graphs that are polytrees. Our approach combines the Chow--Liu algorithm, which first learns the undirected tree structure, with novel schemes to orient the edges. The orientation schemes assess algebraic relations among moments of the data-generating distribution and are computationally inexpensive. We establish high-dimensional consistency results for our approach and compare different algorithmic versions in numerical experiments.  ( 2 min )
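    The Chow--Liu step used by these algorithms, in minimal form: a maximum-weight spanning tree over pairwise mutual informations (here approximated via correlations); the paper's moment-based orientation schemes, which direct the edges, are not sketched:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(0)
n = 5000
x0 = rng.laplace(size=n)                 # non-Gaussian root
x1 = 0.8 * x0 + rng.laplace(size=n)
x2 = 0.5 * x1 + rng.laplace(size=n)
x3 = -0.7 * x0 + rng.laplace(size=n)
X = np.column_stack([x0, x1, x2, x3])    # true polytree: 0 -> 1 -> 2, 0 -> 3

corr = np.corrcoef(X, rowvar=False)
mi = -0.5 * np.log(1 - np.clip(corr**2, 0, 0.999))   # pairwise MI proxy
np.fill_diagonal(mi, 0.0)
tree = minimum_spanning_tree(-mi)        # negate: max-weight spanning tree
rows, cols = tree.nonzero()
print("undirected skeleton edges:", sorted(zip(rows.tolist(), cols.tolist())))
```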

  • Open

    My third attempt at creating wallpapers: Glowing Sunrise of a Summer Day | Using MidJourney AI (Image Creator bot for Discord)
    submitted by /u/Potato_Player_BR [link] [comments]  ( 86 min )
    AI that could combine books
    I'm looking for an AI that could possibly combine multiple different texts and segments of writing to create one coherent piece that includes the ideas and discussions of each. For example, if I were to write a book on courts then I could combine one chapter from a book that discusses jails and another that discusses court etiquette and get a cohesive explanation of both jails and court etiquette. Would anyone know if some AI similar to this exists? I don't care about costs. submitted by /u/SoterDave [link] [comments]  ( 89 min )
    Computing and Visualizing Brain Topological Data Analysis
    submitted by /u/giorgiodidio [link] [comments]  ( 86 min )
    open ai funny
    submitted by /u/Fearless_Squirrel_72 [link] [comments]  ( 86 min )
    Mean Average Precision (mAP) in Object Detection
    submitted by /u/spmallick [link] [comments]  ( 87 min )
    Animating between Midjourney variations
    submitted by /u/zark23 [link] [comments]  ( 86 min )
    Weekly China AI News: Xiaomi Races Ahead of Tesla with Humanoid Robot Unveil; Baidu Secures China's 1st Permits for Fared Driverless Robotaxi; SenseTime Launches Chinese Chess Robot
    submitted by /u/trcytony [link] [comments]  ( 86 min )
    I made a conversational AI app that tutors you in math, science, history and computer science!
    submitted by /u/landongarrison [link] [comments]  ( 89 min )
    Factory in the Clouds created with Pixels AI
    submitted by /u/widgia [link] [comments]  ( 86 min )
    Workshop on AGI/AI in FinTech (AGIFT)
    submitted by /u/akolonin [link] [comments]  ( 87 min )
    I used Midjourney to create A.I. images of a redhead I named Anna Bhordana, this shows her life story so far
    submitted by /u/xXLjordSireXx [link] [comments]  ( 89 min )
    Automatic ML Model Containerization
    Containerizing machine learning models can be a pain. This talk covers a new open-source approach to building machine learning (ML) models into container images to run in production for inference. Chassis.ml and the Open Model Interface are changing the game with a standard container specification that allows for interoperability, portability, and security for models to seamlessly be integrated into production applications. https://youtu.be/vq3k9wQymss submitted by /u/modzykirsten [link] [comments]  ( 86 min )
    Ather Gattami: How to Create Products Using AI
    Wondering how to create products using AI? That's exactly what we'll be talking about at the next AI Talks! Ather Gattami will join us this Wednesday at 5:30 pm CEST to share his experience of launching products using the power of AI! Prefer not to just listen? Join the Twitch stream and ask your questions in real time! To get to know the speaker a little: Ather Gattami is a tech entrepreneur and expert in the field of AI. He founded OrganAi.se, an AI assistant for everyone, and Bitynamics, a company that develops AI-powered solutions for European businesses. At the same time, he works as a research affiliate for AI Sweden, making a significant contribution to the development of AI for the world! Don't waste time: register for AI Talks so you don't miss the opportunity to get the answers you've been looking for! Register here and get notified when we go live! Ather Gattami: How to Create Products Using AI submitted by /u/zakrzzz [link] [comments]  ( 87 min )
    A few questions about AI
    As a person who only knows a thing or two about artificial intelligence, my knowledge of the field is super limited. However, I'm here to ask philosophical rather than scientific questions. I've seen lots of talk about AI taking over the world and replacing humans, rendering them infinitely inferior and a total waste of space and energy. And, as pessimistic/defeatist/fatalistic as I feel about this technology, I'd like to get some second opinions from people who know more than me before I form my own. Wouldn't AI defeat the will to live? If AI is superior to humans by a long shot, then why exist or continue to exist? It seems like a sort of "planned obsolescence" on an existential scale. I've seen people talk about how everyone everywhere will be replaced because I keep "AI is better…  ( 91 min )
    Reinforcement learning models are prone to membership inference attacks
    submitted by /u/bendee983 [link] [comments]  ( 86 min )
    Can someone teach me how to use WordPress and make changes to a WordPress website?
    submitted by /u/DragonflySea5590 [link] [comments]  ( 86 min )
    Copyright infringement in artificial intelligence art
    submitted by /u/bartturner [link] [comments]  ( 86 min )
    Gotham City generated by Midjourney AI
    submitted by /u/cripuskas [link] [comments]  ( 86 min )
    Volcano Explosion🌋
    submitted by /u/widgia [link] [comments]  ( 86 min )
  • Open

    [D] 3D Face Reconstruction with Dense Landmarks
    Few questions about this paper, mainly about the way they attain the dense landmark prediction. Paper: https://arxiv.org/abs/2204.02776 I understand they use a CNN first to get a Region-of-Interest (ROI), but I'm confused as to how they inscribe an expanded square around the image. I'm referring to step (c)/(d) here: https://ibb.co/vjy8TDD. I'm unsure how they create and adapt this sort of linear fitting of the mesh/cube structure to the face. Does this cube mesh structure matter? Why is it placed here in the first place? Is this cube mesh used in the step of predicting each landmark? They state that "We predict each landmark as a random variable with the probability density function of a circular 2D Gaussian". They then go further on to state "Our training data includes labels for landmark positions." So I'm wondering, since they predict 703 landmarks, do they manually label 703 landmarks in the training set? They don't speak of this in the paper, so I'm a little confused. I know the dataset from https://github.com/microsoft/FaceSynthetics has 70 landmarks, so do they use 70 and scale up to 700? If so, how are they able to scale up the landmarks? How are they able to get the curvature of the face shape so accurately? Is this a function of the circular 2D Gaussian PDF? It seems like the eyes, mouth, and nose don't have any points. If they were random variables, why are these parts of the images excluded from the prediction? https://ibb.co/XjKKpCQ. Also, how do they connect the different points, and do the connections even matter? submitted by /u/trikortreat123 [link] [comments]  ( 89 min )
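    For what it's worth, "a circular 2D Gaussian per landmark" usually means the network outputs a mean $\mu_j$ and a scalar spread $\sigma_j$ for each landmark $j$ and is trained with the Gaussian negative log-likelihood against the labelled position $x_j$; a hedged reconstruction (not quoted from the paper) is

        $$ \mathcal{L} = \sum_{j=1}^{703} \left[ \frac{\lVert x_j - \mu_j \rVert^2}{2\sigma_j^2} + \log\!\left(2\pi\sigma_j^2\right) \right], $$

    so landmarks the model is unsure about get a large $\sigma_j$ rather than a hard penalty.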
    [D] How long until we start talking about "human DALLEs"?
    After the invention of the calculator, those who could do complex mental math became known as "human calculators", even though they weren't as good as their electronic counterparts. How long will it be until we start calling artists "human DALLEs"? submitted by /u/FranciscoJ1618 [link] [comments]  ( 87 min )
    [R] Language Models Can Teach Themselves to Program Better - Microsoft 2022
    Paper: https://arxiv.org/abs/2207.14502 Abstract: This work shows how one can use large-scale language models (LMs) to synthesize programming problems with verified solutions, in the form of programming puzzles, which can then in turn be used to fine-tune those same models, improving their performance. This work builds on two recent developments. First, LMs have achieved breakthroughs in non-trivial reasoning and algorithm implementation, generating code that can solve some intermediate-level competitive programming problems. However, training code LMs involves curated sets of natural-language problem descriptions and source-code tests and solutions, which are limited in size. Second, a new format of programming challenge called a programming puzzle was introduced, which does not require a natural language description and is directly specified by a source-code test. In this work we show how generating synthetic programming puzzles and solutions, verified for correctness by a Python interpreter, can be used to improve performance in solving test puzzles from P3, a public benchmark set of Python Programming Puzzles. Additionally, we release a dataset of 1 million puzzles and solutions generated by the Codex model, which we show can improve smaller models through fine-tuning. submitted by /u/Singularian2501 [link] [comments]  ( 88 min )
    [N] No-code AI: Former Microsoft and Salesforce execs reveal new ‘machine teaching’ startup Intelus - Build your model and label at the same time.
    Hi! We have opened up our new labeling AI tool to try for free. Our team pioneered the machine-teaching concept, and the approach allows you to automatically select and generate labels and create a model for labeling more data at the same time. Try it and let us know what you think. -Intelus.ai GeekWire: https://www.geekwire.com/2021/no-code-ai-former-microsoft-and-salesforce-execs-reveal-new-machine-teaching-startup-intelus/ submitted by /u/Signal-Hall-6808 [link] [comments]  ( 88 min )
    [D] Autoencoder vs. BERT
    Hi, I am struggling to understand the relationship between BERT and autoencoders. On a high level they both create embeddings of input data that are learned in a self-supervised way. They even share the 'encoding' part 😉. I am working on a classification task with text (product descriptions). The SOTA on public datasets for this seems to be fine-tuning BERT. However, this does not give the results I was hoping for, maybe because of a different language than BERT was pretrained on, or because of domain-specific vocabulary. My idea was to look into autoencoders to generate representations with less 'noise' (the hypothesis being that the descriptions contain a lot of irrelevant stuff) and feed these into a classifier. My question is: is this conceptually different from BERT? How do autoencoders relate to transformers? Are transformers in NLP a kind of evolution of autoencoders? Any help/insight would be appreciated! submitted by /u/Tober447 [link] [comments]  ( 90 min )
    [D] Salary/Payroll Anomaly Detection
    If I have 200 rows of data where employee designation, years of experience, performance rating, salary as columns. How can I find if somebody is overpaid or underpaid? submitted by /u/Ag3nt-47 [link] [comments]  ( 88 min )
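    One common approach, sketched below under assumed column names: regress salary on the remaining columns and flag employees whose actual pay deviates from the out-of-fold prediction by more than a chosen threshold (the data here is synthetic and the 2-sigma cutoff is illustrative).

        import numpy as np
        import pandas as pd
        from sklearn.linear_model import LinearRegression
        from sklearn.model_selection import cross_val_predict

        rng = np.random.default_rng(0)
        df = pd.DataFrame({
            "designation": rng.choice(["junior", "senior", "lead"], 200),
            "experience": rng.integers(0, 20, 200),
            "rating": rng.integers(1, 6, 200),
        })
        df["salary"] = (30_000 + 4_000 * df["experience"]
                        + 2_000 * df["rating"] + rng.normal(0, 3_000, 200))

        X = pd.get_dummies(df[["designation", "experience", "rating"]])
        # Out-of-fold predictions give an "expected salary" for each employee.
        expected = cross_val_predict(LinearRegression(), X, df["salary"], cv=5)
        residual = df["salary"] - expected
        df["flagged"] = residual.abs() > 2 * residual.std()
        print(df[df["flagged"]])   # candidates for being over- or underpaid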
    [D] Help me decide what to study and job position to pursue (short)
    I don't want to give you a 20-paragraph explanation as others do, so I'll give you a summary :) I love maths and I'm an Information Systems Engineer (5-year university degree). I work as a full-stack software developer, but I miss math and reasoning. I've found a lot of people who say they work on AI but, e.g., just call the IBM Watson API for chat bots or visual recognition. I don't consider these jobs to be working on AI, and I think this kind of job will be replaced by AI at the same time as any other dev job. I think AI is the future and I'd like to get a related job that requires reasoning and maths and is safer from automation than software development. I don't know which job position/role I should pursue or what I should study. Any suggestions or info will be appreciated! submitted by /u/MilanesaDePosho666 [link] [comments]  ( 107 min )
    My foundational machine learning notes [R]
    Dear all, my most recent machine learning research tutorial focuses on Learning Theory and is therefore math-intensive. To cater to people who are new to ML, I just uploaded a new set of foundational mathematics machine learning notes: https://github.com/roboticcam/machine-learning-notes containing the following topics: Model Evaluation, Decision Tree, Simple Bayes, Regression, Neural Network and Unsupervised Learning. I used simple mathematics to explain them. Hope they are useful to you! submitted by /u/MLknowledge [link] [comments]  ( 87 min )
    ML model that can detect muscle groups [P]
    I want to create a ML/DL model that takes an image of a person as input and is able to identify different muscle groups present in the image (e.g. a shirtless picture of a person as input and the model can detect the chest, shoulders etc...). Not quite sure if this is feasible but what would be the best way to approach this? submitted by /u/FaMiFisH [link] [comments]  ( 105 min )
    [D] What is supervised ranking?
    I'm reading the paper The P-Norm Push: A Simple Convex Ranking Algorithm that Concentrates at the Top of the List and the paper seems to discuss "supervised ranking", but I'm having trouble understanding this context. The paper describes the setting on pages 3-4 (2235-2236) as: there are positive instances x_1, ..., x_I; there are negative instances x_1, ..., x_K; X is the set of all of these instances. The goal is to come up with a function f : X -> R which minimizes the "Height" of each negative instance, i.e., the number of positive instances which have a lower f. The way it's described, you could just have f map all the negative instances to -1 and all the positive instances to 1 and get a perfect answer. I'm sure I'm missing something, but I'm not sure what. The author uses the example of movies, so in that context: Is the same movie allowed to show up multiple times in the positive (or negative) list? Can the same movie show up in both the positive and negative lists (maybe multiple times)? Is there something else I'm missing that makes this problem non-trivial? I've tried to Google this but I can't find anything which clarifies. Thanks! submitted by /u/paradoxinmaking [link] [comments]  ( 89 min )
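    For reference, a hedged reconstruction of the objective from my reading of the paper: f is restricted to a hypothesis class (roughly, linear combinations of base ranking features) and must generalize to unseen instances, which is why the trivial plus/minus-one assignment is not available. The quantity being minimized is

        $$ R_p(f) = \sum_{k=1}^{K} \left( \sum_{i=1}^{I} \mathbf{1}\!\left[ f(x_i) \le f(\tilde{x}_k) \right] \right)^{p}, $$

    where the inner sum is the "Height" of negative instance $\tilde{x}_k$ and larger $p$ concentrates the penalty on negatives that end up near the top of the list.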
    [P] How to Create a Blog Post Title Optimizer with GPT-3 and Hacker News Data
    I wrote a blog post about finetuning GPT-3 on HN to determine whether a technical blog post is good or not, and also engineering prompts to GPT-3 to generate alternate titles which can then be ranked. The code + demos are available open-source on GitHub, although the finetuned model isn't, due to OpenAI rules/inability to share models. (incidentally the post did well on Hacker News) submitted by /u/minimaxir [link] [comments]  ( 107 min )
    [D] What can NeurIPS 22 authors see in Reviewer-Metareviewer Discussion period?
    Greetings! I'm a newcomer who submitted to NeurIPS 22 and I desperately hope somebody can help me understand the review process this time. Now the Author-Reviewer Discussion period has ended and we're in the middle of the Reviewer-Metareviewer Discussion period. My question is: if a reviewer makes a comment or changes his/her score in this period, will the authors be able to see it? The reason for this question is that we worked very hard on a rebuttal but received no response in the Author-Reviewer Discussion period. I'm still hoping the AC could step in and ask the reviewers something, but so far we have seen nothing, and I wonder if it could be because authors cannot see any discussion in this period. This is not just for a futile peace of mind; if it really is the case that nothing happened, could I email somebody (program chairs?) for advice? Thank you all very much in advance! submitted by /u/mission205 [link] [comments]  ( 89 min )
    [P] Open-source solutions for automatic ML model containerization
    Recently announced, the Open Model Interface (OMI) and chassis.ml open-source projects provide a standard, interoperable, secure framework for containerizing AI models. OMI provides a standard specification to containerize machine learning models. Chassis.ml converts models from multiple training tools and frameworks into OMI-compatible containers that run anywhere. By implementing OMI and chassis.ml, data scientists and developers gain a standard container specification allowing interoperability, portability, and security for models to be seamlessly integrated into production applications. Why OMI and Chassis? The OMI is designed to serve as a spec for wrapping models in OCI-compliant containers with a simple yet powerful interface that makes it easier to move models into production. The OMI: (1) creates a uniform way to convert models into portable, containerized applications that can run anywhere, in the cloud, on-premise, or at the edge; (2) allows teams the flexibility to keep using their existing training tools, frameworks, languages, etc., while adopting a standardized, DISA-compliant container to package models, ensuring a high level of security; (3) eliminates the need to build new integrations for each model type, as OMI provides a common, well-documented interface. OMI and Chassis.ml allow data scientists to turn their models into containerized applications without having to learn dozens of new technologies. Chassis converts models from all training tools and frameworks to OMI-compliant containers that run in a number of different types of AI runtime platforms. Chassis.ml can be integrated into existing MLOps pipelines, making it possible for models to be automatically containerized, scanned, and deployed to a secure container registry. To learn more about how you can get started using chassis today, watch this video on automatic ML model containerization. submitted by /u/modzykirsten [link] [comments]  ( 89 min )
    [P] skops: New library to improve scikit-learn workflows in production
    Hello 👋 We've been working on improving workflows for putting scikit-learn models into production, enabling reproducibility, and finding secure ways to serialize models. (try to avoid pickle 🥒 ) For this, we developed a library called "skops", and recently released v0.1. This version is mostly focused on hosting models on the Hugging Face Hub: - After you train your model, you can create a card for the model programmatically. The card by default comes with hyperparameter tables and a plot of your pipeline (ColumnTransformer, estimator & friends). You can add your own plots, pass metrics (they are automatically parsed into a neat markdown table) and pass information related to your model (how to use the model, limitations and more). - You can push your model to the Hugging Face Hub. You can initialize a local repository, which creates a config file that contains information related to your environment (library versions etc.), an example piece of the dataset and more. After you push the repository, the model card is rendered on the Hub and a widget is enabled so people can play with your model. For the next versions, we're planning to work on security for deserialization, creating model demo UIs easily with Gradio (with only one line of code 🤯) and more. We'd be more than happy if you could try it out and let us know about the features you'd like, issues you came across, or anything you'd like to contribute. See the documentation; we drafted two scripts for you to try it out: https://skops.readthedocs.io/en/stable/index.html See an example repository created with skops: https://huggingface.co/scikit-learn/tabular-playground Tutorial: https://huggingface.co/blog/skops If you want to contribute or simply show some love 🌟 to our GitHub repository: https://github.com/skops-dev/skops submitted by /u/unofficialmerve [link] [comments]  ( 89 min )
    [P] Text annotator for entity extraction that runs in your notebook
    Hi! We have just open-sourced our text annotator which runs directly in your notebook. You can now select spans of text for entity extraction and do your processing & modelling all in the same place. This allows for quick iteration when getting a project started. Here is the repository: https://github.com/dataqa/jupyter-annotate. We would be very happy to hear any feedback or comments you might have! submitted by /u/dataqa_ai [link] [comments]  ( 88 min )
    How to interpret repeating image artifacts during VQGAN training? [D]
    I'm training a VQGAN model on a custom dataset and over time I notice repeating artifacts that don't look like anything in the original images. How should I interpret the occurrence of these artifacts? Is it some sort of partial degradation of the network, where a particular part of the generator fires more often and the discriminator doesn't penalize it? Example: https://imgur.com/cqZIJe6 Model: https://github.com/dome272/VQGAN-pytorch submitted by /u/mselivanov [link] [comments]  ( 88 min )
    [D] Experience Grounds Language - Improving language models beyond the world of text (with multimodality, embodiment, and social interaction)
    Hi r/MachineLearning, I just published this video going over (and visualizing) this paper from 2020 that I can't stop thinking about. Hope you find it interesting. https://www.youtube.com/watch?v=WQm7-X4gts4 ​ Now that language models have been trained on massive internet-scale text data, where are future improvements going to come from? Jay goes over the "Experience Grounds Language" paper which describes five "World Scopes" for learning language -- including multimodality (e.g. training on images + text) and beyond. Contents: Introduction (0:00) Experience Grounds Language (1:20) World Scopes (2:58) World Scope 1 and 2 (3:33) World Scope 3 - Multimodality (3:56) World Scope 4 - Embodiment and Action (7:00) World Scope 5 - The Social World (9:50) Reading Excerpts from the paper (10:48) World scopes encompass each other (16:50) Interesting thought experiments (19:42) Conclusion (21:09) ​ Paper: Experience Grounds Language (2020) https://arxiv.org/abs/2004.10151 Authors: Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph Turian submitted by /u/jayalammar [link] [comments]  ( 88 min )
    [Discussion] Why does Midjourney look so much nicer than DALL-E?
    What are the different ways in which they could have fine-tuned their code to make their output better than DALL-E? submitted by /u/v-mohan [link] [comments]  ( 88 min )
    A question about hyperparameter optimization in Multinomial Logistic Regression [P]
    I have a labelled multivariate dataset with multiple classes, including 0, 1, 2, 3, 4. I can build a multinomial logistic regression model and use GridSearchCV to come up with the best solver, C value and penalty. Imagine I choose to convert the labels into binary by mapping everything >1 to 1 and then use a binary logistic regression model to optimize the hyperparameters. Am I right in stating that I sacrifice some accuracy for speed of execution by simplifying it? Does the full multinomial model show more nuance overall? submitted by /u/Positive_Community49 [link] [comments]  ( 106 min )
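    A minimal sketch of the multinomial search described above (synthetic stand-in data; the grid values are illustrative):

        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import GridSearchCV

        # Synthetic 5-class stand-in for the labelled multivariate dataset.
        X, y = make_classification(n_samples=500, n_classes=5, n_informative=8,
                                   random_state=0)
        grid = GridSearchCV(
            LogisticRegression(max_iter=5000),
            param_grid={"C": [0.01, 0.1, 1, 10],
                        "solver": ["lbfgs", "saga"],
                        "penalty": ["l2"]},
            cv=5,
        )
        grid.fit(X, y)
        print(grid.best_params_, grid.best_score_)

    Collapsing the five classes to binary makes each fit cheaper, but the hyperparameters tuned on the binary problem need not transfer back to the multinomial one, so some accuracy can indeed be sacrificed.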
  • Open

    Customize your recommendations by promoting specific items using business rules with Amazon Personalize
    Today, we are excited to announce the Promotions feature in Amazon Personalize, which allows you to explicitly recommend specific items to your users based on rules that align with your business goals. For instance, you can have marketing partnerships that require you to promote certain brands, in-house content, or categories that you want to improve the […]  ( 8 min )
    Amazon SageMaker JumpStart solutions now support custom IAM role settings
    Amazon SageMaker JumpStart solutions are a feature within Amazon SageMaker Studio that allows a simple one-click experience to set up your own machine learning (ML) workflows. When you launch a solution, various AWS resources are set up in your account to demonstrate how the business problem can be solved using the pre-built architecture. The solutions […]  ( 5 min )
    Intelligent document processing with AWS AI services: Part 2
    Amazon’s intelligent document processing (IDP) helps you speed up your business decision cycles and reduce costs. Across multiple industries, customers need to process millions of documents per year in the course of their business. For customers who process millions of documents, this is a critical aspect for the end-user experience and a top digital transformation […]  ( 10 min )
    Intelligent document processing with AWS AI services: Part 1
    Organizations across industries such as healthcare, finance and lending, legal, retail, and manufacturing often have to deal with a lot of documents in their day-to-day business processes. These documents contain critical information that are key to making decisions on time in order to maintain the highest levels of customer satisfaction, faster customer onboarding, and lower […]  ( 9 min )
  • Open

    Preventing Data Breaches with Extended Security Posture Management
    Extended security posture management keeps your data safe by helping IT teams to strengthen the security posture of an infrastructure.  ( 20 min )
    How AI Will Impact The Accounting And Finance Industry
    Artificial Intelligence (AI) has become crucial to highly demanding industries globally. The impact of AI in the accounting and finance industry is phenomenal, and it is also innovating how they operate and build products and services. Recent AI advancements are rapidly changing the face of accounting and finance in many ways. Labor and time-consuming tasks…  ( 19 min )
    AI Enabled Smart Stores: The Future of Retail
    The retail sector has undergone significant upheaval recently, and this trend will continue even as retail has begun to thrive in the digital age. Businesses are growing remarkably due to utilizing IoT technology to the fullest extent possible. It has accomplished everything by taking full advantage of the world of AI. Implementing artificial intelligence technology in the…  ( 17 min )
    When AGI comes – will you recognise it?
    The media has brainwashed us into thinking of AGI as something akin to the 'Terminator'. But when (and if) AGI comes, what would it really look like? It's tempting to think that AGI may be human-like, but I read two books recently which point to otherwise. Recently, The Economist listed five best books to…  ( 17 min )
  • Open

    Looking for Deepmind implementation of Player of Games
    I tried searching the internet for already existing implementations of Deepmind's Player of Games but outside of the original paper I couldn't find much in terms of libraries or existing code. Before I throw away a large amount of time writing out the code for this, does it exist somewhere else I'm not privy to? submitted by /u/AlexMarcDewey [link] [comments]  ( 86 min )
    Actor critic in bipedal walker gym
    Hellooooo!! I'm stuck on an RL problem and need help :'( I'm doing the bipedal walker from OpenAI Gym and I use the actor-critic algorithm to solve it, but I always get stuck in a local minimum near zero (one step of the agent). I've tried a lot of hyperparameters with no success. It seems that my actor network, which outputs the mean and variance of a normal distribution, learns to do the first step, but after this step the variance is too low to learn how to do a second step (that's my "theory"). Here's my question: is it possible to solve bipedal walker with simple actor-critic, or is it just my actor-critic implementation that's at fault? Thanks for reading and have a good day :) submitted by /u/Cauchy_Chlasse [link] [comments]  ( 98 min )
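    A minimal PyTorch sketch of one common remedy for the collapsing-variance symptom described above: give the Gaussian policy head a floor on its standard deviation. The architecture and constants are illustrative assumptions, not a fix guaranteed to solve BipedalWalker.

        import torch
        import torch.nn as nn

        class GaussianPolicy(nn.Module):
            """Actor head that cannot drive its exploration noise all the way to zero."""
            def __init__(self, obs_dim, act_dim, min_std=0.1):
                super().__init__()
                self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                          nn.Linear(64, 64), nn.Tanh())
                self.mu = nn.Linear(64, act_dim)
                self.std_head = nn.Linear(64, act_dim)
                self.min_std = min_std

            def forward(self, obs):
                h = self.body(obs)
                mean = torch.tanh(self.mu(h))   # actions bounded in [-1, 1]
                # softplus keeps std positive; adding min_std keeps it off the floor
                std = nn.functional.softplus(self.std_head(h)) + self.min_std
                return torch.distributions.Normal(mean, std)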
    Anyone interested in using Atlas for reinforcement learning? 🤖 https://www.linkedin.com/posts/m%C3%A5rten-sj%C3%B6-5909799_pretty-big-news-today-i-can-announce-that-activity-6964610163441831936-8rr_
    submitted by /u/krozzoz [link] [comments]  ( 87 min )
    is a grid world with a local view a pomdp or an mdp ?
    Ideally, POMDPs are where your actions depend on a sequence of states, but in a grid world with a partial/local view, I don't have access to the entire state space (that is, the whole grid world). Does this make it an MDP or a POMDP? submitted by /u/Cool_Abbreviations_9 [link] [comments]  ( 87 min )
  • Open

    How to deal with NaN values in prediction using LSTM or RNN, when I can't delete them and can't fill them with median or mean values?
    Hello, so basically I have realised that I have quite a big problem. I am trying out ideas for stock prediction, and in one case I have calculated the Ichimoku indicator, which contains several components. One of them is the so-called chikou span, which is basically the price value shifted back by 26 time intervals, i.e. chikou_span[i] = price_array[i + 26]. For example: if the price was 0, 1, 2, 3, 4, 5, ..., 25, 26, 27, 28, the chikou span would be 25, 26, 27, 28, NaN, NaN, ..., NaN. And as you see, I can't really drop the NaN values, since they correspond to future values and I am trying to predict what happens in the future, right where the first NaN value is located. So how should I deal with this when feeding the indicator into an LSTM (along with price and others)? Or is it senseless and should I just drop this one completely? Thanks in advance for any advice :) submitted by /u/skollehatti [link] [comments]  ( 87 min )
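    A hedged pandas sketch of the usual handling: keep the shifted column for training rows where it is observed, and exclude it (or the rows) for the prediction tail. The series here is a placeholder.

        import pandas as pd

        price = pd.Series(range(100), dtype=float)   # placeholder price series
        chikou = price.shift(-26)                    # close shifted back; last 26 are NaN

        frame = pd.DataFrame({"price": price, "chikou": chikou})
        train = frame.dropna()     # rows whose chikou value is already observable
        latest = frame.tail(26)    # rows to predict on, without chikou as an input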
    What is the term when I use multiple models and when in each prediction I use the one that performs the best, and where I can read on it?
    Hello, as in the title: I know there is a term for this but I have forgotten it. To make it more precise: I have now prepared my whole dataset with various "indicators" calculated from the raw data. Now I want to assemble around 20 models, where each model will process the raw data and a few of these indicators, and I want these models to "co-operate", so that the prediction I use later will be the best prediction from among the models. Apart from the term, is there any whitepaper worth reading on this topic? Thank you for your help in advance :) submitted by /u/skollehatti [link] [comments]  ( 87 min )
  • Open

    AI-Based Content Vs. Human Created Content: Which Is the Best Option?
    In the world of marketing, one of the most important aspects is content. Content can be anything from a blog post to an e-book or even an…  ( 15 min )
  • Open

    Keeping data and code together with org-mode
    With org-mode you can keep data, code, and documentation in one file. Suppose you have an org-mode file containing the following table.

        #+NAME: mydata
        | Drug | Patients |
        |------+----------|
        | X    |      232 |
        | Y    |      351 |
        | Z    |      117 |

    Note that there cannot be a blank line between the […]  ( 6 min )
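    A hedged sketch of how such a named table can be consumed from a source block via org-babel's :var header (the :results setting here is an assumption):

        #+begin_src python :var data=mydata :results output
        # With the hline present, Babel strips the header row, so `data` arrives
        # as [["X", 232], ["Y", 351], ["Z", 117]].
        total = sum(row[1] for row in data)
        print(f"Total patients: {total}")
        #+end_src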
  • Open

    A Practical Second-order Latent Factor Model via Distributed Particle Swarm Optimization. (arXiv:2208.06125v1 [cs.LG])
    Latent Factor (LF) models are effective in representing high-dimensional and sparse (HiDS) data via low-rank matrix approximation. Hessian-free (HF) optimization is an efficient method for utilizing second-order information of an LF model's objective function, and it has been utilized to optimize second-order LF (SLF) models. However, the low-rank representation ability of an SLF model heavily relies on its multiple hyperparameters. Determining these hyperparameters is time-consuming, and it largely reduces the practicability of an SLF model. To address this issue, a practical SLF (PSLF) model is proposed in this work. It realizes hyperparameter self-adaptation with a distributed particle swarm optimizer (DPSO), which is gradient-free and parallelized. Experiments on real HiDS data sets indicate that the PSLF model has a competitive advantage over state-of-the-art models in data representation ability.  ( 2 min )
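    A minimal, gradient-free particle swarm sketch of the kind of hyperparameter search a DPSO performs. This is single-process and uses a toy objective; the distributed aspect, the LF objective, and all constants are assumptions for illustration.

        import numpy as np

        def pso(loss, dim, n_particles=20, iters=50, w=0.7, c1=1.4, c2=1.4, seed=0):
            rng = np.random.default_rng(seed)
            x = rng.uniform(-1, 1, (n_particles, dim))   # particle positions
            v = np.zeros_like(x)
            pbest, pbest_val = x.copy(), np.array([loss(p) for p in x])
            gbest = pbest[pbest_val.argmin()].copy()
            for _ in range(iters):
                r1, r2 = rng.random((2, n_particles, dim))
                # inertia + pull toward personal best + pull toward global best
                v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
                x = x + v
                vals = np.array([loss(p) for p in x])
                better = vals < pbest_val
                pbest[better], pbest_val[better] = x[better], vals[better]
                gbest = pbest[pbest_val.argmin()].copy()
            return gbest

        # Toy quadratic standing in for a validation loss over hyperparameters.
        print(pso(lambda p: ((p - 0.3) ** 2).sum(), dim=2))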
    A Knowledge Distillation-Based Backdoor Attack in Federated Learning. (arXiv:2208.06176v1 [cs.LG])
    Federated Learning (FL) is a novel framework for decentralized machine learning. Due to its decentralized nature, FL is vulnerable to adversarial attacks during the training procedure, e.g., backdoor attacks. A backdoor attack aims to inject a backdoor into the machine learning model such that the model behaves arbitrarily incorrectly on test samples carrying some specific backdoor trigger. Even though a range of backdoor attack methods for FL has been introduced, there are also methods defending against them. Many of the defending methods utilize the abnormal characteristics of backdoored models or the difference between backdoored models and regular models. To bypass these defenses, we need to reduce this difference and these abnormal characteristics. We find that one source of such abnormality is that a backdoor attack directly flips the labels of data when poisoning them. However, current studies of backdoor attacks in FL do not mainly focus on reducing the difference between backdoored models and regular models. In this paper, we propose Adversarial Knowledge Distillation (ADVKD), a method combining knowledge distillation with backdoor attacks in FL. With knowledge distillation, we can reduce the abnormal characteristics in the model that result from the label flipping, so the model can bypass the defenses. Compared to current methods, we show that ADVKD can not only reach a higher attack success rate, but also successfully bypass the defenses when other methods fail. To further explore the performance of ADVKD, we test how the parameters affect its performance under different scenarios. Based on the experimental results, we summarize how to adjust the parameters for better performance under different scenarios. We also use several methods to visualize the effect of different attacks and explain the effectiveness of ADVKD.  ( 3 min )
    LEAF: Navigating Concept Drift in Cellular Networks. (arXiv:2109.03011v4 [cs.NI] UPDATED)
    Operational networks commonly rely on machine learning models for many tasks, including detecting anomalies, inferring application performance, and forecasting demand. Yet, unfortunately, model accuracy can degrade due to concept drift, whereby the relationship between the features and the target prediction changes due to reasons ranging from software upgrades to seasonality to changes in user behavior. Mitigating concept drift is thus an essential part of operationalizing machine learning models, and yet despite its importance, concept drift has not been extensively explored in the context of networking -- or regression models in general. Thus, it is not well-understood how to detect or mitigate it for many common network management tasks that currently rely on machine learning models. Unfortunately, as we show, concept drift cannot be sufficiently mitigated by frequently retraining models using newly available data, and doing so can even degrade model accuracy further. In this paper, we characterize concept drift in a large cellular network for a major metropolitan area in the United States. We find that concept drift occurs across many important key performance indicators (KPIs), independently of the model, training set size, and time interval -- thus necessitating practical approaches to detect, explain, and mitigate it. To do so, we develop Local Error Approximation of Features (LEAF). LEAF detects drift; explains features and time intervals that most contribute to drift; and mitigates drift using forgetting and over-sampling. We evaluate LEAF against industry-standard mitigation approaches with more than four years of cellular KPI data. Our initial tests with a major cellular provider in the US show that LEAF is effective on a variety of KPIs and models. LEAF consistently outperforms periodic and triggered retraining while reducing costly retraining operations.  ( 3 min )
    Incompleteness of graph convolutional neural networks for points clouds in three dimensions. (arXiv:2201.07136v3 [stat.ML] UPDATED)
    Graph neural networks (GNN) are very popular methods in machine learning and have been applied very successfully to the prediction of the properties of molecules and materials. First-order GNNs are well known to be incomplete, i.e., there exist graphs that are distinct but appear identical when seen through the lens of the GNN. More complicated schemes have thus been designed to increase their resolving power. Applications to molecules (and more generally, point clouds), however, add a geometric dimension to the problem. The most straightforward and prevalent approach to construct graph representation for molecules regards atoms as vertices in a graph and draws a bond between each pair of atoms within a chosen cutoff. Bonds can be decorated with the distance between atoms, and the resulting "distance graph NNs" (dGNN) have empirically demonstrated excellent resolving power and are widely used in chemical ML, with all known indistinguishable graphs being resolved in the fully-connected limit. Here we show that even for the restricted case of fully-connected graphs induced by 3D atom clouds dGNNs are not complete. We construct pairs of distinct point clouds that generate graphs that, for any cutoff radius, are equivalent based on a first-order Weisfeiler-Lehman test. This class of degenerate structures includes chemically-plausible configurations, setting an ultimate limit to the expressive power of some of the well-established GNN architectures for atomistic machine learning. Models that explicitly use angular or directional information in the description of atomic environments can resolve these degeneracies.  ( 3 min )
    Local Spatiotemporal Representation Learning for Longitudinally-consistent Neuroimage Analysis. (arXiv:2206.04281v2 [cs.CV] UPDATED)
    Recent self-supervised advances in medical computer vision exploit global and local anatomical self-similarity for pretraining prior to downstream tasks such as segmentation. However, current methods assume i.i.d. image acquisition, which is invalid in clinical study designs where follow-up longitudinal scans track subject-specific temporal changes. Further, existing self-supervised methods for medically-relevant image-to-image architectures exploit only spatial or temporal self-similarity and only do so via a loss applied at a single image-scale, with naive multi-scale spatiotemporal extensions collapsing to degenerate solutions. To these ends, this paper makes two contributions: (1) It presents a local and multi-scale spatiotemporal representation learning method for image-to-image architectures trained on longitudinal images. It exploits the spatiotemporal self-similarity of learned multi-scale intra-subject features for pretraining and develops several feature-wise regularizations that avoid collapsed identity representations; (2) During finetuning, it proposes a surprisingly simple self-supervised segmentation consistency regularization to exploit intra-subject correlation. Benchmarked in the one-shot segmentation setting, the proposed framework outperforms both well-tuned randomly-initialized baselines and current self-supervised techniques designed for both i.i.d. and longitudinal datasets. These improvements are demonstrated across both longitudinal neurodegenerative adult MRI and developing infant brain MRI and yield both higher performance and longitudinal consistency.  ( 3 min )
    Mixed-Precision Neural Networks: A Survey. (arXiv:2208.06064v1 [cs.LG])
    Mixed-precision Deep Neural Networks achieve the energy efficiency and throughput needed for hardware deployment, particularly when the resources are limited, without sacrificing accuracy. However, the optimal per-layer bit precision that preserves accuracy is not easily found, especially with the abundance of models, datasets, and quantization techniques that creates an enormous search space. In order to tackle this difficulty, a body of literature has emerged recently, and several frameworks that achieved promising accuracy results have been proposed. In this paper, we start by summarizing the quantization techniques used generally in literature. Then, we present a thorough survey of the mixed-precision frameworks, categorized according to their optimization techniques such as reinforcement learning and quantization techniques like deterministic rounding. Furthermore, the advantages and shortcomings of each framework are discussed, where we present a juxtaposition. We finally give guidelines for future mixed-precision frameworks.  ( 2 min )
    Collective Obfuscation and Crowdsourcing. (arXiv:2208.06405v1 [cs.LG])
    Crowdsourcing technologies rely on groups of people to input information that may be critical for decision-making. This work examines obfuscation in the context of reporting technologies. We show that widespread use of reporting platforms comes with unique security and privacy implications, and introduce a threat model and corresponding taxonomy to outline some of the many attack vectors in this space. We then perform an empirical analysis of a dataset of call logs from a controversial, real-world reporting hotline and identify coordinated obfuscation strategies that are intended to hinder the platform's legitimacy. We propose a variety of statistical measures to quantify the strength of this obfuscation strategy with respect to the structural and semantic characteristics of the reporting attacks in our dataset.  ( 2 min )
    Figure Descriptive Text Extraction using Ontological Representation. (arXiv:2208.06040v1 [cs.CL])
    Experimental research publications provide figure form resources including graphs, charts, and any type of images to effectively support and convey methods and results. To describe figures, authors add captions, which are often incomplete, and more descriptions reside in body text. This work presents a method to extract figure descriptive text from the body of scientific articles. We adopted ontological semantics to aid concept recognition of figure-related information, which generates human- and machine-readable knowledge representations from sentences. Our results show that conceptual models bring an improvement in figure descriptive sentence classification over word-based approaches.  ( 2 min )
    An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers. (arXiv:2208.06118v1 [cs.AR])
    The Transformer has been an indispensable staple in deep learning. However, for real-life applications, it is very challenging to deploy efficient Transformers due to immense parameters and operations of models. To relieve this burden, exploiting sparsity is an effective approach to accelerate Transformers. Newly emerging Ampere GPUs leverage a 2:4 sparsity pattern to achieve model acceleration, while it can hardly meet the diverse algorithm and hardware constraints when deploying models. By contrast, we propose an algorithm-hardware co-optimized framework to flexibly and efficiently accelerate Transformers by utilizing general N:M sparsity patterns. (1) From algorithm perspective, we propose a sparsity inheritance mechanism along with an inherited dynamic pruning (IDP) method to obtain a series of N:M sparse candidate Transformers rapidly. A model compression scheme is further proposed to significantly reduce the storage requirement for deployment. (2) From hardware perspective, we present a flexible and efficient hardware architecture, namely STA, to achieve significant speedup when deploying N:M sparse Transformers. STA features not only a computing engine unifying both sparse-dense and dense-dense matrix multiplications with high computational efficiency but also a scalable softmax module eliminating the latency from intermediate off-chip data communication. Experimental results show that compared to other methods, N:M sparse Transformers, generated using IDP, achieves an average of 6.7% improvement on accuracy with high training efficiency. Moreover, STA can achieve 14.47x and 11.33x speedup compared to Intel i9-9900X and NVIDIA RTX 2080 Ti, respectively, and perform 2.00-19.47x faster inference than the state-of-the-art FPGA-based accelerators for Transformers.  ( 3 min )
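    A hedged NumPy sketch of generic N:M magnitude pruning as described above: within every group of M consecutive weights, keep the N largest magnitudes and zero the rest. This shows the sparsity pattern only, not the paper's IDP training scheme.

        import numpy as np

        def nm_prune(w, n=2, m=4):
            # Assumes w.size is divisible by m.
            flat = w.reshape(-1, m)
            drop = np.argsort(np.abs(flat), axis=1)[:, : m - n]   # smallest magnitudes
            mask = np.ones_like(flat, dtype=bool)
            np.put_along_axis(mask, drop, False, axis=1)
            return (flat * mask).reshape(w.shape)

        w = np.random.default_rng(0).standard_normal((4, 8))
        print(nm_prune(w))   # every group of 4 weights keeps its 2 largest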
    Gaussian process surrogate models for neural networks. (arXiv:2208.06028v1 [cs.LG])
    The lack of insight into deep learning systems hinders their systematic design. In science and engineering, modeling is a methodology used to understand complex systems whose internal processes are opaque. Modeling replaces a complex system with a simpler surrogate that is more amenable to interpretation. Drawing inspiration from this, we construct a class of surrogate models for neural networks using Gaussian processes. Rather than deriving the kernels for certain limiting cases of neural networks, we learn the kernels of the Gaussian process empirically from the naturalistic behavior of neural networks. We first evaluate our approach with two case studies inspired by previous theoretical studies of neural network behavior in which we capture neural network preferences for learning low frequencies and identify pathological behavior in deep neural networks. In two further practical case studies, we use the learned kernel to predict the generalization properties of neural networks.  ( 2 min )
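    A toy illustration of the surrogate idea, with one deliberate simplification: a fixed RBF kernel is used rather than the paper's empirically learned kernels. We probe a trained network and fit a GP to its input-output behavior.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        X = rng.uniform(-3, 3, (200, 1))
        y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)

        # The opaque system under study: a small trained network.
        net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                           random_state=0).fit(X, y)

        # The surrogate: a GP fit to the network's naturalistic behavior.
        X_probe = np.linspace(-3, 3, 100).reshape(-1, 1)
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
        gp.fit(X_probe, net.predict(X_probe))
        mean, std = gp.predict(X_probe, return_std=True)   # surrogate's view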
    A Ranking Game for Imitation Learning. (arXiv:2202.03481v2 [cs.LG] UPDATED)
    We propose a new framework for imitation learning -- treating imitation as a two-player ranking-based game between a policy and a reward. In this game, the reward agent learns to satisfy pairwise performance rankings between behaviors, while the policy agent learns to maximize this reward. In imitation learning, near-optimal expert data can be difficult to obtain, and even in the limit of infinite data cannot imply a total ordering over trajectories as preferences can. On the other hand, learning from preferences alone is challenging as a large number of preferences are required to infer a high-dimensional reward function, though preference data is typically much easier to collect than expert demonstrations. The classical inverse reinforcement learning (IRL) formulation learns from expert demonstrations but provides no mechanism to incorporate learning from offline preferences and vice versa. We instantiate the proposed ranking-game framework with a novel ranking loss giving an algorithm that can simultaneously learn from expert demonstrations and preferences, gaining the advantages of both modalities. Our experiments show that the proposed method achieves state-of-the-art sample efficiency and can solve previously unsolvable tasks in the Learning from Observation (LfO) setting.  ( 3 min )
    COVID-19 forecasting using new viral variants and vaccination effectiveness models. (arXiv:2201.10356v2 [cs.LG] UPDATED)
    Background: Recently, a high number of daily positive COVID-19 cases have been reported in regions with relatively high vaccination rates; hence, booster vaccination has become necessary. In addition, infections caused by the different variants and correlated factors have not been discussed in depth. With large variabilities and different co-factors, it is difficult to use conventional mathematical models to forecast the incidence of COVID-19. Methods: Machine learning based on long short-term memory was applied to forecasting the time series of new daily positive cases (DPC), serious cases, hospitalized cases, and deaths. Data acquired from regions with high rates of vaccination, such as Israel, were blended with the current data of other regions in Japan to factor in the potential effects of vaccination. The protection provided by symptomatic infection was also considered in terms of the population effectiveness of vaccination as well as the waning protection and ratio and infectivity of viral variants. To represent changes in public behavior, public mobility and interactions through social media were also included in the analysis. Findings: Comparing the observed and estimated new DPC in Tel Aviv, Israel, the parameters characterizing vaccination effectiveness and the waning protection from infection were well estimated; the vaccination effectiveness of the second dose after 5 months and the third dose after two weeks from infection by the delta variant were 0.24 and 0.95, respectively. Using the extracted parameters regarding vaccination effectiveness, new cases in three prefectures of Japan were replicated.  ( 3 min )
    EEGNN: Edge Enhanced Graph Neural Networks. (arXiv:2208.06322v1 [stat.ML])
    Training deep graph neural networks (GNNs) poses a challenging task, as the performance of GNNs may suffer from the number of hidden message-passing layers. The literature has focused on the proposals of over-smoothing and under-reaching to explain the performance deterioration of deep GNNs. In this paper, we propose a new explanation for such deteriorated performance phenomenon, mis-simplification, that is, mistakenly simplifying graphs by preventing self-loops and forcing edges to be unweighted. We show that such simplifying can reduce the potential of message-passing layers to capture the structural information of graphs. In view of this, we propose a new framework, edge enhanced graph neural network (EEGNN). EEGNN uses the structural information extracted from the proposed Dirichlet mixture Poisson graph model, a Bayesian nonparametric model for graphs, to improve the performance of various deep message-passing GNNs. Experiments over different datasets show that our method achieves considerable performance increase compared to baselines.  ( 2 min )
    Identifying Substitute and Complementary Products for Assortment Optimization with Cleora Embeddings. (arXiv:2208.06262v1 [cs.IR])
    Recent years have brought increasing interest in the application of machine learning algorithms in e-commerce, omnichannel marketing, and the sales industry. This is due not only to algorithmic advances but also to the availability of data representing transactions, users, and background product information. Finding products related in different ways, i.e., substitutes and complements, is essential for user recommendations at the vendor's site and for the vendor, to perform efficient assortment optimization. The paper introduces a novel method for finding product substitutes and complements based on the Cleora graph embedding algorithm. We also provide its experimental evaluation with regard to the state-of-the-art Shopper algorithm, studying the relevance of recommendations with surveys from industry experts. It is concluded that the new approach presented here offers suitable choices of recommended products, requiring a minimal amount of additional information. The algorithm can be used in various enterprises, effectively identifying substitute and complementary product options.  ( 2 min )
    Quantum-classical convolutional neural networks in radiological image classification. (arXiv:2204.12390v2 [quant-ph] UPDATED)
    Quantum machine learning is receiving significant attention currently, but its usefulness in comparison to classical machine learning techniques for practical applications remains unclear. However, there are indications that certain quantum machine learning algorithms might result in improved training capabilities with respect to their classical counterparts -- which might be particularly beneficial in situations with little training data available. Such situations naturally arise in medical classification tasks. Within this paper, different hybrid quantum-classical convolutional neural networks (QCCNN) with varying quantum circuit designs and encoding techniques are proposed. They are applied to two- and three-dimensional medical imaging data, e.g. featuring different, potentially malign, lesions in computed tomography scans. The performance of these QCCNNs is already similar to the one of their classical counterparts -- therefore encouraging further studies towards the direction of applying these algorithms within medical imaging tasks.  ( 2 min )
    Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning. (arXiv:2208.06193v1 [cs.LG])
    Offline reinforcement learning (RL), which aims to learn an optimal policy using a previously collected static dataset, is an important paradigm of RL. Standard RL methods often perform poorly at this task due to the function approximation errors on out-of-distribution actions. While a variety of regularization methods have been proposed to mitigate this issue, they are often constrained by policy classes with limited expressiveness and sometimes result in substantially suboptimal solutions. In this paper, we propose Diffusion-QL that utilizes a conditional diffusion model as a highly expressive policy class for behavior cloning and policy regularization. In our approach, we learn an action-value function and we add a term maximizing action-values into the training loss of a conditional diffusion model, which results in a loss that seeks optimal actions that are near the behavior policy. We show the expressiveness of the diffusion model-based policy and the coupling of the behavior cloning and policy improvement under the diffusion model both contribute to the outstanding performance of Diffusion-QL. We illustrate our method and prior work in a simple 2D bandit example with a multimodal behavior policy. We then show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks for offline RL.  ( 2 min )
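    In symbols, the training objective sketched above couples the diffusion behavior-cloning loss with Q-value maximization; a hedged paraphrase of the construction, with $\alpha$ the trade-off weight, is

        $$ \min_{\theta} \; \mathcal{L}_{\mathrm{BC}}(\theta) \;-\; \alpha \, \mathbb{E}_{s \sim \mathcal{D},\, a \sim \pi_\theta(\cdot \mid s)} \big[ Q_\phi(s, a) \big], $$

    where $\mathcal{L}_{\mathrm{BC}}$ is the conditional diffusion model's denoising loss on dataset actions, keeping the learned policy near the behavior policy while the second term pushes it toward high-value actions.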
    Self-Distilled Vision Transformer for Domain Generalization. (arXiv:2207.12392v2 [cs.CV] UPDATED)
    In the recent past, several domain generalization (DG) methods have been proposed, showing encouraging performance; however, almost all of them build on convolutional neural networks (CNNs). There is little to no progress on studying the DG performance of vision transformers (ViTs), which are challenging the supremacy of CNNs on standard benchmarks, often built on the i.i.d. assumption. This renders the real-world deployment of ViTs doubtful. In this paper, we attempt to explore ViTs towards addressing the DG problem. Similar to CNNs, ViTs also struggle in out-of-distribution scenarios and the main culprit is overfitting to source domains. Inspired by the modular architecture of ViTs, we propose a simple DG approach for ViTs, coined self-distillation for ViTs. It reduces overfitting to source domains by easing the learning of the input-output mapping problem through curating non-zero entropy supervisory signals for intermediate transformer blocks. Further, it does not introduce any new parameters and can be seamlessly plugged into the modular composition of different ViTs. We empirically demonstrate notable performance gains with different DG baselines and various ViT backbones on five challenging datasets. Moreover, we report favorable performance against recent state-of-the-art DG methods. Our code along with pre-trained models is publicly available at: https://github.com/maryam089/SDViT
    Developing a Philosophical Framework for Fair Machine Learning: The Case of Algorithmic Collusion and Market Fairness. (arXiv:2208.06308v1 [cs.LG])
    Fair machine learning research has been primarily concerned with classification tasks that result in discrimination. As machine learning algorithms are applied in new contexts, however, the harms or injustices that result are qualitatively different than those presently studied. Existing research at the level of metrics and definitions cannot measure these qualitatively different types of injustice. One example of this is the problem of market fairness and algorithmic collusion. Negative consequences of algorithmic collusion affect all consumers, not only particular members of a protected class. Drawing on this case study, I develop an ethical framework for fair machine learning research in new domains. This contribution ties the development of fairness metrics to specifically scoped normative principles. This enables fairness metrics to reflect different concerns from discrimination. I develop this framework and provide the philosophical rationale for its structure, ultimately applying it to the case of algorithmic collusion. I conclude with limitations of my proposal and discuss promising avenues of future research.
    Hybrid Approach to Identify Druglikeness Leading Compounds against SARS 3CL Protease. (arXiv:2208.06362v1 [q-bio.BM])
    SARS-COV-2 is a positive single-strand RNA-based macromolecule that had caused more than 6.3 million deaths as of June 2022. Moreover, by disturbing global supply chains through lockdowns, the virus has indirectly caused devastating damage to the global economy. It is vital to design and develop drugs for this virus and its various variants. In this paper, we use an in-silico study framework to repurpose existing therapeutic agents and find drug-like bioactive molecules that could cure Covid-19. We applied the Lipinski rules to the molecules retrieved from the ChEMBL database and found 133 drug-like bioactive molecules against SARS coronavirus 3CL Protease. On the basis of standard IC50, the dataset was divided into three classes: active, inactive and intermediate. Our comparative analysis demonstrated that the proposed Extra Tree Regressor (ETR) ensemble model improves bioactivity prediction for chemical compounds relative to other state-of-the-art machine learning models. Using ADMET analysis, we identified 13 novel bioactive molecules having ChEMBL IDs 187460, 190743, 222234, 222628, 222735, 222769, 222840, 222893, 225515, 358279, 363535, 365134 and 426898. We found that these molecules are highly suitable drug candidates for the SARS-COV-2 3CL Protease. These candidate molecules were further investigated for binding affinities. For this purpose, we performed molecular docking and shortlisted six bioactive molecules having ChEMBL IDs 187460, 222769, 225515, 358279, 363535, and 365134. These molecules can be suitable drug candidates for SARS-COV-2. It is anticipated that the pharmacologist community may use these promising compounds for further in-vitro analysis.
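    A hedged RDKit sketch of the Lipinski rule-of-five screen described above. The thresholds are the standard rule-of-five values; the SMILES string is a placeholder, and the paper's exact filtering criteria may differ.

        from rdkit import Chem
        from rdkit.Chem import Descriptors, Lipinski

        def passes_lipinski(smiles):
            mol = Chem.MolFromSmiles(smiles)
            if mol is None:
                return False
            return (Descriptors.MolWt(mol) <= 500          # molecular weight
                    and Descriptors.MolLogP(mol) <= 5      # lipophilicity
                    and Lipinski.NumHDonors(mol) <= 5
                    and Lipinski.NumHAcceptors(mol) <= 10)

        print(passes_lipinski("CC(=O)Oc1ccccc1C(=O)O"))    # aspirin, a toy input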
    A novel solution of deep learning for enhanced support vector machine for predicting the onset of type 2 diabetes. (arXiv:2208.06354v1 [cs.LG])
    Type 2 Diabetes is one of the most prevalent and fatal diseases known to human beings, and thousands of people are subjected to the onset of Type 2 Diabetes every year. However, the diagnosis and prevention of Type 2 Diabetes are relatively costly in today's scenario; hence, the use of machine learning and deep learning techniques is gaining momentum for predicting the onset of Type 2 Diabetes. This research aims to increase the accuracy and Area Under the Curve (AUC) metric while improving the processing time for predicting the onset of Type 2 Diabetes. The proposed system consists of a deep learning technique that uses the Support Vector Machine (SVM) algorithm with a Radial Basis Function (RBF) kernel, together with a Long Short-Term Memory (LSTM) layer, for the prediction of the onset of Type 2 Diabetes. The proposed solution provides an average accuracy of 86.31% and an average AUC value of 0.8270 (82.70%), with an improvement of 3.8 milliseconds in processing time. The Radial Basis Function (RBF) kernel and the LSTM layer enhance the prediction accuracy and AUC metric over the current industry standard, making the approach more feasible for practical use without compromising the processing time.
    Hyperbolic Molecular Representation Learning for Drug Repositioning. (arXiv:2208.06361v1 [q-bio.BM])
    Learning accurate drug representations is essential for tasks such as computational drug repositioning. A drug hierarchy is a valuable source that encodes knowledge of relations among drugs in a tree-like structure where drugs that act on the same organs, treat the same disease, or bind to the same biological target are grouped together. However, its utility in learning drug representations has not yet been explored, and currently described drug representations cannot place novel molecules in a drug hierarchy. Here, we develop a semi-supervised drug embedding that incorporates two sources of information: (1) underlying chemical grammar that is inferred from chemical structures of drugs and drug-like molecules (unsupervised), and (2) hierarchical relations that are encoded in an expert-crafted hierarchy of approved drugs (supervised). We use the Variational Auto-Encoder (VAE) framework to encode the chemical structures of molecules and use the drug-drug similarity information obtained from the hierarchy to induce the clustering of drugs in hyperbolic space. The hyperbolic space is amenable to encoding hierarchical relations. Our qualitative results support that the learned drug embedding can induce the hierarchical relations among drugs. We demonstrate that the learned drug embedding can be used for drug repositioning.
    An Empirical Exploration of Cross-domain Alignment between Language and Electroencephalogram. (arXiv:2208.06348v1 [q-bio.NC])
    Electroencephalography (EEG) and language have been widely explored independently for many downstream tasks (e.g., sentiment analysis, relation detection, etc.). Multimodal approaches that study both domains have not been well explored, even though in recent years multimodal learning has been seen to be more powerful than its unimodal counterparts. In this study, we want to explore the relationship and dependency between EEG and language, i.e., how one domain reflects and represents the other. To study the relationship at the representation level, we introduce MTAM, a Multimodal Transformer Alignment Model, to observe coordinated representations between the two modalities, and we employ the transformed representations for downstream applications. We use various relationship alignment-seeking techniques, such as Canonical Correlation Analysis and Wasserstein Distance, as loss functions to transfigure low-level language and EEG features into high-level transformed features. On the downstream applications of sentiment analysis and relation detection, we achieve new state-of-the-art results on two datasets, ZuCo and K-EmoCon. Our method achieved an F1-score improvement of 16.5% on sentiment analysis for K-EmoCon, 26.6% on sentiment analysis for ZuCo, and 31.1% on relation detection for ZuCo. In addition, we provide an interpretation of the performance improvement by: (1) visualizing the original feature distribution and the transformed feature distribution, showing the effectiveness of the alignment module for discovering and encoding the relationship between EEG and language; (2) visualizing word-level and sentence-level EEG-language alignment weights, showing the influence of different language semantics as well as EEG frequency features; and (3) visualizing brain topographical maps to provide an intuitive demonstration of the connectivity of EEG and language response in the brain regions.
    Causal Imitation Learning with Unobserved Confounders. (arXiv:2208.06267v1 [cs.LG])
    One of the common ways children learn is by mimicking adults. Imitation learning focuses on learning policies with suitable performance from demonstrations generated by an expert, with an unspecified performance measure and an unobserved reward signal. Popular methods for imitation learning start by either directly mimicking the behavior policy of an expert (behavior cloning) or by learning a reward function that prioritizes observed expert trajectories (inverse reinforcement learning). However, these methods rely on the assumption that the covariates used by the expert to determine her/his actions are fully observed. In this paper, we relax this assumption and study imitation learning when the sensory inputs of the learner and the expert differ. First, we provide a non-parametric, graphical criterion that is complete (both necessary and sufficient) for determining the feasibility of imitation from combinations of demonstration data and qualitative assumptions about the underlying environment, represented in the form of a causal model. We then show that when such a criterion does not hold, imitation could still be feasible by exploiting quantitative knowledge of the expert trajectories. Finally, we develop an efficient procedure for learning the imitating policy from experts' trajectories.
    On establishing learning separations between classical and quantum machine learning with classical data. (arXiv:2208.06339v1 [quant-ph])
    Despite years of effort, the quantum machine learning community has only been able to show quantum learning advantages for certain contrived cryptography-inspired datasets in the case of classical data. In this note, we discuss the challenges of finding learning problems that quantum learning algorithms can learn much faster than any classical learning algorithm, and we study how to identify such learning problems. Specifically, we reflect on the main concepts in computational learning theory pertaining to this question, and we discuss how subtle changes in definitions can mean conceptually very different tasks, which can either lead to a separation or no separation at all. Moreover, we study existing learning problems with a provable quantum speedup to distill sets of more general and sufficient conditions (i.e., ``checklists'') for a learning problem to exhibit a separation between classical and quantum learners. These checklists are intended to streamline one's approach to proving quantum speedups for learning problems, or to elucidate bottlenecks. Finally, to illustrate its application, we analyze examples of potential separations (i.e., when the learning problem is built from computational separations, or when the data comes from a quantum experiment) through the lens of our approach.
    Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization. (arXiv:2206.08573v3 [math.OC] UPDATED)
    We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}~F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to stochastic first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$. Building upon standard stochastic extragradient analysis for variational inequalities, we present a stochastic \emph{accelerated gradient-extragradient (AG-EG)} descent-ascent algorithm that combines extragradient and Nesterov's acceleration in general stochastic settings. This algorithm leverages scheduled restarting to admit a fine-grained nonasymptotic convergence rate that matches known lower bounds by both \citet{ibrahim2020linear} and \citet{zhang2021lower} in their corresponding settings, plus an additional statistical error term for bounded stochastic noise that is optimal up to a constant prefactor. This is the first result that achieves such a relatively mature characterization of optimality in saddle-point optimization.
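    AG-EG builds on the classical extragradient method; a numpy sketch of the plain (non-accelerated, deterministic) extragradient update on a bilinearly coupled quadratic toy problem follows. This illustrates the building block only, not the accelerated, restarted AG-EG scheme itself.

    ```python
    # Classical extragradient on min_x max_y 0.5||x||^2 + x^T B y - 0.5||y||^2,
    # i.e. F(x) = 0.5||x||^2 and G(y) = 0.5||y||^2; the saddle point is (0, 0).
    import numpy as np

    rng = np.random.default_rng(0)
    B = rng.standard_normal((5, 5))
    x, y, eta = np.ones(5), np.ones(5), 0.05

    for _ in range(3000):
        gx, gy = x + B @ y, B.T @ x - y           # grad_x L, grad_y L
        xh, yh = x - eta * gx, y + eta * gy       # extrapolation (half) step
        gxh, gyh = xh + B @ yh, B.T @ xh - yh     # gradients at the half point
        x, y = x - eta * gxh, y + eta * gyh       # actual update

    print(np.linalg.norm(x), np.linalg.norm(y))   # both shrink toward 0
    ```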
    Markov Observation Models. (arXiv:2208.06368v1 [stat.ML])
    Herein, the Hidden Markov Model is expanded to allow for Markov chain observations. In particular, the observations are assumed to be a Markov chain whose one-step transition probabilities depend upon the hidden Markov chain. An Expectation-Maximization analog to the Baum-Welch algorithm is developed for this more general model to estimate the transition probabilities for both the hidden state and the observations, as well as to estimate the probabilities for the initial joint hidden-state-observation distribution. A belief state or filter recursion to track the hidden state then arises from the calculations of this Expectation-Maximization algorithm. A dynamic programming analog to the Viterbi algorithm is also developed to estimate the most likely sequence of hidden states given the sequence of observations.
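    The belief-state recursion described above can be sketched in a few lines of numpy. The factorization below (observation transitions conditioned on the current hidden state) is our reading of the model; the tensor shapes and toy numbers are illustrative assumptions.

    ```python
    # Filter recursion for an HMM whose observations form a Markov chain:
    # A[i, j]    = P(h_t = j | h_{t-1} = i)
    # C[j, l, m] = P(o_t = m | o_{t-1} = l, h_t = j)
    import numpy as np

    def filter_markov_obs(obs, pi_joint, A, C):
        # pi_joint[i, m] = P(h_0 = i, o_0 = m): initial joint distribution
        belief = pi_joint[:, obs[0]].copy()
        belief /= belief.sum()
        beliefs = [belief]
        for t in range(1, len(obs)):
            pred = belief @ A                          # predict the hidden state
            belief = C[:, obs[t - 1], obs[t]] * pred   # weight by observation transition
            belief /= belief.sum()                     # normalize
            beliefs.append(belief)
        return np.array(beliefs)

    A = np.array([[0.9, 0.1], [0.2, 0.8]])
    C = np.array([[[0.8, 0.2], [0.6, 0.4]],            # hidden state 0
                  [[0.3, 0.7], [0.1, 0.9]]])           # hidden state 1
    pi_joint = np.full((2, 2), 0.25)
    print(filter_markov_obs([0, 1, 1, 0], pi_joint, A, C))
    ```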
    Style Spectroscope: Improve Interpretability and Controllability through Fourier Analysis. (arXiv:2208.06140v1 [cs.CV])
    Universal style transfer (UST) infuses styles from arbitrary reference images into content images. Existing methods, while enjoying many practical successes, are unable to explain several experimental observations, including the different performance of UST algorithms in preserving the spatial structure of content images. In addition, existing methods are limited to cumbersome global controls over stylization and thus require additional spatial masks to achieve the desired stylization. In this work, we provide a systematic Fourier analysis of a general framework for UST. We present an equivalent form of the framework in the frequency domain. The form implies that existing algorithms treat all frequency components and pixels of feature maps equally, except for the zero-frequency component. We connect Fourier amplitude and phase with Gram matrices and a content reconstruction loss in style transfer, respectively. Based on such equivalence and connections, we can interpret the different structure preservation behaviors between algorithms with Fourier phase. Given these interpretations, we propose two manipulations in practice for structure preservation and desired stylization. Both qualitative and quantitative experiments demonstrate the competitive performance of our method against state-of-the-art methods. We also conduct experiments to demonstrate (1) the abovementioned equivalence, (2) the interpretability based on Fourier amplitude and phase, and (3) the controllability associated with frequency components.
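    The amplitude/phase reading above (amplitude tied to Gram-matrix style statistics, phase to content structure) can be illustrated directly: a toy numpy sketch that keeps the phase of one feature map while swapping in the amplitude of another, not the paper's full UST framework.

    ```python
    # Toy illustration: style amplitude + content phase in the Fourier domain.
    import numpy as np

    content = np.random.rand(64, 64)                   # stand-in content feature map
    style = np.random.rand(64, 64)                     # stand-in style feature map

    Fc, Fs = np.fft.fft2(content), np.fft.fft2(style)
    mixed = np.abs(Fs) * np.exp(1j * np.angle(Fc))     # style amplitude, content phase
    stylized = np.fft.ifft2(mixed).real
    print(stylized.shape)
    ```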
    LRH-Net: A Multi-Level Knowledge Distillation Approach for Low-Resource Heart Network. (arXiv:2204.08000v2 [physics.med-ph] UPDATED)
    An electrocardiogram (ECG) monitors the electrical activity generated by the heart and is used to detect fatal cardiovascular diseases (CVDs). Conventionally, to capture the precise electrical activity, clinical experts use multiple-lead ECGs (typically 12 leads). In recent times, large deep learning models have been used to detect these diseases. However, such models require heavy compute resources, such as large amounts of memory, and incur long inference times. To alleviate these shortcomings, we propose a low-parameter model, named Low Resource Heart-Network (LRH-Net), which uses fewer leads to detect ECG anomalies in a resource-constrained environment. On top of this, a multi-level knowledge distillation process is used to obtain better generalization performance from our proposed model. The multi-level knowledge distillation process distills knowledge to LRH-Net, trained on a reduced number of leads, from higher-parameter (teacher) models trained on multiple leads, to reduce the performance gap. The proposed model is evaluated on the PhysioNet-2020 challenge dataset with constrained input. LRH-Net has 106x fewer parameters than our teacher model for detecting CVDs. The performance of LRH-Net improved by up to 3.2%, and the inference time decreased by 75%, compared to the teacher model. In contrast to compute- and parameter-intensive deep learning techniques, the proposed methodology uses a subset of ECG leads with the low-resource LRH-Net, making it eminently suitable for deployment on edge devices.
    ANTI-CARLA: An Adversarial Testing Framework for Autonomous Vehicles in CARLA. (arXiv:2208.06309v1 [cs.LG])
    Despite recent advances in autonomous driving systems, accidents such as the fatal Uber crash in 2018 show these systems are still susceptible to edge cases. Such systems must be thoroughly tested and validated before being deployed in the real world to avoid such events. Testing in open-world scenarios can be difficult, time-consuming, and expensive. These challenges can be addressed by using driving simulators such as CARLA instead. A key part of such tests is adversarial testing, in which the goal is to find scenarios that lead to failures of the given system. While several independent efforts in testing have been made, a well-established testing framework that enables adversarial testing has yet to be made available for CARLA. We therefore propose ANTI-CARLA, an automated testing framework in CARLA for simulating adversarial weather conditions (e.g., heavy rain) and sensor faults (e.g., camera occlusion) that fail the system. The operating conditions in which a given system should be tested are specified in a scenario description language. The framework offers an efficient search mechanism that searches for adversarial operating conditions that will fail the tested system. In this way, ANTI-CARLA extends the CARLA simulator with the capability of performing adversarial testing on any given driving pipeline. We use ANTI-CARLA to test the driving pipeline trained with Learning By Cheating (LBC) approach. The simulation results demonstrate that ANTI-CARLA can effectively and automatically find a range of failure cases despite LBC reaching an accuracy of 100% in the CARLA benchmark.
    Toward a Better Monitoring Statistic for Profile Monitoring via Variational Autoencoders. (arXiv:1911.00482v2 [cs.LG] UPDATED)
    Wide accessibility of imaging and profile sensors in modern industrial systems has created an abundance of high-dimensional sensing variables. This has led to a growing interest in research on high-dimensional process monitoring. However, most approaches in the literature assume the in-control population to lie on a linear manifold with a given basis (e.g., spline, wavelet, kernel) or an unknown basis (e.g., principal component analysis and its variants), which cannot efficiently model profiles with a nonlinear manifold, a situation common in many real-life cases. We propose deep probabilistic autoencoders as a viable unsupervised learning approach to model such manifolds. To do so, we formulate nonlinear and probabilistic extensions of the monitoring statistics from classical approaches as the expected reconstruction error (ERE) and the KL-divergence (KLD) based monitoring statistics. Through an extensive simulation study, we provide insights on why latent-space based statistics are unreliable and why residual-space based ones typically perform much better for deep learning based approaches. Finally, we demonstrate the superiority of deep probabilistic models via both a simulation study and a real-life case study involving images of defects from a hot steel rolling process.
    Data-Driven Fault Diagnosis Analysis and Open-Set Classification of Time-Series Data. (arXiv:2009.04756v2 [stat.ML] UPDATED)
    Fault diagnosis of dynamic systems is done by detecting changes in time-series data, for example residuals, caused by system degradation and faulty components. The use of general-purpose multi-class classification methods for fault diagnosis is complicated by imbalanced training data and unknown fault classes. Another complicating factor is that different fault classes can result in similar residual outputs, especially for small faults, which causes classification ambiguities. In this work, a framework for data-driven analysis and open-set classification is developed for fault diagnosis applications using the Kullback-Leibler divergence. A data-driven fault classification algorithm is proposed which can handle imbalanced datasets, class overlapping, and unknown faults. In addition, an algorithm is proposed to estimate the size of the fault when training data contains information from known fault realizations. An advantage of the proposed framework is that it can also be used for quantitative analysis of fault diagnosis performance, for example, to analyze how easy it is to classify faults of different magnitudes. To evaluate the usefulness of the proposed methods, multiple datasets from different fault scenarios have been collected from an internal combustion engine test bench to illustrate the design process of a data-driven diagnosis system, including quantitative fault diagnosis analysis and evaluation of the developed open set fault classification algorithm.
    Auto-Encoding Adversarial Imitation Learning. (arXiv:2206.11004v2 [cs.LG] UPDATED)
    Reinforcement learning (RL) provides a powerful framework for decision-making, but its application in practice often requires a carefully designed reward function. Adversarial Imitation Learning (AIL) sheds light on automatic policy acquisition without access to the reward signal from the environment. In this work, we propose Auto-Encoding Adversarial Imitation Learning (AEAIL), a robust and scalable AIL framework. To induce expert policies from demonstrations, AEAIL utilizes the reconstruction error of an auto-encoder as a reward signal, which provides more information for optimizing policies than prior discriminator-based ones. Subsequently, we use the derived objective functions to train the auto-encoder and the agent policy. Experiments show that AEAIL outperforms state-of-the-art methods in the MuJoCo environments. More importantly, AEAIL shows much better robustness when the expert demonstrations are noisy. Specifically, our method achieves $16.4\%$ and $47.2\%$ relative improvement overall compared to the best baselines FAIRL and PWIL on clean and noisy expert data, respectively. Video results, open-source code and dataset are available at https://sites.google.com/view/auto-encoding-imitation.  ( 2 min )
    Semi-automatic tuning of coupled climate models with multiple intrinsic timescales: lessons learned from the Lorenz96 model. (arXiv:2208.06243v1 [physics.ao-ph])
    The objective of this study is to evaluate the potential for History Matching (HM) to tune a climate system with multi-scale dynamics. By considering a toy climate model, namely the two-scale Lorenz96 model, and producing experiments in a perfect-model setting, we explore in detail how several built-in choices need to be carefully tested. We also demonstrate the importance of introducing physical expertise in the range of parameters prior to running HM. Finally, we revisit a classical procedure in climate model tuning that consists of tuning the slow and fast components separately. By doing so in the Lorenz96 model, we illustrate the non-uniqueness of plausible parameters and highlight the specificity of metrics emerging from the coupling. This paper also contributes to bridging the communities of uncertainty quantification, machine learning and climate modeling, by making connections between the terms used by each community for the same concept and presenting promising collaboration avenues that would benefit climate modeling research.
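    For reference, the two-scale Lorenz96 equations can be integrated in a few lines; the parameter values below (F, h, b, c) are the conventional textbook choices and are not necessarily the configuration tuned in the study.

    ```python
    # Two-scale Lorenz96: K slow variables X_k coupled to J fast variables per X_k.
    import numpy as np
    from scipy.integrate import solve_ivp

    K, J = 8, 4
    F, h, b, c = 10.0, 1.0, 10.0, 10.0                 # conventional values (assumed)

    def rhs(t, z):
        X, Y = z[:K], z[K:]
        dX = (np.roll(X, 1) * (np.roll(X, -1) - np.roll(X, 2)) - X + F
              - (h * c / b) * Y.reshape(K, J).sum(axis=1))
        dY = (c * b * np.roll(Y, -1) * (np.roll(Y, 1) - np.roll(Y, -2))
              - c * Y + (h * c / b) * np.repeat(X, J))
        return np.concatenate([dX, dY])

    rng = np.random.default_rng(0)
    z0 = np.concatenate([F + 0.01 * rng.standard_normal(K),
                         0.1 * rng.standard_normal(K * J)])
    sol = solve_ivp(rhs, (0.0, 5.0), z0, max_step=0.01)
    print(sol.y.shape)                                 # (K + K*J, n_steps)
    ```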
    Scalable and Sparsity-Aware Privacy-Preserving K-means Clustering with Application to Fraud Detection. (arXiv:2208.06093v1 [cs.LG])
    K-means is one of the most widely used clustering models in practice. Due to the problem of data isolation and the requirement for high model performance, how to jointly build practical and secure K-means for multiple parties has become an important topic for many applications in industry. Existing work on this is mainly of two types. The first type has efficiency advantages, but information leakage raises potential privacy risks. The second type is provably secure but inefficient, and even intractable for large-scale data-sparsity scenarios. In this paper, we propose a new framework for efficient sparsity-aware K-means with three characteristics. First, our framework is divided into a data-independent offline phase and a much faster online phase, and the offline phase allows pre-computing almost all cryptographic operations. Second, we take advantage of vectorization techniques in both the online and offline phases. Third, we adopt a sparse matrix multiplication for the data-sparsity scenario to further improve efficiency. We conduct comprehensive experiments on three synthetic datasets and deploy our model in a real-world fraud detection task. Our experimental results show that, compared with the state-of-the-art solution, our model achieves competitive performance in terms of both running time and communication size, especially on sparse datasets.
    PRIF: Primary Ray-based Implicit Function. (arXiv:2208.06143v1 [cs.CV])
    We introduce a new implicit shape representation called Primary Ray-based Implicit Function (PRIF). In contrast to most existing approaches based on the signed distance function (SDF) which handles spatial locations, our representation operates on oriented rays. Specifically, PRIF is formulated to directly produce the surface hit point of a given input ray, without the expensive sphere-tracing operations, hence enabling efficient shape extraction and differentiable rendering. We demonstrate that neural networks trained to encode PRIF achieve successes in various tasks including single shape representation, category-wise shape generation, shape completion from sparse or noisy observations, inverse rendering for camera pose estimation, and neural rendering with color.
    Non-Autoregressive Sign Language Production via Knowledge Distillation. (arXiv:2208.06183v1 [cs.LG])
    Sign Language Production (SLP) aims to translate expressions in spoken language into corresponding ones in sign language, such as skeleton-based sign poses or videos. Existing SLP models are either AutoRegressive (AR) or Non-Autoregressive (NAR). However, AR-SLP models suffer from regression to the mean and error propagation during decoding. NSLP-G, a NAR-based model, resolves these issues to some extent but engenders other problems. For example, it does not consider target sign lengths and suffers from false decoding initiation. We propose a novel NAR-SLP model via Knowledge Distillation (KD) to address these problems. First, we devise a length regulator to predict the end of the generated sign pose sequence. We then adopt KD, which distills spatial-linguistic features from a pre-trained pose encoder to alleviate false decoding initiation. Extensive experiments show that the proposed approach significantly outperforms existing SLP models in both Frechet Gesture Distance and Back-Translation evaluation.  ( 2 min )
    Deep is a Luxury We Don't Have. (arXiv:2208.06066v1 [cs.CV])
    Medical images come in high resolutions. A high resolution is vital for finding malignant tissues at an early stage. Yet, this resolution presents a challenge in terms of modeling long-range dependencies. Shallow transformers eliminate this problem, but they suffer from quadratic complexity. In this paper, we tackle this complexity by leveraging a linear self-attention approximation. Through this approximation, we propose an efficient vision model called HCT, which stands for High resolution Convolutional Transformer. HCT brings transformers' merits to high resolution images at a significantly lower cost. We evaluate HCT using a high resolution mammography dataset. HCT is significantly superior to its CNN counterpart. Furthermore, we demonstrate HCT's fitness for medical images by evaluating its effective receptive field. Code available at https://bit.ly/3ykBhhf
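    The abstract does not spell out HCT's exact linear self-attention approximation; below is a generic kernel-feature-map linearization (in the style of Katharopoulos et al.) that shows how the quadratic cost in sequence length drops to linear, which is the property exploited for high-resolution inputs.

    ```python
    # Generic linear attention via the feature map phi(x) = elu(x) + 1;
    # cost is O(N * d^2) instead of O(N^2 * d). HCT's approximation may differ.
    import torch
    import torch.nn.functional as F

    def linear_attention(q, k, v, eps=1e-6):
        q, k = F.elu(q) + 1, F.elu(k) + 1              # positive feature maps
        kv = torch.einsum("bnd,bne->bde", k, v)        # sum_n phi(k_n) v_n^T
        z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)
        return torch.einsum("bnd,bde,bn->bne", q, kv, z)

    q = k = v = torch.randn(2, 4096, 64)               # long high-resolution sequence
    print(linear_attention(q, k, v).shape)             # torch.Size([2, 4096, 64])
    ```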
    Forecasting COVID-19 spreading through an ensemble of classical and machine learning models: Spain's case study. (arXiv:2207.05753v2 [cs.LG] UPDATED)
    In this work we evaluate the applicability of an ensemble of population models and machine learning models to predict the near-future evolution of the COVID-19 pandemic, with a particular use case in Spain. We rely solely on open and public datasets, fusing incidence, vaccination, human mobility and weather data to feed our machine learning models (Random Forest, Gradient Boosting, k-Nearest Neighbours and Kernel Ridge Regression). We use the incidence data to adjust classic population models (Gompertz, Logistic, Richards, Bertalanffy) in order to better capture the trend of the data. We then ensemble these two families of models to obtain a more robust and accurate prediction. Furthermore, we have observed an improvement in the predictions obtained with machine learning models as we add new features (vaccines, mobility, climatic conditions), and we analyze the importance of each of them using Shapley Additive Explanation values. As in any other modelling work, data and prediction quality have several limitations and therefore must be seen from a critical standpoint, as we discuss in the text. Our work concludes that the ensemble use of these models improves the individual predictions (using only machine learning models or only population models) and can be applied, with caution, in cases when compartmental models cannot be utilized due to the lack of relevant data.
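    A compact sketch of the ensembling idea, under simplifying assumptions (synthetic incidence data, a single time feature, equal ensemble weights): fit a Gompertz growth curve with scipy and average its extrapolation with a Random Forest forecast.

    ```python
    # Gompertz + Random Forest ensemble on synthetic cumulative incidence.
    import numpy as np
    from scipy.optimize import curve_fit
    from sklearn.ensemble import RandomForestRegressor

    def gompertz(t, a, b, c):
        return a * np.exp(-b * np.exp(-c * t))

    t = np.arange(100, dtype=float)
    cases = gompertz(t, 1e5, 8.0, 0.08) + 500 * np.random.randn(100)   # toy data

    (a, b, c), _ = curve_fit(gompertz, t, cases, p0=[cases.max(), 5.0, 0.1],
                             maxfev=10000)
    rf = RandomForestRegressor(n_estimators=200, random_state=0)
    rf.fit(t.reshape(-1, 1), cases)    # real features: mobility, vaccines, weather

    t_future = np.arange(100, 114, dtype=float)
    ensemble = 0.5 * gompertz(t_future, a, b, c) \
             + 0.5 * rf.predict(t_future.reshape(-1, 1))
    print(ensemble.round(0))
    ```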
    A Fast Blockchain-based Federated Learning Framework with Compressed Communications. (arXiv:2208.06095v1 [cs.LG])
    Recently, blockchain-based federated learning (BFL) has attracted intensive research attention because the training process is auditable and the architecture is serverless, avoiding the single point of failure of the parameter server in vanilla federated learning (VFL). Nevertheless, BFL tremendously escalates the communication traffic volume because all local model updates (i.e., changes of model parameters) obtained by BFL clients will be transmitted to all miners for verification and to all clients for aggregation. In contrast, the parameter server and clients in VFL only retain aggregated model updates. Consequently, the huge communication traffic in BFL will inevitably impair the training efficiency and hinder the deployment of BFL in reality. To improve the practicality of BFL, we are among the first to propose a fast blockchain-based communication-efficient federated learning framework, called BCFL, which compresses the communications in BFL. Meanwhile, we derive the convergence rate of BCFL with non-convex loss. To maximize the final model accuracy, we further formulate the problem of minimizing the training loss implied by the convergence rate, subject to a limited training time, with respect to the compression rate and the block generation rate; this is a bi-convex optimization problem that can be efficiently solved. Finally, to demonstrate the efficiency of BCFL, we carry out extensive experiments with the standard CIFAR-10 and FEMNIST datasets. Our experimental results not only verify the correctness of our analysis, but also show that BCFL can remarkably reduce the communication traffic by 95-98% or shorten the training time by 90-95% compared with BFL.
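    The abstract does not specify BCFL's compression operator; one common choice consistent with a tunable compression rate is top-k sparsification of the model update, sketched below as an assumption rather than the paper's scheme.

    ```python
    # Top-k sparsification: keep the largest-magnitude fraction `rate` of
    # entries of a model update and zero the rest (one possible compressor).
    import torch

    def top_k_compress(update, rate):
        flat = update.flatten()
        k = max(1, int(rate * flat.numel()))
        idx = flat.abs().topk(k).indices
        out = torch.zeros_like(flat)
        out[idx] = flat[idx]
        return out.view_as(update)

    update = torch.randn(1000, 100)
    sparse = top_k_compress(update, rate=0.05)         # ~95% fewer nonzeros to send
    print((sparse != 0).float().mean().item())         # ~0.05
    ```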
    Incorporating Customer Reviews in Size and Fit Recommendation systems for Fashion E-Commerce. (arXiv:2208.06261v1 [cs.IR])
    With the huge growth in the e-commerce domain, product recommendations have become an increasing field of interest amongst e-commerce companies. One of the more difficult tasks in product recommendations is size and fit prediction. There are a lot of size-related returns and refunds in the e-fashion domain, which inconveniences customers and costs the company. Thus, having a good size and fit recommendation system that can predict the correct sizes for customers will not only reduce size-related returns and refunds but also improve the customer experience. Early works in this field used traditional machine learning approaches to estimate customer and product sizes from purchase history. These methods suffered from the cold start problem due to huge sparsity in the customer-product data. More recently, people have used deep learning to address this problem by embedding customer and product features. But none of them incorporates the valuable customer feedback present on product pages along with the customer and product features. We propose a novel approach which can use information from customer reviews along with customer and product features for size and fit predictions. We demonstrate the effectiveness of our approach compared to using just product and customer features on 4 datasets. Our method shows an improvement of 1.37% - 4.31% in F1 (macro) score over the baseline across the 4 different datasets.
    fairDMS: Rapid Model Training by Data and Model Reuse. (arXiv:2204.09805v3 [cs.LG] UPDATED)
    Extracting actionable information rapidly from data produced by instruments such as the Linac Coherent Light Source (LCLS-II) and Advanced Photon Source Upgrade (APS-U) is becoming ever more challenging due to high (up to TB/s) data rates. Conventional physics-based information retrieval methods are hard-pressed to detect interesting events fast enough to enable timely focusing on a rare event or correction of an error. Machine learning (ML) methods that learn cheap surrogate classifiers present a promising alternative, but can fail catastrophically when changes in instrument or sample result in degradation in ML performance. To overcome such difficulties, we present a new data storage and ML model training architecture designed to organize large volumes of data and models so that when model degradation is detected, prior models and/or data can be queried rapidly and a more suitable model retrieved and fine-tuned for new conditions. We show that our approach can achieve up to 100x data labelling speedup compared to the current state-of-the-art, 200x improvement in training speed, and 92x speedup in terms of end-to-end model updating time.  ( 3 min )
    Region-Based Evidential Deep Learning to Quantify Uncertainty and Improve Robustness of Brain Tumor Segmentation. (arXiv:2208.06038v1 [eess.IV])
    Despite recent advances in the accuracy of brain tumor segmentation, the results still suffer from low reliability and robustness. Uncertainty estimation is an efficient solution to this problem, as it provides a measure of confidence in the segmentation results. Current uncertainty estimation methods based on quantile regression, Bayesian neural networks, ensembles, and Monte Carlo dropout are limited by their high computational cost and inconsistency. In order to overcome these challenges, Evidential Deep Learning (EDL) was developed in recent work, but primarily for natural image classification. In this paper, we propose a region-based EDL segmentation framework that can generate reliable uncertainty maps and robust segmentation results. We use the Theory of Evidence to interpret the output of a neural network as evidence values gathered from input features. Following Subjective Logic, evidence is parameterized as a Dirichlet distribution, and predicted probabilities are treated as subjective opinions. To evaluate the performance of our model on segmentation and uncertainty estimation, we conducted quantitative and qualitative experiments on the BraTS 2020 dataset. The results demonstrated the top performance of the proposed method in quantifying segmentation uncertainty and robustly segmenting tumors. Furthermore, our proposed new framework maintains the advantages of low computational cost and easy implementation and shows the potential for clinical application.
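    The Dirichlet parameterization of evidence described above follows standard evidential deep learning (e.g., Sensoy et al., 2018); a minimal sketch is shown below, with a per-pixel version for segmentation applying the same mapping channel-wise.

    ```python
    # Evidence -> Dirichlet opinion: expected probabilities and vacuity.
    import torch
    import torch.nn.functional as F

    def dirichlet_opinion(logits):
        evidence = F.softplus(logits)           # non-negative evidence per class
        alpha = evidence + 1.0                  # Dirichlet concentration parameters
        S = alpha.sum(dim=-1, keepdim=True)     # Dirichlet strength
        prob = alpha / S                        # expected class probabilities
        uncertainty = logits.shape[-1] / S      # vacuity: high when evidence is low
        return prob, uncertainty

    logits = torch.randn(4, 3)                  # e.g., 3 tissue classes
    prob, u = dirichlet_opinion(logits)
    print(prob.sum(dim=-1), u.squeeze())        # probs sum to 1; u in (0, 1]
    ```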
    Multi-Agent Reinforcement Learning with Graph Convolutional Neural Networks for optimal Bidding Strategies of Generation Units in Electricity Markets. (arXiv:2208.06242v1 [cs.AI])
    Finding optimal bidding strategies for generation units in electricity markets would result in higher profit. However, this is a challenging problem due to system uncertainty, which stems from the unknown strategies of the other generation units. Distributed optimization, where each entity or agent decides on its bid individually, has become state of the art. However, it cannot overcome the challenges of system uncertainties. Deep reinforcement learning is a promising approach for learning optimal strategies in uncertain environments. Nevertheless, it is not able to integrate information on the spatial system topology into the learning process. This paper proposes a distributed learning algorithm based on deep reinforcement learning (DRL) combined with a graph convolutional neural network (GCN). The proposed framework helps the agents to update their decisions by getting feedback from the environment, so that they can overcome the challenges of the uncertainties. In the proposed algorithm, the state and the connections between nodes are the inputs of the GCN, which can make agents aware of the structure of the system. This information on the system topology helps the agents to improve their bidding strategies and increase the profit. We evaluate the proposed algorithm on the IEEE 30-bus system under different scenarios. Also, to investigate the generalization ability of the proposed approach, we test the trained model on the IEEE 39-bus system. The results show that the proposed algorithm has better generalization ability compared to DRL alone and can result in higher profit when the topology of the system changes.
    Conditional Antibody Design as 3D Equivariant Graph Translation. (arXiv:2208.06073v1 [q-bio.BM])
    Antibody design is valuable for therapeutic usage and biological research. Existing deep-learning-based methods encounter several key issues: 1) incomplete context for Complementarity-Determining Regions (CDRs) generation; 2) incapable of capturing the entire 3D geometry of the input structure; 3) inefficient prediction of the CDR sequences in an autoregressive manner. In this paper, we propose Multi-channel Equivariant Attention Network (MEAN), an end-to-end model that is able to co-design 1D sequences and 3D structures of CDRs. To be specific, MEAN formulates antibody design as a conditional graph translation problem by importing extra components including the target antigen and the light chain of the antibody. Then, MEAN resorts to E(3)-equivariant message passing along with a proposed attention mechanism to better capture the geometrical correlation between different components. Finally, it outputs both the 1D sequences and 3D structure via a multi-round progressive full-shot scheme, which enjoys more efficiency against previous autoregressive approaches. Our method significantly surpasses state-of-the-art models in sequence and structure modeling, antigen-binding antibody design, and binding affinity optimization. Specifically, the relative improvement to baselines is about 22% in antigen-binding CDR design and 34% for affinity optimization.  ( 2 min )
    Hypergraph Modeling via Spectral Embedding Connection: Hypergraph Cut, Weighted Kernel $k$-means, and Heat Kernel. (arXiv:2203.09888v2 [cs.LG] UPDATED)
    We propose a theoretical framework of multi-way similarity to model real-valued data as hypergraphs for clustering via spectral embedding. For graph-cut based spectral clustering, it is common to model real-valued data as a graph by modeling pairwise similarities using a kernel function. This is because the kernel function has a theoretical connection to the graph cut. For problems where using multi-way similarities is more suitable than pairwise ones, it is natural to model the data as a hypergraph, which is a generalization of a graph. However, although the hypergraph cut is well-studied, no hypergraph-cut based framework has yet been established to model multi-way similarity. In this paper, we formulate multi-way similarities by exploiting the theoretical foundation of kernel functions. We show a theoretical connection between our formulation and hypergraph cut in two ways, generalizing both weighted kernel $k$-means and the heat kernel, by which we justify our formulation. We also provide a fast algorithm for spectral clustering. Our algorithm empirically shows better performance than existing graph and other heuristic modeling methods.
    AutoShard: Automated Embedding Table Sharding for Recommender Systems. (arXiv:2208.06399v1 [cs.LG])
    Embedding learning is an important technique in deep recommendation models to map categorical features to dense vectors. However, the embedding tables often demand an extremely large number of parameters, which become the storage and efficiency bottlenecks. Distributed training solutions have been adopted to partition the embedding tables into multiple devices. However, the embedding tables can easily lead to imbalances if not carefully partitioned. This is a significant design challenge of distributed systems known as embedding table sharding, i.e., how we should partition the embedding tables to balance the costs across devices, which is a non-trivial task because 1) it is hard to efficiently and precisely measure the cost, and 2) the partition problem is known to be NP-hard. In this work, we introduce our novel practice in Meta, namely AutoShard, which uses a neural cost model to directly predict the multi-table costs and leverages deep reinforcement learning to solve the partition problem. Experimental results on an open-sourced large-scale synthetic dataset and Meta's production dataset demonstrate the superiority of AutoShard over the heuristics. Moreover, the learned policy of AutoShard can transfer to sharding tasks with various numbers of tables and different ratios of unseen tables without any fine-tuning. Furthermore, AutoShard can efficiently shard hundreds of tables in seconds. The effectiveness, transferability, and efficiency of AutoShard make it desirable for production use. Our algorithms have been deployed in the Meta production environment. A prototype is available at https://github.com/daochenzha/autoshard
    Explainable Identification of Dementia from Transcripts using Transformer Networks. (arXiv:2109.06980v3 [cs.CL] UPDATED)
    Alzheimer's disease (AD) is the main cause of dementia; it is accompanied by memory loss and may lead to severe consequences in people's everyday lives if not diagnosed in time. Very few works have exploited transformer-based networks, and despite the high accuracy achieved, little work has been done in terms of model interpretability. In addition, although Mini-Mental State Exam (MMSE) scores are inextricably linked with the identification of dementia, prior research treats the task of dementia identification and the task of MMSE score prediction as two separate tasks. In order to address these limitations, we employ several transformer-based models, with BERT achieving the highest accuracy of 87.50%. Concurrently, we propose an interpretable method to detect AD patients based on siamese networks, reaching accuracy up to 83.75%. Next, we introduce two multi-task learning models, where the main task refers to the identification of dementia (binary classification), while the auxiliary one corresponds to the identification of the severity of dementia (multiclass classification). Our model obtains accuracy equal to 86.25% on the detection of AD patients in the multi-task learning setting. Finally, we present some new methods to identify the linguistic patterns used by AD patients and non-AD ones, including text statistics, vocabulary uniqueness, word usage, correlations via a detailed linguistic analysis, and explainability techniques (LIME). Findings indicate significant differences in language between AD and non-AD patients.
    Towards a Grounded Theory of Causation for Embodied AI. (arXiv:2206.13973v2 [cs.AI] UPDATED)
    There exist well-developed frameworks for causal modelling, but these require rather a lot of human domain expertise to define causal variables and perform interventions. In order to enable autonomous agents to learn abstract causal models through interactive experience, the existing theoretical foundations need to be extended and clarified. Existing frameworks give no guidance regarding variable choice / representation, and more importantly, give no indication as to which behaviour policies or physical transformations of state space shall count as interventions. The framework sketched in this paper describes actions as transformations of state space, for instance induced by an agent running a policy. This makes it possible to describe in a uniform way both transformations of the micro-state space and abstract models thereof, and say when the latter is veridical / grounded / natural. We then introduce (causal) variables, define a mechanism as an invariant predictor, and say when an action can be viewed as a ``surgical intervention'', thus bringing the objective of causal representation \& intervention skill learning into clearer focus.
    An investigation on selecting audio pre-trained models for audio captioning. (arXiv:2208.06127v1 [cs.SD])
    Audio captioning is a task that generates a description of audio based on its content. Pre-trained models are widely used in audio captioning due to the high complexity of the task. Unless a comprehensive system is re-trained, it is hard to determine how well a pre-trained model contributes to an audio captioning system. To avoid the time-consuming and energy-consuming process of retraining, it is necessary to propose a performance predictor for pre-trained models in audio captioning. In this paper, a series of pre-trained models are investigated for the correlation between extracted audio features and the performance of audio captioning. A couple of predictors are proposed based on the experimental results. The results demonstrate that the kurtosis and skewness of the extracted audio features may act as indicators of the performance of audio captioning systems built on pre-trained audio models, owing to the high correlation between these statistics and system performance.
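    The proposed indicator is inexpensive to compute; a sketch with scipy follows, where the embeddings are random stand-ins for features extracted by a candidate pre-trained audio model.

    ```python
    # Skewness and kurtosis of extracted audio features as cheap indicators.
    import numpy as np
    from scipy.stats import kurtosis, skew

    features = np.random.randn(1000, 512)       # stand-in model embeddings
    print("mean skewness:", skew(features, axis=0).mean())
    print("mean kurtosis:", kurtosis(features, axis=0).mean())
    ```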
    Anatomy-XNet: An Anatomy Aware Convolutional Neural Network for Thoracic Disease Classification in Chest X-rays. (arXiv:2106.05915v3 [eess.IV] UPDATED)
    Thoracic disease detection from chest radiographs using deep learning methods has been an active area of research in the last decade. Most previous methods attempt to focus on the diseased organs of the image by identifying spatial regions responsible for significant contributions to the model's prediction. In contrast, expert radiologists first locate the prominent anatomical structures before determining if those regions are anomalous. Therefore, integrating anatomical knowledge within deep learning models could bring substantial improvement in automatic disease classification. Motivated by this, we propose Anatomy-XNet, an anatomy-aware attention-based thoracic disease classification network that prioritizes the spatial features guided by the pre-identified anatomy regions. We adopt a semi-supervised learning method by utilizing available small-scale organ-level annotations to locate the anatomy regions in large-scale datasets where the organ-level annotations are absent. The proposed Anatomy-XNet uses the pre-trained DenseNet-121 as the backbone network with two corresponding structured modules, the Anatomy Aware Attention (A$^3$) and Probabilistic Weighted Average Pooling (PWAP), in a cohesive framework for anatomical attention learning. We experimentally show that our proposed method sets a new state-of-the-art benchmark by achieving AUC scores of 85.78%, 92.07%, and 84.04% on three publicly available large-scale CXR datasets--NIH, Stanford CheXpert, and MIMIC-CXR, respectively. This not only proves the efficacy of utilizing the anatomy segmentation knowledge to improve the thoracic disease classification but also demonstrates the generalizability of the proposed framework.
    A Scalable Probabilistic Model for Reward Optimizing Slate Recommendation. (arXiv:2208.06263v1 [cs.IR])
    We introduce Probabilistic Rank and Reward model (PRR), a scalable probabilistic model for personalized slate recommendation. Our model allows state-of-the-art estimation of user interests in the following ubiquitous recommender system scenario: A user is shown a slate of K recommendations and the user chooses at most one of these K items. It is the goal of the recommender system to find the K items of most interest to a user in order to maximize the probability that the user interacts with the slate. Our contribution is to show that we can learn more effectively the probability of the recommendations being successful by combining the reward - whether the slate was clicked or not - and the rank - the item on the slate that was selected. Our method learns more efficiently than bandit methods that use only the reward, and user preference methods that use only the rank. It also provides similar or better estimation performance to independent inverse-propensity-score methods and is far more scalable. Our method is state of the art in terms of both speed and accuracy on massive datasets with up to 1 million items. Finally, our method allows fast delivery of recommendations powered by maximum inner product search (MIPS), making it suitable in extremely low latency domains such as computational advertising.  ( 3 min )
    Root-aligned SMILES: A Tight Representation for Chemical Reaction Prediction. (arXiv:2203.11444v5 [cs.LG] UPDATED)
    Chemical reaction prediction, involving forward synthesis and retrosynthesis prediction, is a fundamental problem in organic synthesis. A popular computational paradigm formulates synthesis prediction as a sequence-to-sequence translation problem, where the typical SMILES is adopted for molecule representations. However, the general-purpose SMILES neglects the characteristics of chemical reactions, where the molecular graph topology is largely unaltered from reactants to products, resulting in the suboptimal performance of SMILES if straightforwardly applied. In this article, we propose the root-aligned SMILES (R-SMILES), which specifies a tightly aligned one-to-one mapping between the product and the reactant SMILES for more efficient synthesis prediction. Due to the strict one-to-one mapping and reduced edit distance, the computational model is largely relieved from learning the complex syntax and dedicated to learning the chemical knowledge for reactions. We compare the proposed R-SMILES with various state-of-the-art baselines and show that it significantly outperforms them all, demonstrating the superiority of the proposed method.
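    The core trick of writing a SMILES string starting from a chosen root atom can be reproduced with RDKit's rootedAtAtom option, as sketched below; the full R-SMILES procedure additionally uses atom mapping to pick a root shared by product and reactant, which this sketch omits.

    ```python
    # Same molecule, different root atoms -> different (root-aligned) SMILES.
    from rdkit import Chem

    mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin as an example
    for root in (0, 3):
        print(Chem.MolToSmiles(mol, rootedAtAtom=root, canonical=False))
    ```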
    Safety and Performance, Why not Both? Bi-Objective Optimized Model Compression toward AI Software Deployment. (arXiv:2208.05969v1 [cs.LG])
    The size of deep learning models in artificial intelligence (AI) software is increasing rapidly, which hinders large-scale deployment on resource-restricted devices (e.g., smartphones). To mitigate this issue, AI software compression, which aims to compress model size while keeping high performance, plays a crucial role. However, the intrinsic defects in a big model may be inherited by the compressed one. Such defects may be easily leveraged by attackers, since compressed models are usually deployed in a large number of devices without adequate protection. In this paper, we try to address the safe model compression problem from a safety-performance co-optimization perspective. Specifically, inspired by the test-driven development (TDD) paradigm in software engineering, we propose a test-driven sparse training framework called SafeCompress. By simulating the attack mechanism as a safety test, SafeCompress can automatically compress a big model to a small one following the dynamic sparse training paradigm. Further, considering a representative attack, i.e., membership inference attack (MIA), we develop a concrete safe model compression mechanism, called MIA-SafeCompress. Extensive experiments are conducted to evaluate MIA-SafeCompress on five datasets for both computer vision and natural language processing tasks. The results verify the effectiveness and generalization of our method. We also discuss how to adapt SafeCompress to attacks other than MIA, demonstrating the flexibility of SafeCompress.
    Transformers Can Do Bayesian Inference. (arXiv:2112.10510v5 [cs.LG] UPDATED)
    Currently, it is hard to reap the benefits of deep learning for Bayesian methods, which allow the explicit specification of prior knowledge and accurately capture model uncertainty. We present Prior-Data Fitted Networks (PFNs). PFNs leverage large-scale machine learning techniques to approximate a large set of posteriors. The only requirement for PFNs to work is the ability to sample from a prior distribution over supervised learning tasks (or functions). Our method restates the objective of posterior approximation as a supervised classification problem with a set-valued input: it repeatedly draws a task (or function) from the prior, draws a set of data points and their labels from it, masks one of the labels and learns to make probabilistic predictions for it based on the set-valued input of the rest of the data points. Presented with a set of samples from a new supervised learning task as input, PFNs make probabilistic predictions for arbitrary other data points in a single forward propagation, having learned to approximate Bayesian inference. We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems, with over 200-fold speedups in multiple setups compared to current methods. We obtain strong results in very diverse areas such as Gaussian process regression, Bayesian neural networks, classification for small tabular data sets, and few-shot image classification, demonstrating the generality of PFNs. Code and trained PFNs are released at https://github.com/automl/TransformersCanDoBayesianInference.
    Enhancing Oceanic Variables Forecast in the Santos Channel by Estimating Model Error with Random Forests. (arXiv:2208.05966v1 [physics.ao-ph])
    In this work we improve the forecasting of Sea Surface Height (SSH) and current velocity (speed and direction) in oceanic scenarios. We do so by resorting to Random Forests to predict the error of a numerical forecasting system developed for the Santos Channel in Brazil. We have used the Santos Operational Forecasting System (SOFS) and data collected in situ between the years 2019 and 2021. In previous studies we applied similar methods to current velocity at the channel entrance; in this work we expand the application to improve the SSH forecast and include four other stations in the channel. We obtained an average reduction of 11.9% in forecasting Root-Mean-Square Error (RMSE) and 38.7% in bias with our approach. We also obtained an increase in the Index of Agreement (IOA) in 10 of the 14 combinations of forecasted variables and stations.
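    The residual-correction idea reduces to a few lines: learn the numerical model's error with a Random Forest and add the predicted error back to the forecast. The data below are synthetic stand-ins for SOFS forecasts and in-situ observations.

    ```python
    # Random Forest correction of a numerical forecast's error.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    n = 2000
    forecast = np.random.randn(n)                       # stand-in SOFS SSH forecast
    covariates = np.random.randn(n, 5)                  # e.g., winds, tide, lagged obs
    observed = forecast + 0.3 * covariates[:, 0] + 0.1 * np.random.randn(n)

    X = np.column_stack([forecast, covariates])
    rf = RandomForestRegressor(n_estimators=300, random_state=0)
    rf.fit(X, observed - forecast)                      # learn the model error

    corrected = forecast + rf.predict(X)
    rmse = lambda a, b: np.sqrt(np.mean((a - b) ** 2))
    print(rmse(forecast, observed), rmse(corrected, observed))  # corrected < raw
    ```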
    MILAN: Masked Image Pretraining on Language Assisted Representation. (arXiv:2208.06049v1 [cs.CV])
    Self-attention based transformer models have been dominating many computer vision tasks in the past few years. Their superb model qualities heavily depend on excessively large labeled image datasets. In order to reduce the reliance on large labeled datasets, reconstruction based masked autoencoders are gaining popularity, which learn high quality transferable representations from unlabeled images. For the same purpose, recent weakly supervised image pretraining methods explore language supervision from text captions accompanying the images. In this work, we propose masked image pretraining on language assisted representation, dubbed as MILAN. Instead of predicting raw pixels or low level features, our pretraining objective is to reconstruct the image features with substantial semantic signals that are obtained using caption supervision. Moreover, to accommodate our reconstruction target, we propose a more efficient prompting decoder architecture and a semantic aware mask sampling mechanism, which further advance the transfer performance of the pretrained model. Experimental results demonstrate that MILAN delivers higher accuracy than previous works. When the masked autoencoder is pretrained and finetuned on the ImageNet-1K dataset with an input resolution of 224x224, MILAN achieves a top-1 accuracy of 85.4% on ViT-B/16, surpassing the previous state of the art by 1%. In the downstream semantic segmentation task, MILAN achieves 52.7 mIoU using the ViT-B/16 backbone on the ADE20K dataset, outperforming previous masked pretraining results by 4 points.
    Response Component Analysis for Sea State Estimation Using Artificial Neural Networks and Vessel Response Spectral Data. (arXiv:2205.02375v2 [cs.LG] UPDATED)
    The use of the `ship as a wave buoy analogy' (SAWB) provides a novel means to estimate sea states, where relationships are established between causal wave properties and vessel motion response information. This study focuses on a model-free machine learning approach to SAWB-based sea state estimation (SSE), using neural networks (NNs) to map vessel response spectral data to statistical wave properties for a small uninhabited surface vessel. Results showed a strong correlation between heave responses and significant wave height estimates, whilst the accuracy of mean wave period and wave heading predictions were observed to improve considerably when data from multiple vessel degrees of freedom (DOFs) was utilized. Overall, 3-DOF (heave, pitch and roll) NNs for SSE were shown to perform well when compared to existing SSE approaches that use similar simulation setups. One advantage of using small vessels for SAWB was shown as SSE accuracy was reasonable even when motion responses were low (in high-frequency, low wave height sea states). Given the information-dense statistical representation of vessel motion responses in spectral form, as well as the ability of NNs to effectively model complex relationships between variables, the designed SSE method shows promise for future adaptation to mobile SSE systems using the SAWB approach.
    A Probabilistic Framework for Mutation Testing in Deep Neural Networks. (arXiv:2208.06018v1 [cs.SE])
    Context: Mutation Testing (MT) is an important tool in traditional Software Engineering (SE) white-box testing. It aims to artificially inject faults in a system to evaluate a test suite's capability to detect them, assuming that the test suite's defect-finding capability will then translate to real faults. While MT has long been used in SE, it is only recently that it started gaining the attention of the Deep Learning (DL) community, with researchers adapting it to improve the testability of DL models and the trustworthiness of DL systems. Objective: While several techniques have been proposed for MT, most of them neglect the stochasticity inherent to DL resulting from the training phase. Even the latest MT approaches in DL, which propose to tackle MT through a statistical approach, might give inconsistent results. Indeed, as their statistic is based on a fixed set of sampled training instances, it can lead to different results across instance sets, when results should be consistent for any instance set. Methods: In this work, we propose a Probabilistic Mutation Testing (PMT) approach that alleviates the inconsistency problem and allows for a more consistent decision on whether a mutant is killed or not. Results: We show that PMT effectively allows a more consistent and informed decision on mutations through an evaluation using three models and eight mutation operators used in previously proposed MT methods. We also analyze the trade-off between the approximation error and the cost of our method, showing that a relatively small error can be achieved for a manageable cost. Conclusion: Our results show the limitations of current MT practices in DNNs and the need to rethink them. We believe PMT is the first step in that direction, as it effectively removes the lack of consistency across test executions of previous methods caused by the stochasticity of DNN training.
    Variational Quantum Approximate Support Vector Machine With Inference Transfer. (arXiv:2206.14507v2 [quant-ph] UPDATED)
    A kernel-based quantum classifier is the most interesting and powerful quantum machine learning technique for hyperlinear classification of complex data, and can be easily realized in shallow-depth quantum circuits such as a SWAP test classifier. A variational quantum approximate support vector machine (VQASVM) can be realized inherently and explicitly on these circuits by introducing a variational scheme that maps the quadratic optimization problem of support vector machine theory to a quantum-classical variational optimization problem. Probability weight modulation in the index qubits of a classifier can designate support vectors among training vectors, which can be achieved with a parameterized quantum circuit (PQC). The classical parameters of the PQC are then transferred to many copies of other decision inference circuits. Our VQASVM algorithm is tested with toy example data sets on cloud-based quantum machines for feasibility evaluation, and numerically investigated to evaluate its performance on the standard iris flower and MNIST data sets. The empirical run-time complexity of VQASVM is estimated to be sub-quadratic in the training data set size, while that of the classical solver is quadratic.
    Dropout is NOT All You Need to Prevent Gradient Leakage. (arXiv:2208.06163v1 [cs.LG])
    Gradient inversion attacks on federated learning systems reconstruct client training data from exchanged gradient information. To defend against such attacks, a variety of defense mechanisms have been proposed. However, they usually lead to an unacceptable trade-off between privacy and model utility. Recent observations suggest that dropout could mitigate gradient leakage and improve model utility if added to neural networks. Unfortunately, this phenomenon has not been systematically researched yet. In this work, we thoroughly analyze the effect of dropout on iterative gradient inversion attacks. We find that state-of-the-art attacks are not able to reconstruct the client data due to the stochasticity induced by dropout during model training. Nonetheless, we argue that dropout does not offer reliable protection if the dropout-induced stochasticity is adequately modeled during attack optimization. Consequently, we propose a novel Dropout Inversion Attack (DIA) that jointly optimizes for client data and dropout masks to approximate the stochastic client model. We conduct an extensive systematic evaluation of our attack on four seminal model architectures and three image classification datasets of increasing complexity. We find that our proposed attack bypasses the protection seemingly induced by dropout and reconstructs client data with high fidelity. Our work demonstrates that privacy-inducing changes to model architectures alone cannot be assumed to reliably protect from gradient leakage and therefore should be combined with complementary defense mechanisms.
    Power Flow Balancing with Decentralized Graph Neural Networks. (arXiv:2111.02169v2 [cs.LG] UPDATED)
    We propose an end-to-end framework based on a Graph Neural Network (GNN) to balance the power flows in energy grids. The balancing is framed as a supervised vertex regression task, where the GNN is trained to predict the current and power injections at each grid branch that yield a power flow balance. By representing the power grid as a line graph with branches as vertices, we can train a GNN that is accurate and robust to changes in topology. In addition, by using specialized GNN layers, we are able to build a very deep architecture that accounts for large neighborhoods on the graph, while implementing only localized operations. We perform three different experiments to evaluate: i) the benefits of using localized rather than global operations and the tendency of deep GNN models to oversmooth the quantities on the nodes; ii) the resilience to perturbations in the graph topology; and iii) the capability to train the model simultaneously on multiple grid topologies and the consequent improvement in generalization to new, unseen grids. The proposed framework is efficient and, compared to other solvers based on deep learning, is robust to perturbations not only to the physical quantities on the grid components, but also to the topology.
    A multi-scale sampling method for accurate and robust deep neural network to predict combustion chemical kinetics. (arXiv:2201.03549v2 [physics.chem-ph] UPDATED)
    Machine learning has long been considered a black box for predicting combustion chemical kinetics due to the extremely large number of parameters and the lack of evaluation standards and reproducibility. The current work aims to understand two basic questions regarding the deep neural network (DNN) method: what data the DNN needs and how general the DNN method can be. Sampling and preprocessing determine the DNN training dataset and further affect the DNN's prediction ability. The current work proposes using the Box-Cox transformation (BCT) to preprocess the combustion data. In addition, this work compares different sampling methods with and without preprocessing, including the Monte Carlo method, manifold sampling, a generative neural network method (cycle-GAN), and a newly proposed multi-scale sampling. Our results reveal that the DNN trained on the manifold data can capture the chemical kinetics in limited configurations but cannot remain robust toward perturbation, which is inevitable for a DNN coupled with the flow field. The Monte Carlo and cycle-GAN samplings can cover a wider phase space but fail to capture small-scale intermediate species, producing poor prediction results. A three-hidden-layer DNN based on the multi-scale method, without specific flame simulation data, can predict chemical kinetics in various scenarios and remains stable during temporal evolution. This single DNN is readily implemented with several CFD codes and validated in various combustors, including (1) zero-dimensional autoignition, (2) a one-dimensional freely propagating flame, (3) a two-dimensional jet flame with triple-flame structure, and (4) three-dimensional turbulent lifted flames. The results demonstrate the satisfying accuracy and generalization ability of the pre-trained DNN. The Fortran and Python versions of the DNN and example code are provided in the supplementary material for reproducibility.
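    As a concrete illustration of the Box-Cox preprocessing step mentioned above, the sketch below applies scipy's maximum-likelihood Box-Cox fit to synthetic, strictly positive data spanning many orders of magnitude; the data and the single-variable fit are assumptions, not the paper's exact pipeline.

    import numpy as np
    from scipy import stats

    # Synthetic, strictly positive samples spanning many scales, mimicking species mass fractions.
    rng = np.random.default_rng(0)
    x = 10.0 ** rng.uniform(-8, 0, size=10_000)

    # Box-Cox transform: y = (x**lam - 1)/lam for lam != 0, log(x) for lam == 0.
    # scipy fits lam by maximum likelihood when it is not supplied.
    y, lam = stats.boxcox(x)
    print(f"fitted lambda = {lam:.3f}, transformed range = [{y.min():.2f}, {y.max():.2f}]")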
    Data Banzhaf: A Data Valuation Framework with Maximal Robustness to Learning Stochasticity. (arXiv:2205.15466v4 [cs.LG] UPDATED)
    This paper studies the robustness of data valuation to noisy model performance scores. In particular, we find that the inherent randomness of the widely used stochastic gradient descent can cause existing data value notions (e.g., the Shapley value and the Leave-one-out error) to produce inconsistent data value rankings across different runs. To address this challenge, we first pose a formal framework within which one can measure the robustness of a data value notion. We show that the Banzhaf value, a value notion that originated in the cooperative game theory literature, achieves the maximal robustness among all semivalues -- a class of value notions that satisfy crucial properties entailed by ML applications. We propose an algorithm to efficiently estimate the Banzhaf value based on the Maximum Sample Reuse (MSR) principle. We derive a lower bound on the sample complexity of Banzhaf value estimation, and we show that our MSR algorithm's sample complexity is close to this lower bound. Our evaluation demonstrates that the Banzhaf value outperforms existing semivalue-based data value notions on several downstream ML tasks such as learning with weighted samples and noisy label detection. Overall, our study suggests that when the underlying ML algorithm is stochastic, the Banzhaf value is a promising alternative to other semivalue-based data value schemes given its computational advantage and ability to robustly differentiate data quality.  ( 3 min )
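    A minimal sketch of Banzhaf value estimation under the Maximum Sample Reuse principle, where every sampled subset (each point included independently with probability 1/2) updates the estimate for every data point; the additive stand-in utility is an assumption, whereas a real application would use, e.g., the validation accuracy of a model trained on the subset.

    import numpy as np

    def banzhaf_msr(utility, n_points, n_samples=2_000, seed=0):
        """Maximum Sample Reuse: estimate, per point i, E[U(S) | i in S] - E[U(S) | i not in S],
        reusing every sampled subset S for every point."""
        rng = np.random.default_rng(seed)
        sums = np.zeros((2, n_points))    # row 0: i excluded, row 1: i included
        counts = np.zeros((2, n_points))
        for _ in range(n_samples):
            mask = rng.random(n_points) < 0.5          # each point enters S w.p. 1/2
            u = utility(mask)
            sums[1, mask] += u;  counts[1, mask] += 1
            sums[0, ~mask] += u; counts[0, ~mask] += 1
        return sums[1] / np.maximum(counts[1], 1) - sums[0] / np.maximum(counts[0], 1)

    # Stand-in utility: a subset's value is the noisy sum of per-point qualities.
    quality = np.linspace(0.0, 1.0, 10)
    noise = np.random.default_rng(1)
    values = banzhaf_msr(lambda m: quality[m].sum() + noise.normal(0, 0.1), n_points=10)
    print(values.round(2))   # should roughly recover the ordering of `quality`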
    Multi-Model Probabilistic Programming. (arXiv:2208.06329v1 [cs.PL])
    Probabilistic programming makes it easy to represent a probabilistic model as a program. Building an individual model, however, is only one step of probabilistic modeling. The broader challenge of probabilistic modeling is in understanding and navigating spaces of alternative models. There is currently no good way to represent these spaces of alternative models, despite their central role. We present an extension of probabilistic programming that lets each program represent a network of interrelated probabilistic models. We give a formal semantics for these multi-model probabilistic programs, a collection of efficient algorithms for network-of-model operations, and an example implementation built on top of the popular probabilistic programming language Stan. This network-of-models representation opens many doors, including search and automation in model-space, tracking and communication of model development, and explicit modeler degrees of freedom to mitigate issues like p-hacking. We demonstrate automatic model search and model development tracking using our Stan implementation, and we propose many more possible applications.  ( 2 min )
    On the Convergence of Shallow Neural Network Training with Randomly Masked Neurons. (arXiv:2112.02668v2 [cs.LG] UPDATED)
    With the motive of training all the parameters of a neural network, we study why and when one can achieve this by iteratively creating, training, and combining randomly selected subnetworks. Such scenarios have either implicitly or explicitly emerged in the recent literature: see e.g., the Dropout family of regularization techniques, or some distributed ML training protocols that reduce communication/computation complexities, such as the Independent Subnet Training protocol. While these methods are studied empirically and utilized in practice, they often enjoy partial or no theoretical support, especially when applied on neural network-based objectives. In this manuscript, our focus is on overparameterized single hidden layer neural networks with ReLU activations in the lazy training regime. By carefully analyzing $i)$ the subnetworks' neural tangent kernel, $ii)$ the surrogate functions' gradient, and $iii)$ how we sample and combine the surrogate functions, we prove a linear convergence rate of the training error -- up to a neighborhood around the optimal point -- for an overparameterized single-hidden layer perceptron with a regression loss. Our analysis reveals a dependency of the size of the neighborhood around the optimal point on the number of surrogate models and the number of local training steps for each selected subnetwork. Moreover, the considered framework generalizes and provides new insights on dropout training, multi-sample dropout training, as well as Independent Subnet Training; for each case, we provide convergence results as corollaries of our main theorem.  ( 3 min )
    Trustworthy Recommender Systems. (arXiv:2208.06265v1 [cs.IR])
    Recommender systems (RSs) aim to help users to effectively retrieve items of their interest from a large catalogue. For a long time, researchers and practitioners have focused on developing accurate RSs. Recent years have witnessed an increasing number of threats to RSs, including attacks, system- and user-generated noise, and system bias. As a result, it has become clear that a strict focus on RS accuracy is limited, and the research must consider other important factors, e.g., trustworthiness. For end users, a trustworthy RS (TRS) should not only be accurate, but also transparent, unbiased and fair, as well as robust to noise or attacks. These observations have led to a paradigm shift of the research on RSs: from accuracy-oriented RSs to TRSs. However, researchers lack a systematic overview and discussion of the literature in this novel and fast-developing field of TRSs. To this end, in this paper, we provide an overview of TRSs, including a discussion of the motivation and basic concepts of TRSs, a presentation of the challenges in building TRSs, and a perspective on the future directions in this area. We also provide a novel conceptual framework to support the construction of TRSs.  ( 2 min )
    Joint Optimization of Ranking and Calibration with Contextualized Hybrid Model. (arXiv:2208.06164v1 [cs.IR])
    Despite the development of ranking optimization techniques, the pointwise model remains the dominant approach for click-through rate (CTR) prediction. This can be attributed to the calibration ability of the pointwise model, since its prediction can be viewed as the click probability. In practice, a CTR prediction model is also commonly assessed by its ranking ability, for which models based on ranking losses (e.g., pairwise or listwise losses) usually achieve better performance than those based on the pointwise loss. Previous studies have experimented with a direct combination of the two losses to obtain the benefits of both and observed improved performance. However, this direct combination breaks the interpretation of the output logit as the click-through rate, which may lead to sub-optimal solutions. To address this issue, we propose an approach that can Jointly optimize the Ranking and Calibration abilities (JRC for short). JRC improves the ranking ability by contrasting the logit values of samples with different labels and constrains the predicted probability to be a function of the logit subtraction. We further show that JRC consolidates the interpretation of logits, where the logits model the joint distribution. With such an interpretation, we prove that JRC approximately optimizes the contextualized hybrid discriminative-generative objective. Experiments on public and industrial datasets and online A/B testing show that our approach improves both ranking and calibration abilities. Since May 2022, JRC has been deployed on the display advertising platform of Alibaba and has obtained significant performance improvements.  ( 3 min )
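    As a rough illustration of jointly optimizing calibration and ranking, the sketch below combines a pointwise log loss with a pairwise logistic loss on logit differences; this is a generic hybrid objective on assumed toy logits and labels, not the paper's exact JRC formulation.

    import numpy as np

    def hybrid_loss(logits, labels, alpha=0.5):
        """Illustrative mix of a pointwise calibration loss and a pairwise
        ranking loss on logit differences (not the exact JRC objective)."""
        p = 1.0 / (1.0 + np.exp(-logits))
        pointwise = -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))
        # Pairwise term: penalize small logit gaps between positives and negatives.
        pos, neg = logits[labels == 1], logits[labels == 0]
        gaps = pos[:, None] - neg[None, :]
        pairwise = np.mean(np.log1p(np.exp(-gaps)))    # logistic ranking loss
        return alpha * pointwise + (1 - alpha) * pairwise

    logits = np.array([2.0, 0.3, -1.0, -2.5])
    labels = np.array([1, 1, 0, 0])
    print(f"hybrid loss = {hybrid_loss(logits, labels):.4f}")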
    Coarse to Fine Two-Stage Approach to Robust Tensor Completion of Visual Data. (arXiv:2106.10422v4 [cs.LG] UPDATED)
    Tensor completion is the problem of estimating the missing values of high-order data from partially observed entries. Data corruption due to prevailing outliers poses major challenges to traditional tensor completion algorithms, which catalyzed the development of robust algorithms that alleviate the effect of outliers. However, existing robust methods largely presume that the corruption is sparse, which may not hold in practice. In this paper, we develop a two-stage robust tensor completion approach to deal with tensor completion of visual data with a large amount of gross corruption. A novel coarse-to-fine framework is proposed which uses a global coarse completion result to guide a local patch refinement process. To efficiently mitigate the effect of a large number of outliers on tensor recovery, we develop a new M-estimator-based robust tensor ring recovery method which can adaptively identify the outliers and alleviate their negative effect in the optimization. The experimental results demonstrate the superior performance of the proposed approach over state-of-the-art robust algorithms for tensor completion.  ( 3 min )
    Predicting Electricity Infrastructure Induced Wildfire Risk in California. (arXiv:2206.02930v2 [eess.SY] UPDATED)
    This paper examines the use of risk models to predict the timing and location of wildfires caused by electricity infrastructure. Our data include historical ignition and wire-down points triggered by grid infrastructure, collected between 2015 and 2019 in Pacific Gas & Electric territory, along with various weather, vegetation, and very-high-resolution grid infrastructure data including location, age, and materials. With these data we explore a range of machine learning methods and strategies to manage training data imbalance. The best area under the receiver operating characteristic curve we obtain is 0.776 for distribution feeder ignitions and 0.824 for transmission line wire-down events, both using the histogram-based gradient boosting tree algorithm (HGB) with under-sampling. We then use these models to identify which information provides the most predictive value. After line length, we find that weather and vegetation features dominate the list of top important features for ignition or wire-down risk. Distribution ignition models show more dependence on slowly varying vegetation variables such as burn index, energy release content, and tree height, whereas transmission wire-down models rely more on primary weather variables such as wind speed and precipitation. These results point to the importance of improved vegetation modeling for feeder ignition risk models, and improved weather forecasting for transmission wire-down models. We observe that infrastructure features make small but meaningful improvements to risk model predictive power.  ( 3 min )
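    A minimal sketch of the winning recipe above, histogram-based gradient boosting with random under-sampling of the majority class, on synthetic imbalanced data; the features and imbalance level are stand-ins for the paper's weather, vegetation, and infrastructure variables.

    import numpy as np
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Synthetic, heavily imbalanced "ignition" data (features are stand-ins).
    rng = np.random.default_rng(1)
    X = rng.normal(size=(20_000, 8))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=20_000) > 3.0).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    # Random under-sampling: keep all positives, subsample negatives to match.
    pos = np.flatnonzero(y_tr == 1)
    neg = rng.choice(np.flatnonzero(y_tr == 0), size=len(pos), replace=False)
    idx = np.concatenate([pos, neg])

    clf = HistGradientBoostingClassifier(random_state=0).fit(X_tr[idx], y_tr[idx])
    print(f"AUC = {roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]):.3f}")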
    Emergence of sensory attenuation based upon the free-energy principle. (arXiv:2111.02666v3 [q-bio.NC] UPDATED)
    The brain attenuates its responses to self-produced exteroceptions (e.g., we cannot tickle ourselves). Is this phenomenon, known as sensory attenuation, enabled innately, or acquired through learning? Here, our simulation study using a multimodal hierarchical recurrent neural network model, based on variational free-energy minimization, shows that a mechanism for sensory attenuation can develop through learning of two distinct types of sensorimotor experience, involving self-produced or externally produced exteroceptions. For each sensorimotor context, a particular free-energy state emerged through interaction between top-down prediction with precision and bottom-up sensory prediction error from each sensory area. The executive area in the network served as an information hub. Consequently, shifts between the two sensorimotor contexts triggered transitions from one free-energy state to another in the network via executive control, which caused shifts between attenuating and amplifying prediction-error-induced responses in the sensory areas. This study situates emergence of sensory attenuation (or self-other distinction) in development of distinct free-energy states in the dynamic hierarchical neural system.  ( 2 min )
    Unifying local and global model explanations by functional decomposition of low dimensional structures. (arXiv:2208.06151v1 [cs.LG])
    We consider a global explanation of a regression or classification function by decomposing it into the sum of main components and interaction components of arbitrary order. When adding an identification constraint that is motivated by a causal interpretation, we find q-interaction SHAP to be the unique solution to that constraint. Here, q denotes the highest order of interaction present in the decomposition. Our result provides a new perspective on SHAP values with various practical and theoretical implications: if SHAP values are decomposed into main and all interaction effects, they provide a global explanation with a causal interpretation. In principle, the decomposition can be applied to any machine learning model. However, since the number of possible interactions grows exponentially with the number of features, exact calculation is only feasible for methods that fit low-dimensional structures or ensembles of those. We provide an algorithm and efficient implementation for gradient boosted trees (xgboost) and random planted forests that calculates this decomposition. Our experiments suggest that the method provides meaningful explanations and reveals interactions of higher orders. We also investigate further potential of our new insights by utilizing the global explanation to motivate a new measure of feature importance, and to reduce direct and indirect bias by post-hoc component removal.  ( 3 min )
    3D Graph Contrastive Learning for Molecular Property Prediction. (arXiv:2208.06360v1 [q-bio.BM])
    Self-supervised learning (SSL) is a method that learns the data representation by utilizing supervision inherent in the data. This learning method is in the spotlight in the drug field, which lacks annotated data due to time-consuming and expensive experiments. SSL using enormous unlabeled data has shown excellent performance for molecular property prediction, but a few issues exist. (1) Existing SSL models are large-scale; implementing SSL is difficult where computing resources are insufficient. (2) In most cases, they do not utilize 3D structural information for molecular representation learning. The activity of a drug is closely related to the structure of the drug molecule. Nevertheless, most current models do not use 3D information or use it only partially. (3) Previous models that apply contrastive learning to molecules use the augmentation of permuting atoms and bonds, so molecules having different characteristics can end up in the same positive samples. We propose a novel contrastive learning framework, small-scale 3D Graph Contrastive Learning (3DGCL), for molecular property prediction that solves the above problems. 3DGCL learns the molecular representation by reflecting the molecule's structure through a pre-training process that does not change the semantics of the drug. Using only 1,128 samples of pre-training data and 1 million model parameters, we achieve state-of-the-art or comparable performance on four regression benchmark datasets. Extensive experiments demonstrate that 3D structural information based on chemical knowledge is essential for molecular representation learning for property prediction.  ( 3 min )
    Feature-Based Time-Series Analysis in R using the theft Package. (arXiv:2208.06146v1 [stat.ML])
    Time series are measured and analyzed across the sciences. One method of quantifying the structure of time series is by calculating a set of summary statistics or `features', and then representing a time series in terms of its properties as a feature vector. The resulting feature space is interpretable and informative, and enables conventional statistical learning approaches, including clustering, regression, and classification, to be applied to time-series datasets. Many open-source software packages for computing sets of time-series features exist across multiple programming languages, including catch22 (22 features: Matlab, R, Python, Julia), feasts (42 features: R), tsfeatures (63 features: R), Kats (40 features: Python), tsfresh (779 features: Python), and TSFEL (390 features: Python). However, there are several issues: (i) a singular access point to these packages is not currently available; (ii) to access all feature sets, users must be fluent in multiple languages; and (iii) these feature-extraction packages lack extensive accompanying methodological pipelines for performing feature-based time-series analysis, such as applications to time-series classification. Here we introduce a solution to these issues in an R software package called theft: Tools for Handling Extraction of Features from Time series. theft is a unified and extendable framework for computing features from the six open-source time-series feature sets listed above. It also includes a suite of functions for processing and interpreting the performance of extracted features, including extensive data-visualization templates, low-dimensional projections, and time-series classification operations. With an increasing volume and complexity of time-series datasets in the sciences and industry, theft provides a standardized framework for comprehensively quantifying and interpreting informative structure in time series.  ( 3 min )
    Function Classes for Identifiable Nonlinear Independent Component Analysis. (arXiv:2208.06406v1 [stat.ML])
    Unsupervised learning of latent variable models (LVMs) is widely used to represent data in machine learning. When such models reflect the ground truth factors and the mechanisms mapping them to observations, there is reason to expect that they allow generalization in downstream tasks. It is, however, well known that such identifiability guarantees are typically not achievable without putting constraints on the model class. This is notably the case for nonlinear Independent Component Analysis, in which the LVM maps statistically independent variables to observations via a deterministic nonlinear function. Several families of spurious solutions that fit the data perfectly but do not correspond to the ground truth factors can be constructed in generic settings. However, recent work suggests that constraining the function class of such models may promote identifiability. Specifically, function classes with constraints on their partial derivatives, gathered in the Jacobian matrix, have been proposed, such as orthogonal coordinate transformations (OCT), which impose orthogonality of the Jacobian columns. In the present work, we prove that a subclass of these transformations, conformal maps, is identifiable and provide novel theoretical results suggesting that OCTs have properties that prevent families of spurious solutions from spoiling identifiability in a generic setting.  ( 2 min )
    Multiplex Heterogeneous Graph Convolutional Network. (arXiv:2208.06129v1 [cs.SI])
    Heterogeneous graph convolutional networks have gained great popularity in tackling various network analytical tasks on heterogeneous network data, ranging from link prediction to node classification. However, most existing works ignore the relation heterogeneity of multiplex networks between multi-typed nodes and the differing importance of relations in meta-paths for node embedding, and thus can hardly capture the heterogeneous structure signals across different relations. To tackle this challenge, this work proposes a Multiplex Heterogeneous Graph Convolutional Network (MHGCN) for heterogeneous network embedding. Our MHGCN can automatically learn the useful heterogeneous meta-path interactions of different lengths in multiplex heterogeneous networks through multi-layer convolution aggregation. Additionally, we effectively integrate both multi-relation structural signals and attribute semantics into the learned node embeddings with both unsupervised and semi-supervised learning paradigms. Extensive experiments on five real-world datasets with various network analytical tasks demonstrate the significant superiority of MHGCN against state-of-the-art embedding baselines in terms of all evaluation metrics.  ( 2 min )
    Mitigating barren plateaus of variational quantum eigensolvers. (arXiv:2205.13539v2 [quant-ph] UPDATED)
    Variational quantum algorithms (VQAs) are expected to establish valuable applications on near-term quantum computers. However, recent works have pointed out that the performance of VQAs greatly relies on the expressibility of the ansatzes and is seriously limited by optimization issues such as barren plateaus (i.e., vanishing gradients). This work proposes the state efficient ansatz (SEA) for accurate ground state preparation with improved trainability. We show that the SEA can generate an arbitrary pure state with much fewer parameters than a universal ansatz, making it efficient for tasks like ground state estimation. Then, we prove that barren plateaus can be efficiently mitigated by the SEA and that the trainability can be further improved, at most quadratically, by flexibly adjusting the entangling capability of the SEA. Finally, we investigate a plethora of examples in ground state estimation where we obtain significant improvements in the magnitude of the cost gradient and the convergence speed.  ( 2 min )
    ALS: Augmented Lagrangian Sketching Methods for Linear Systems. (arXiv:2208.06152v1 [math.OC])
    We develop two fundamental stochastic sketching techniques, Penalty Sketching (PS) and Augmented Lagrangian Sketching (ALS), for solving consistent linear systems. The proposed PS and ALS techniques extend and generalize the scope of the Sketch & Project (SP) method by introducing Lagrangian penalty sketches. In doing so, we recover SP methods as special cases and furthermore develop a family of new stochastic iterative methods. By varying sketch parameters in the proposed PS method, we recover novel stochastic methods such as Penalty Newton Descent, Penalty Kaczmarz, Penalty Stochastic Descent, Penalty Coordinate Descent, Penalty Gaussian Pursuit, and Penalty Block Kaczmarz. Furthermore, the proposed ALS method synthesizes a wide variety of new stochastic methods, such as Augmented Newton Descent, Augmented Kaczmarz, Augmented Stochastic Descent, Augmented Coordinate Descent, Augmented Gaussian Pursuit, and Augmented Block Kaczmarz, into one framework. Moreover, we show that the developed PS and ALS frameworks can be used to reformulate the original linear system into equivalent stochastic optimization problems, namely the Penalty Stochastic Reformulation and the Augmented Stochastic Reformulation. We prove global convergence rates for the PS and ALS methods, as well as sub-linear $\mathcal{O}(\frac{1}{k})$ rates for the Cesàro average of iterates. The proposed convergence results hold for a wide family of distributions of random matrices, which provides the opportunity of fine-tuning the randomness of the method to suit specific applications. Finally, we perform computational experiments that demonstrate the efficiency of our methods compared to the existing SP methods.  ( 3 min )
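    For orientation, randomized Kaczmarz is the best-known member of the Sketch & Project family that PS and ALS generalize; below is a minimal sketch on a synthetic consistent system (the penalty and augmented Lagrangian variants themselves are not reproduced here).

    import numpy as np

    def randomized_kaczmarz(A, b, iters=5_000, seed=0):
        """Sketch & Project with single-row sketches: project the current
        iterate onto the solution set of one randomly chosen equation."""
        rng = np.random.default_rng(seed)
        m, n = A.shape
        probs = np.sum(A**2, axis=1) / np.sum(A**2)   # sample rows by squared norm
        x = np.zeros(n)
        for _ in range(iters):
            i = rng.choice(m, p=probs)
            x += (b[i] - A[i] @ x) / (A[i] @ A[i]) * A[i]
        return x

    rng = np.random.default_rng(2)
    A = rng.normal(size=(200, 50))
    x_true = rng.normal(size=50)
    x_hat = randomized_kaczmarz(A, A @ x_true)   # consistent system by construction
    print(f"relative error = {np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true):.2e}")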
    UniNet: A Unified Scene Understanding Network and Exploring Multi-Task Relationships through the Lens of Adversarial Attacks. (arXiv:2108.04584v2 [cs.CV] UPDATED)
    Scene understanding is crucial for autonomous systems which intend to operate in the real world. Single task vision networks extract information only based on some aspects of the scene. In multi-task learning (MTL), on the other hand, these single tasks are jointly learned, thereby providing an opportunity for tasks to share information and obtain a more comprehensive understanding. To this end, we develop UniNet, a unified scene understanding network that accurately and efficiently infers vital vision tasks including object detection, semantic segmentation, instance segmentation, monocular depth estimation, and monocular instance depth prediction. As these tasks look at different semantic and geometric information, they can either complement or conflict with each other. Therefore, understanding inter-task relationships can provide useful cues to enable complementary information sharing. We evaluate the task relationships in UniNet through the lens of adversarial attacks based on the notion that they can exploit learned biases and task interactions in the neural network. Extensive experiments on the Cityscapes dataset, using untargeted and targeted attacks reveal that semantic tasks strongly interact amongst themselves, and the same holds for geometric tasks. Additionally, we show that the relationship between semantic and geometric tasks is asymmetric and their interaction becomes weaker as we move towards higher-level representations.  ( 3 min )
    Understanding the stochastic dynamics of sequential decision-making processes: A path-integral analysis of Multi-armed Bandits. (arXiv:2208.06245v1 [cs.LG])
    The multi-armed bandit (MAB) model is one of the most classical models for studying decision-making in an uncertain environment. In this model, a player chooses one of K possible arms of a bandit machine to play at each time step, and the corresponding arm returns a random reward to the player, potentially from a specific unknown distribution. The target of the player is to collect as much reward as possible during the process. Despite its simplicity, the MAB model offers an excellent playground for studying the trade-off between exploration and exploitation and for designing effective algorithms for sequential decision-making under uncertainty. Although many asymptotically optimal algorithms have been established, the finite-time behaviour of the stochastic dynamics of the MAB model appears much more difficult to analyze, due to the intertwining between the decision-making and the rewards being collected. In this paper, we employ techniques from statistical physics to analyze the MAB model, which facilitates characterizing the distribution of cumulative regret at a finite (short) time, the central quantity of interest in an MAB algorithm, as well as the intricate dynamical behaviour of the model.  ( 2 min )
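    The central quantity above, the distribution of cumulative regret at a finite time, is easy to probe empirically; the sketch below runs UCB1 on a two-armed Bernoulli bandit, with the horizon and arm means chosen as illustrative assumptions.

    import numpy as np

    def ucb1_regret(means, horizon, rng):
        """One run of UCB1; returns the cumulative pseudo-regret at the horizon."""
        means = np.asarray(means, dtype=float)
        k = len(means)
        counts, sums = np.zeros(k), np.zeros(k)
        for t in range(1, horizon + 1):
            if t <= k:
                arm = t - 1                       # play each arm once first
            else:
                ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
                arm = int(np.argmax(ucb))
            sums[arm] += rng.random() < means[arm]    # Bernoulli reward
            counts[arm] += 1
        return means.max() * horizon - counts @ means

    rng = np.random.default_rng(3)
    regrets = [ucb1_regret([0.6, 0.5], horizon=1_000, rng=rng) for _ in range(500)]
    print(f"mean regret = {np.mean(regrets):.1f}, std = {np.std(regrets):.1f}")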
    Image Translation Based Nuclei Segmentation for Immunohistochemistry Images. (arXiv:2208.06202v1 [cs.CV])
    Numerous deep learning based methods have been developed for nuclei segmentation in H&E images and have achieved close to human performance. However, direct application of such methods to another imaging modality, such as Immunohistochemistry (IHC) images, may not achieve satisfactory performance. Thus, we developed a Generative Adversarial Network (GAN) based approach to translate an IHC image to an H&E image while preserving nuclei location and morphology, and then apply pre-trained nuclei segmentation models to the virtual H&E image. We demonstrate that the proposed method works better than several baselines, including direct application of state-of-the-art nuclei segmentation methods such as Cellpose and HoVer-Net trained on H&E images, and a generative method, DeepLIIF, using two public IHC image datasets.  ( 2 min )
    Patch Tracking-based Streaming Tensor Ring Completion for Visual Data Recovery. (arXiv:2105.14620v3 [cs.CV] UPDATED)
    Tensor completion aims to recover the missing entries of a partially observed tensor by exploiting its low-rank structure, and has been applied to visual data recovery. In applications where the data arrives sequentially such as streaming video completion, the missing entries of the tensor need to be dynamically recovered in a streaming fashion. Traditional streaming tensor completion algorithms treat the entire visual data as a tensor, which may not work satisfactorily when there is a big change in the tensor subspace along the temporal dimension, such as due to strong motion across the video frames. In this paper, we develop a novel patch tracking-based streaming tensor ring completion framework for visual data recovery. Given a newly incoming frame, small patches are tracked from the previous frame. Meanwhile, for each tracked patch, a patch tensor is constructed by stacking similar patches from the new frame. Patch tensors are then completed using a streaming tensor ring completion algorithm, and the incoming frame is recovered using the completed patch tensors. We propose a new patch tracking strategy that can accurately and efficiently track the patches with missing data. Further, a new streaming tensor ring completion algorithm is proposed which can efficiently and accurately update the latent core tensors and complete the missing entries of the patch tensors. Extensive experimental results demonstrate the superior performance of the proposed algorithms compared with both batch and streaming state-of-the-art tensor completion methods.  ( 3 min )
    Low Emission Building Control with Zero-Shot Reinforcement Learning. (arXiv:2208.06385v1 [cs.LG])
    Heating and cooling systems in buildings account for 31% of global energy use, much of which is regulated by Rule Based Controllers (RBCs) that neither maximise energy efficiency nor minimise emissions by interacting optimally with the grid. Control via Reinforcement Learning (RL) has been shown to significantly improve building energy efficiency, but existing solutions require access to building-specific simulators or data that cannot be expected for every building in the world. In response, we show it is possible to obtain emission-reducing policies without such knowledge a priori -- a paradigm we call zero-shot building control. We combine ideas from system identification and model-based RL to create PEARL (Probabilistic Emission-Abating Reinforcement Learning) and show that a short period of active exploration is all that is required to build a performant model. In experiments across three varied building energy simulations, we show PEARL outperforms an existing RBC in one case, and popular RL baselines in all cases, reducing building emissions by as much as 31% whilst maintaining thermal comfort. Our source code is available online via https://enjeeneer.io/projects/pearl .  ( 2 min )
    Unifying Gradients to Improve Real-world Robustness for Deep Networks. (arXiv:2208.06228v1 [stat.ML])
    The wide application of deep neural networks (DNNs) demands increasing attention to their real-world robustness, i.e., whether a DNN resists black-box adversarial attacks, among which score-based query attacks (SQAs) are the most threatening due to their practicality and effectiveness: attackers need only dozens of queries on model outputs to seriously hurt a victim network. Defending against SQAs requires a slight but artful variation of outputs, because legitimate users share the same output information as attackers. In this paper, we propose a real-world defense, called Unifying Gradients (UniG), that unifies the gradients of different data so that attackers can only probe a much weaker attack direction that is similar across different samples. Since such universal attack perturbations have been validated as less aggressive than input-specific perturbations, UniG protects real-world DNNs by presenting attackers with a twisted and less informative attack direction. To enhance UniG's practical significance in real-world applications, we implement it as a Hadamard product module that is computationally efficient and readily plugged into any model. According to extensive experiments on 5 SQAs and 4 defense baselines, UniG significantly improves real-world robustness without hurting clean accuracy on CIFAR-10 and ImageNet. For instance, UniG maintains a CIFAR-10 model at 77.80% accuracy under a 2500-query Square attack, while the state-of-the-art adversarially trained model achieves only 67.34% on CIFAR-10. Simultaneously, UniG greatly surpasses all compared baselines in clean accuracy and in the degree to which outputs are modified. The code will be released.  ( 3 min )
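    Since the abstract specifies only that UniG is implemented as a computationally efficient Hadamard product module, the sketch below shows the generic shape of such a plug-in element-wise scaling layer; the initialization is arbitrary, and this is not the trained UniG defense.

    import numpy as np

    class HadamardModule:
        """Illustrative element-wise (Hadamard) scaling layer: a per-dimension
        vector multiplies the features it is plugged onto, at O(dim) cost."""
        def __init__(self, dim, seed=0):
            # Near-identity initialization (assumed); UniG would set/learn this
            # so that gradients probed by attackers become similar across inputs.
            self.scale = 1.0 + 0.01 * np.random.default_rng(seed).normal(size=dim)

        def __call__(self, features):
            return features * self.scale       # broadcasts over the batch axis

    layer = HadamardModule(dim=10)
    feats = np.ones((4, 10))                   # a toy batch of features
    print(layer(feats)[0].round(3))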
    Scholastic: Graphical Human-AI Collaboration for Inductive and Interpretive Text Analysis. (arXiv:2208.06133v1 [cs.HC])
    Interpretive scholars generate knowledge from text corpora by manually sampling documents, applying codes, and refining and collating codes into categories until meaningful themes emerge. Given a large corpus, machine learning could help scale this data sampling and analysis, but prior research shows that experts are generally concerned about algorithms potentially disrupting or driving interpretive scholarship. We take a human-centered design approach to addressing concerns around machine-assisted interpretive research to build Scholastic, which incorporates a machine-in-the-loop clustering algorithm to scaffold interpretive text analysis. As a scholar applies codes to documents and refines them, the resulting coding schema serves as structured metadata which constrains hierarchical document and word clusters inferred from the corpus. Interactive visualizations of these clusters can help scholars strategically sample documents further toward insights. Scholastic demonstrates how human-centered algorithm design and visualizations employing familiar metaphors can support inductive and interpretive research methodologies through interactive topic modeling and document clustering.  ( 2 min )
    A Modular Framework for Reinforcement Learning Optimal Execution. (arXiv:2208.06244v1 [cs.CE])
    In this article, we develop a modular framework for the application of Reinforcement Learning to the problem of Optimal Trade Execution. The framework is designed with flexibility in mind, in order to ease the implementation of different simulation setups. Rather than focusing on agents and optimization methods, we focus on the environment and break down the requirements necessary to simulate an Optimal Trade Execution under a Reinforcement Learning framework, such as data pre-processing, construction of observations, action processing, child order execution, simulation of benchmarks, and reward calculation. We give examples of each component, explore the difficulties that their individual implementations and the interactions between them entail, and discuss the different phenomena that each component induces in the simulation, highlighting the divergences between the simulation and the behavior of a real market. We showcase our modular implementation through a setup that, following a Time-Weighted Average Price (TWAP) order submission schedule, allows the agent to exclusively place limit orders, simulates their execution via iterating over snapshots of the Limit Order Book (LOB), and calculates rewards as the dollar improvement over the price achieved by a TWAP benchmark algorithm following the same schedule. We also develop evaluation procedures that incorporate iterative re-training and evaluation of a given agent over intervals of a training horizon, mimicking how an agent may behave when being continuously retrained as new market data becomes available, and emulating the monitoring practices that algorithm providers are bound to perform under current regulatory frameworks.  ( 3 min )
    Private Domain Adaptation from a Public Source. (arXiv:2208.06135v1 [cs.LG])
    A key problem in a variety of applications is that of domain adaptation from a public source domain, for which a relatively large amount of labeled data with no privacy constraints is at one's disposal, to a private target domain, for which a private sample is available with very few or no labeled data. In regression problems with no privacy constraints on the source or target data, a discrepancy minimization algorithm based on several theoretical guarantees was shown to outperform a number of other adaptation algorithm baselines. Building on that approach, we design differentially private discrepancy-based algorithms for adaptation from a source domain with public labeled data to a target domain with unlabeled private data. The design and analysis of our private algorithms critically hinge upon several key properties we prove for a smooth approximation of the weighted discrepancy, such as its smoothness with respect to the $\ell_1$-norm and the sensitivity of its gradient. Our solutions are based on private variants of Frank-Wolfe and Mirror-Descent algorithms. We show that our adaptation algorithms benefit from strong generalization and privacy guarantees and report the results of experiments demonstrating their effectiveness.  ( 2 min )
    Measuring incompatibility and clustering quantum observables with a quantum switch. (arXiv:2208.06210v1 [quant-ph])
    The existence of incompatible observables is a cornerstone of quantum mechanics and a valuable resource in quantum technologies. Here we introduce a measure of incompatibility, called the mutual eigenspace disturbance (MED), which quantifies the amount of disturbance induced by the measurement of a sharp observable on the eigenspaces of another. The MED is a faithful measure of incompatibility for sharp observables and provides a metric on the space of von Neumann measurements. It can be efficiently estimated by letting the measurements act in an indefinite order, using a setup known as the quantum switch. Thanks to these features, the MED can be used in quantum machine learning tasks, such as clustering quantum measurement devices based on their mutual compatibility. We demonstrate this application by providing an unsupervised algorithm that clusters unknown von Neumann measurements. Our algorithm is robust to noise and can be used to identify groups of observers that share approximately the same measurement context.  ( 2 min )
    Accurate Action Recommendation for Smart Home via Two-Level Encoders and Commonsense Knowledge. (arXiv:2208.06089v1 [cs.AI])
    How can we accurately recommend actions for users to control their devices at home? Action recommendation for smart home has attracted increasing attention due to its potential impact on the markets of virtual assistants and the Internet of Things (IoT). However, designing an effective action recommender system for smart home is challenging because it requires handling context correlations, considering both queried contexts and previous histories of users, and dealing with capricious intentions in history. In this work, we propose SmartSense, an accurate action recommendation method for smart home. For each individual action, SmartSense summarizes its device control and its temporal contexts in a self-attentive manner, to reflect the importance of the correlation between them. SmartSense then summarizes sequences of users considering queried contexts in a query-attentive manner to extract the query-related patterns from the sequential actions. SmartSense also transfers commonsense knowledge from routine data to better handle intentions in action sequences. As a result, SmartSense addresses all three main challenges of action recommendation for smart home, and achieves state-of-the-art performance, giving up to 9.8% higher mAP@1 than the best competitor.  ( 2 min )
    Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training. (arXiv:2208.06102v1 [cs.LG])
    Training deep neural networks (DNNs) is becoming more and more resource- and energy-intensive every year. Unfortunately, existing works primarily focus on optimizing DNN training for faster completion, often without considering the impact on energy efficiency. In this paper, we observe that common practices to improve training performance can often lead to inefficient energy usage. More importantly, we demonstrate that there is a tradeoff between energy consumption and performance optimization. To this end, we propose an optimization framework, Zeus, to navigate this tradeoff by automatically finding optimal job- and GPU-level configurations for recurring DNN training jobs. Zeus uses an online exploration-exploitation approach in conjunction with just-in-time energy profiling, averting the need for expensive offline measurements, while adapting to data drifts over time. Our evaluation shows that Zeus can improve the energy efficiency of DNN training by 15.3%--75.8% for diverse workloads.  ( 2 min )
    DDX7: Differentiable FM Synthesis of Musical Instrument Sounds. (arXiv:2208.06169v1 [cs.SD])
    FM Synthesis is a well-known algorithm used to generate complex timbre from a compact set of design primitives. Since FM synthesizers typically feature a MIDI interface, it is usually impractical to control them from an audio source. On the other hand, Differentiable Digital Signal Processing (DDSP) has enabled nuanced audio rendering by Deep Neural Networks (DNNs) that learn to control differentiable synthesis layers from arbitrary sound inputs. The training process involves a corpus of audio for supervision and spectral reconstruction loss functions. Such functions, while great at matching spectral amplitudes, lack pitch direction, which can hinder the joint optimization of the parameters of FM synthesizers. In this paper, we take steps towards enabling continuous control of a well-established FM synthesis architecture from an audio input. Firstly, we discuss a set of design constraints that ease spectral optimization of a differentiable FM synthesizer via a standard reconstruction loss. Next, we present Differentiable DX7 (DDX7), a lightweight architecture for neural FM resynthesis of musical instrument sounds in terms of a compact set of parameters. We train the model on instrument samples extracted from the URMP dataset, and quantitatively demonstrate its comparable audio quality against selected benchmarks.  ( 3 min )
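    For readers unfamiliar with FM synthesis, the sketch below renders one second of classic two-operator FM, in which a modulator phase-modulates a carrier; the carrier frequency, frequency ratio, and decaying modulation-index envelope are assumptions, and this is plain numpy rather than DDX7's differentiable implementation.

    import numpy as np

    sr, dur = 16_000, 1.0
    t = np.arange(int(sr * dur)) / sr

    f_c, ratio = 440.0, 2.0                      # carrier frequency and modulator ratio (assumed)
    f_m = ratio * f_c
    index = np.linspace(5.0, 0.5, t.size)        # decaying modulation index envelope (assumed)

    # Two-operator FM: the modulator phase-modulates the carrier.
    modulator = np.sin(2 * np.pi * f_m * t)
    y = np.sin(2 * np.pi * f_c * t + index * modulator)
    print(f"{y.size} samples rendered, peak amplitude {np.abs(y).max():.2f}")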
    An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification. (arXiv:2208.06058v1 [cs.LG])
    Sparsity regularized loss minimization problems play an important role in various fields including machine learning, data mining, and modern statistics. The proximal gradient descent method and the coordinate descent method are the most popular approaches to solving this minimization problem. Although existing methods can achieve implicit model identification, also known as support set identification, in a finite number of iterations, these methods still suffer from huge computational costs and memory burdens in high-dimensional scenarios. The reason is that the support set identification in these methods is implicit and thus cannot explicitly identify the low-complexity structure in practice; namely, they cannot discard useless coefficients of the associated features to achieve algorithmic acceleration via dimension reduction. To address this challenge, we propose a novel accelerated doubly stochastic gradient descent (ADSGD) method for sparsity regularized loss minimization problems, which can reduce the number of block iterations by eliminating inactive coefficients during the optimization process and eventually achieve faster explicit model identification and improved algorithm efficiency. Theoretically, we first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity. More importantly, we prove that ADSGD can achieve a linear rate of explicit model identification. Numerically, experimental results on benchmark datasets confirm the efficiency of our proposed method.  ( 2 min )
    WeightMom: Learning Sparse Networks using Iterative Momentum-based pruning. (arXiv:2208.05970v1 [cs.LG])
    Deep Neural Networks have been used in a wide variety of applications with significant success. However, their highly complex nature, owing to their comprising millions of parameters, has led to problems during deployment in pipelines with low latency requirements. As a result, it is more desirable to obtain lightweight neural networks which have the same performance at inference time. In this work, we propose a weight-based pruning approach in which the weights are pruned gradually based on their momentum over previous iterations. Each layer of the neural network is assigned an importance value based on its relative sparsity, followed by the magnitude of its weights in previous iterations. We evaluate our approach on networks such as AlexNet, VGG16 and ResNet50 with image classification datasets such as CIFAR-10 and CIFAR-100. We found that the results outperformed previous approaches with respect to accuracy and compression ratio. Our method is able to obtain a compression of 15% for the same degradation in accuracy on both datasets.  ( 2 min )
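    A simplified sketch of iterative momentum-aware magnitude pruning in the spirit of the approach above; scoring each weight by |weight| + |momentum| is an illustrative simplification, not the paper's exact criterion.

    import numpy as np

    def momentum_prune(weights, grads_per_step, prune_frac=0.15, beta=0.9):
        """Score each weight by its magnitude plus the momentum of its updates
        over previous iterations; zero out the lowest-scoring fraction."""
        momentum = np.zeros_like(weights)
        for g in grads_per_step:                 # accumulate momentum across steps
            momentum = beta * momentum + (1 - beta) * g
        score = np.abs(weights) + np.abs(momentum)
        cutoff = np.quantile(score, prune_frac)
        mask = score > cutoff
        return weights * mask, mask

    rng = np.random.default_rng(4)
    w = rng.normal(size=(64, 64))
    grads = [rng.normal(scale=0.1, size=w.shape) for _ in range(10)]
    w_pruned, mask = momentum_prune(w, grads)
    print(f"sparsity after one pruning round = {1 - mask.mean():.2%}")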
    Optimal transport for vector Gaussian mixture models. (arXiv:2012.09226v2 [stat.ML] UPDATED)
    Vector Gaussian mixture models form an important special subset of vector-valued distributions. Any physical entity that can mutate or transit among alternative manifestations distributed in a given space falls into this category. A key example is color imagery. In this note, we vectorize the Gaussian mixture model and study different optimal mass transport related problems for such models. The benefits of using vector Gaussian mixture for optimal mass transport include computational efficiency and the ability to preserve structure.  ( 2 min )
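    For reference, optimal transport between individual Gaussian components admits the standard closed form (the Bures-Wasserstein distance), on which Gaussian-mixture transport constructions typically build; this is the textbook formula, not a result specific to the paper:

    W_2^2\big(\mathcal{N}(m_1,\Sigma_1),\,\mathcal{N}(m_2,\Sigma_2)\big) = \|m_1-m_2\|^2 + \mathrm{tr}\Big(\Sigma_1+\Sigma_2-2\big(\Sigma_2^{1/2}\Sigma_1\Sigma_2^{1/2}\big)^{1/2}\Big).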
    Developing moral AI to support antimicrobial decision making. (arXiv:2208.06327v1 [cs.CY])
    Artificial intelligence (AI) assisting with antimicrobial prescribing raises significant moral questions. Utilising ethical frameworks alongside AI-driven systems, while considering infection specific complexities, can support moral decision making to tackle antimicrobial resistance.  ( 2 min )
    Improving Human Decision-Making with Machine Learning. (arXiv:2108.08454v3 [cs.LG] UPDATED)
    Workers spend a significant amount of time learning how to make good decisions. Evaluating the efficacy of a given decision, however, can be complicated -- e.g., decision outcomes are often long-term and relate to the original decision in complex ways. Surprisingly, even though learning good decision-making strategies is difficult, they can often be expressed in simple and concise forms. Focusing on sequential decision-making, we design a novel machine learning algorithm that is capable of extracting "best practices" from trace data and conveying its insights to humans in the form of interpretable "tips". Our algorithm selects the tip that best bridges the gap between the actions taken by the human workers and those taken by the optimal policy in a way that accounts for which actions are consequential for achieving higher performance. We evaluate our approach through a series of randomized controlled experiments where participants manage a virtual kitchen. Our experiments show that the tips generated by our algorithm can significantly improve human performance relative to intuitive baselines. In addition, we discuss a number of empirical insights that can help inform the design of algorithms intended for human-AI interfaces. For instance, we find evidence that participants do not simply blindly follow our tips; instead, they combine them with their own experience to discover additional strategies for improving performance.  ( 3 min )
    RényiCL: Contrastive Representation Learning with Skew Rényi Divergence. (arXiv:2208.06270v1 [stat.ML])
    Contrastive representation learning seeks to acquire useful representations by estimating the shared information between multiple views of data. Here, the choice of data augmentation is sensitive to the quality of learned representations: the harder the applied data augmentations, the more task-relevant information the views share, but also more task-irrelevant information that can hinder the generalization capability of the representation. Motivated by this, we present a new robust contrastive learning scheme, coined RényiCL, which can effectively manage harder augmentations by utilizing Rényi divergence. Our method is built upon the variational lower bound of Rényi divergence, but a naïve usage of a variational method is impractical due to the large variance. To tackle this challenge, we propose a novel contrastive objective that conducts variational estimation of a skew Rényi divergence and provide a theoretical guarantee on how variational estimation of the skew divergence leads to stable training. We show that Rényi contrastive learning objectives perform innate hard negative sampling and easy positive sampling simultaneously, so that they can selectively learn useful features and ignore nuisance features. Through experiments on ImageNet, we show that Rényi contrastive learning with stronger augmentations outperforms other self-supervised methods without extra regularization or computational overhead. Moreover, we also validate our method on other domains such as graph and tabular data, showing empirical gains over other contrastive methods.  ( 2 min )
    Comparing Baseline Shapley and Integrated Gradients for Local Explanation: Some Additional Insights. (arXiv:2208.06096v1 [cs.LG])
    There are many different methods in the literature for local explanation of machine learning results. However, the methods differ in their approaches and often do not provide the same explanations. In this paper, we consider two recent methods: Integrated Gradients (Sundararajan, Taly, & Yan, 2017) and Baseline Shapley (Sundararajan and Najmi, 2020). The original authors have already studied the axiomatic properties of the two methods and provided some comparisons. Our work provides some additional insights on their comparative behavior for tabular data. We discuss common situations where the two provide identical explanations and where they differ. We also use simulation studies to examine the differences when neural networks with the ReLU activation function are used to fit the models.  ( 2 min )
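    A self-contained sketch that computes both attributions on a tiny ReLU model where they visibly disagree; the model f(x) = ReLU(x1 + x2 - 1), the zero baseline, and the finite-difference gradients are all illustrative assumptions.

    import numpy as np

    relu = lambda z: np.maximum(z, 0.0)
    f = lambda x: relu(x[0] + x[1] - 1.0)          # toy ReLU model (assumed)

    def integrated_gradients(f, x, baseline, steps=256, eps=1e-5):
        """Riemann approximation of IG_i = (x_i - x'_i) * integral of df/dx_i
        along the straight path from the baseline x' to x."""
        grads = np.zeros_like(x)
        for a in (np.arange(steps) + 0.5) / steps:
            z = baseline + a * (x - baseline)
            for i in range(len(x)):                # finite-difference gradient
                zp = z.copy(); zp[i] += eps
                grads[i] += (f(zp) - f(z)) / eps
        return (x - baseline) * grads / steps

    def baseline_shapley(f, x, baseline):
        """Exact Baseline Shapley for 2 features: average marginal contributions
        of feature i over the two insertion orders."""
        phi = np.zeros(2)
        for i in range(2):
            x_i_only = baseline.copy(); x_i_only[i] = x[i]   # add i to the empty set
            x_wo_i = x.copy(); x_wo_i[i] = baseline[i]       # add i to the other feature
            phi[i] = 0.5 * ((f(x_i_only) - f(baseline)) + (f(x) - f(x_wo_i)))
        return phi

    x, base = np.array([1.0, 0.5]), np.zeros(2)
    print("IG   :", integrated_gradients(f, x, base).round(3))   # approx [0.333, 0.167]
    print("BShap:", baseline_shapley(f, x, base).round(3))       # [0.25, 0.25]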
    Gradient Estimation for Binary Latent Variables via Gradient Variance Clipping. (arXiv:2208.06124v1 [cs.LG])
    Gradient estimation is often necessary for fitting generative models with discrete latent variables, in contexts such as reinforcement learning and variational autoencoder (VAE) training. The DisARM estimator (Yin et al. 2020; Dong, Mnih, and Tucker 2020) achieves state-of-the-art gradient variance for Bernoulli latent variable models in many contexts. However, DisARM and other estimators have potentially exploding variance near the boundary of the parameter space, where solutions tend to lie. To ameliorate this issue, we propose a new gradient estimator, bitflip-1, that has lower variance at the boundaries of the parameter space. As bitflip-1 has complementary properties to existing estimators, we introduce an aggregated estimator, unbiased gradient variance clipping (UGC), that uses either a bitflip-1 or a DisARM gradient update for each coordinate. We theoretically prove that UGC has uniformly lower variance than DisARM. Empirically, we observe that UGC achieves the optimal value of the optimization objectives in toy experiments, in discrete VAE training, and in a best subset selection problem.  ( 2 min )
    A Case for Rejection in Low Resource ML Deployment. (arXiv:2208.06359v1 [cs.LG])
    Building reliable AI decision support systems requires a robust set of data on which to train models, both with respect to quantity and diversity. Obtaining such datasets can be difficult in resource-limited settings, or for applications in early stages of deployment. Sample rejection is one way to work around this challenge; however, much of the existing work in this area is ill-suited for such scenarios. This paper substantiates that position and proposes a simple solution as a proof-of-concept baseline.  ( 2 min )
    Bayesian Inference with Latent Hamiltonian Neural Networks. (arXiv:2208.06120v1 [cs.LG])
    When sampling for Bayesian inference, one popular approach is to use Hamiltonian Monte Carlo (HMC) and specifically the No-U-Turn Sampler (NUTS) which automatically decides the end time of the Hamiltonian trajectory. However, HMC and NUTS can require numerous numerical gradients of the target density, and can prove slow in practice. We propose Hamiltonian neural networks (HNNs) with HMC and NUTS for solving Bayesian inference problems. Once trained, HNNs do not require numerical gradients of the target density during sampling. Moreover, they satisfy important properties such as perfect time reversibility and Hamiltonian conservation, making them well-suited for use within HMC and NUTS because stationarity can be shown. We also propose an HNN extension called latent HNNs (L-HNNs), which are capable of predicting latent variable outputs. Compared to HNNs, L-HNNs offer improved expressivity and reduced integration errors. Finally, we employ L-HNNs in NUTS with an online error monitoring scheme to prevent sample degeneracy in regions of low probability density. We demonstrate L-HNNs in NUTS with online error monitoring on several examples involving complex, heavy-tailed, and high-local-curvature probability densities. Overall, L-HNNs in NUTS with online error monitoring satisfactorily inferred these probability densities. Compared to traditional NUTS, L-HNNs in NUTS with online error monitoring required 1--2 orders of magnitude fewer numerical gradients of the target density and improved the effective sample size (ESS) per gradient by an order of magnitude.  ( 3 min )
    GSim: A Graph Neural Network based Relevance Measure for Heterogeneous Graphs. (arXiv:2208.06144v1 [cs.IR])
    Heterogeneous graphs, which contain nodes and edges of multiple types, are prevalent in various domains, including bibliographic networks, social media, and knowledge graphs. As a fundamental task in analyzing heterogeneous graphs, relevance measure aims to calculate the relevance between two objects of different types, and has been used in many applications such as web search, recommendation, and community detection. Most existing relevance measures focus on homogeneous networks where objects are of the same type, and the few measures developed for heterogeneous graphs often need a pre-defined meta-path. Defining meaningful meta-paths requires much domain knowledge, which largely limits their applications, especially on schema-rich heterogeneous graphs like knowledge graphs. Recently, the Graph Neural Network (GNN) has been widely applied in many graph mining tasks, but it has not yet been applied to measuring relevance. To address the aforementioned problems, we propose a novel GNN-based relevance measure, namely GSim. Specifically, we first theoretically analyze and show that GNN is effective for measuring the relevance of nodes in the graph. We then propose a context path-based graph neural network (CP-GNN) to automatically leverage the semantics in heterogeneous graphs. Moreover, we exploit CP-GNN to support relevance measures between two objects of any type. Extensive experiments demonstrate that GSim outperforms existing measures.  ( 3 min )
    On deceiving malware classification with section injection. (arXiv:2208.06092v1 [cs.CR])
    We investigate how to modify executable files to deceive malware classification systems. This work's main contribution is a methodology to inject bytes randomly across a malware file and use it both as an attack to decrease classification accuracy and as a defensive method that augments the data available for training. It respects the operating-system file format to ensure that the malware will still execute after our injection and will not change its behavior. We reproduced five state-of-the-art malware classification approaches to evaluate our injection scheme: one based on GIST+KNN, three CNN variations, and one Gated CNN. We performed our experiments on a public dataset with 9,339 malware samples from 25 different families. Our results show that a mere 7% increase in malware size causes an accuracy drop between 25% and 40% for malware family classification, suggesting that an automatic malware classification system may not be as trustworthy as initially reported in the literature. We also evaluate using modified malware samples alongside the original ones to increase network robustness against the aforementioned attacks. Results show that a combination of reordering malware sections and injecting random data can improve the overall classification performance. Code available at https://github.com/adeilsonsilva/malware-injection.  ( 2 min )
    Algebraic Reduction of Hidden Markov Models. (arXiv:2208.05968v1 [cs.LG])
    The problem of reducing a Hidden Markov Model (HMM) to one of smaller dimension that exactly reproduces the same marginals is tackled using a system-theoretic approach, adapted to HMMs by leveraging a suitable algebraic representation of probability spaces. We propose two algorithms that return coarse-grained equivalent HMMs obtained via stochastic projection operators: the first returns models that reproduce the single-time distribution of a given output process, while in the second the full (multi-time) distribution is preserved. The reduction method exploits not only the structure of the observed output, but also its initial condition, whenever the latter is known or belongs to a given subclass. Optimal algorithms are derived for a class of HMMs, namely observable ones. In the general case, we propose algorithms that have produced minimal models for all the examples we analyzed, and we conjecture their optimality.  ( 2 min )
    BSAC: Bayesian Strategy Network Based Soft Actor-Critic in Deep Reinforcement Learning. (arXiv:2208.06033v1 [cs.AI])
    Adopting reasonable strategies is challenging but crucial for an intelligent agent with limited resources working in hazardous, unstructured, and dynamic environments, in order to improve system utility, decrease overall cost, and increase mission success probability. Deep Reinforcement Learning (DRL) helps organize agents' behaviors and actions based on their state and represents complex strategies (compositions of actions). This paper proposes a novel hierarchical strategy decomposition approach based on Bayesian chaining to separate an intricate policy into several simple sub-policies and organize their relationships as Bayesian strategy networks (BSN). We integrate this approach into the state-of-the-art DRL method, soft actor-critic (SAC), and build the corresponding Bayesian soft actor-critic (BSAC) model by organizing several sub-policies as a joint policy. We compare the proposed BSAC method with SAC and other state-of-the-art approaches such as TD3, DDPG, and PPO on the standard continuous control benchmarks -- Hopper-v2, Walker2d-v2, and Humanoid-v2 -- in MuJoCo with the OpenAI Gym environment. The results demonstrate the promising potential of the BSAC method to significantly improve training efficiency. The open-source code for BSAC can be accessed at https://github.com/herolab-uga/bsac.  ( 2 min )
    Forecasting the production of Distillate Fuel Oil Refinery and Propane Blender net production by using Time Series Algorithms. (arXiv:2208.05964v1 [cs.LG])
    Oil production forecasting is an important step in controlling costs and monitoring the functioning of petroleum reservoirs. Reliable forecasts make it easier for reservoir engineers to develop feasible projects, which helps to avoid risky investments and achieve long-term growth. Oil production is influenced by reservoir qualities such as porosity, permeability, compressibility, fluid saturation, and other well operational parameters. Three time-series algorithms, i.e., the Seasonal Naive method, Exponential Smoothing, and ARIMA, are used to forecast the Distillate Fuel Oil Refinery and Propane Blender net production for the next two years.  ( 2 min )
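    For readers who want to try this pipeline, below is a hedged sketch in Python with statsmodels on a synthetic monthly series; the series, the model orders, and the seasonal period of 12 are illustrative assumptions, not the paper's data or choices.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic monthly production series (stand-in for the real data).
rng = np.random.default_rng(0)
t = np.arange(84)
idx = pd.date_range("2015-01-01", periods=84, freq="MS")
y = pd.Series(4000 + 20 * t + 300 * np.sin(2 * np.pi * t / 12)
              + rng.normal(0, 50, 84), index=idx)

h = 24  # two-year horizon, as in the abstract

# 1) Seasonal naive: repeat the last observed seasonal cycle.
snaive = np.tile(y.iloc[-12:].to_numpy(), h // 12)

# 2) Holt-Winters exponential smoothing with additive seasonality.
es = ExponentialSmoothing(y, trend="add", seasonal="add",
                          seasonal_periods=12).fit()
es_fc = es.forecast(h)

# 3) Seasonal ARIMA (orders chosen arbitrarily for illustration).
arima = ARIMA(y, order=(1, 1, 1), seasonal_order=(1, 0, 1, 12)).fit()
arima_fc = arima.forecast(h)

print(snaive[:3], es_fc.iloc[:3].to_numpy(), arima_fc.iloc[:3].to_numpy())
```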
    Interaction Decompositions for Tensor Network Regression. (arXiv:2208.06029v1 [cs.LG])
    It is well known that tensor network regression models operate on an exponentially large feature space, but questions remain as to how effectively they are able to utilize this space. Using the polynomial featurization from Novikov et al., we propose the interaction decomposition as a tool that can assess the relative importance of different regressors as a function of their polynomial degree. We apply this decomposition to tensor ring and tree tensor network models trained on the MNIST and Fashion MNIST datasets, and find that up to 75% of interaction degrees are contributing meaningfully to these models. We also introduce a new type of tensor network model that is explicitly trained on only a small subset of interaction degrees, and find that these models are able to match or even outperform the full models using only a fraction of the exponential feature space. This suggests that standard tensor network models utilize their polynomial regressors in an inefficient manner, with the lower degree terms being vastly under-utilized.  ( 2 min )
    Personalizing or Not: Dynamically Personalized Federated Learning with Incentives. (arXiv:2208.06192v1 [cs.LG])
    Personalized federated learning (FL) facilitates collaborations between multiple clients to learn personalized models without sharing private data. The mechanism mitigates the statistical heterogeneity commonly encountered in the system, i.e., non-IID data over different clients. Existing personalized algorithms generally assume all clients volunteer for personalization. However, potential participants might still be reluctant to personalize models since they might not work well. In this case, clients choose to use the global model instead. To avoid making unrealistic assumptions, we introduce the personalization rate, measured as the fraction of clients willing to train personalized models, into federated settings and propose DyPFL. This dynamically personalized FL technique incentivizes clients to participate in personalizing local models while allowing the adoption of the global model when it performs better. We show that the algorithmic pipeline in DyPFL guarantees good convergence performance, allowing it to outperform alternative personalized methods in a broad range of conditions, including variation in heterogeneity, number of clients, local epochs, and batch sizes.  ( 2 min )
    Sequential Causal Imitation Learning with Unobserved Confounders. (arXiv:2208.06276v1 [cs.LG])
    "Monkey see monkey do" is an age-old adage, referring to na\"ive imitation without a deep understanding of a system's underlying mechanics. Indeed, if a demonstrator has access to information unavailable to the imitator (monkey), such as a different set of sensors, then no matter how perfectly the imitator models its perceived environment (See), attempting to reproduce the demonstrator's behavior (Do) can lead to poor outcomes. Imitation learning in the presence of a mismatch between demonstrator and imitator has been studied in the literature under the rubric of causal imitation learning (Zhang et al., 2020), but existing solutions are limited to single-stage decision-making. This paper investigates the problem of causal imitation learning in sequential settings, where the imitator must make multiple decisions per episode. We develop a graphical criterion that is necessary and sufficient for determining the feasibility of causal imitation, providing conditions when an imitator can match a demonstrator's performance despite differing capabilities. Finally, we provide an efficient algorithm for determining imitability and corroborate our theory with simulations.  ( 2 min )
  • Open

    Toward a Better Monitoring Statistic for Profile Monitoring via Variational Autoencoders. (arXiv:1911.00482v2 [cs.LG] UPDATED)
    Wide accessibility of imaging and profile sensors in modern industrial systems has created an abundance of high-dimensional sensing variables, which has led to a growing interest in research on high-dimensional process monitoring. However, most approaches in the literature assume the in-control population to lie on a linear manifold with a given basis (i.e., spline, wavelet, kernel, etc.) or an unknown basis (i.e., principal component analysis and its variants), and therefore cannot efficiently model profiles lying on a nonlinear manifold, a situation common in many real-life cases. We propose deep probabilistic autoencoders as a viable unsupervised learning approach to model such manifolds. To do so, we formulate nonlinear and probabilistic extensions of the monitoring statistics from classical approaches as the expected reconstruction error (ERE) and the KL-divergence (KLD) based monitoring statistics. Through an extensive simulation study, we provide insights on why latent-space based statistics are unreliable and why residual-space based ones typically perform much better for deep learning based approaches. Finally, we demonstrate the superiority of deep probabilistic models via both a simulation study and a real-life case study involving images of defects from a hot steel rolling process.  ( 3 min )
    Data-Driven Fault Diagnosis Analysis and Open-Set Classification of Time-Series Data. (arXiv:2009.04756v2 [stat.ML] UPDATED)
    Fault diagnosis of dynamic systems is done by detecting changes in time-series data, for example residuals, caused by system degradation and faulty components. The use of general-purpose multi-class classification methods for fault diagnosis is complicated by imbalanced training data and unknown fault classes. Another complicating factor is that different fault classes can result in similar residual outputs, especially for small faults, which causes classification ambiguities. In this work, a framework for data-driven analysis and open-set classification is developed for fault diagnosis applications using the Kullback-Leibler divergence. A data-driven fault classification algorithm is proposed which can handle imbalanced datasets, class overlapping, and unknown faults. In addition, an algorithm is proposed to estimate the size of the fault when training data contains information from known fault realizations. An advantage of the proposed framework is that it can also be used for quantitative analysis of fault diagnosis performance, for example, to analyze how easy it is to classify faults of different magnitudes. To evaluate the usefulness of the proposed methods, multiple datasets from different fault scenarios have been collected from an internal combustion engine test bench to illustrate the design process of a data-driven diagnosis system, including quantitative fault diagnosis analysis and evaluation of the developed open-set fault classification algorithm.  ( 3 min )
    Towards a Grounded Theory of Causation for Embodied AI. (arXiv:2206.13973v2 [cs.AI] UPDATED)
    There exist well-developed frameworks for causal modelling, but these require rather a lot of human domain expertise to define causal variables and perform interventions. In order to enable autonomous agents to learn abstract causal models through interactive experience, the existing theoretical foundations need to be extended and clarified. Existing frameworks give no guidance regarding variable choice / representation, and more importantly, give no indication as to which behaviour policies or physical transformations of state space shall count as interventions. The framework sketched in this paper describes actions as transformations of state space, for instance induced by an agent running a policy. This makes it possible to describe in a uniform way both transformations of the micro-state space and abstract models thereof, and say when the latter is veridical / grounded / natural. We then introduce (causal) variables, define a mechanism as an invariant predictor, and say when an action can be viewed as a ``surgical intervention'', thus bringing the objective of causal representation \& intervention skill learning into clearer focus.  ( 2 min )
    Understanding the stochastic dynamics of sequential decision-making processes: A path-integral analysis of Multi-armed Bandits. (arXiv:2208.06245v1 [cs.LG])
    The multi-armed bandit (MAB) model is one of the most classical models for studying decision-making in an uncertain environment. In this model, a player chooses one of K possible arms of a bandit machine to play at each time step, and the chosen arm returns a random reward, drawn from an unknown distribution specific to that arm. The goal of the player is to collect as much reward as possible during the process. Despite its simplicity, the MAB model offers an excellent playground for studying the trade-off between exploration and exploitation and for designing effective algorithms for sequential decision-making under uncertainty. Although many asymptotically optimal algorithms have been established, the finite-time behaviours of the stochastic dynamics of the MAB model appear much more difficult to analyze, due to the intertwining between the decision-making and the rewards being collected. In this paper, we employ techniques from statistical physics to analyze the MAB model, which facilitates characterizing the distribution of cumulative regret at finite, short times, the central quantity of interest in an MAB algorithm, as well as the intricate dynamical behaviours of the model.
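    To make the setup concrete, here is a small simulation -- an illustration of the model only, not the paper's path-integral analysis -- of a two-armed Bernoulli bandit under the classic UCB1 algorithm, recording the finite-time distribution of cumulative regret across independent runs.

```python
import numpy as np

def run_ucb1(means, T, rng):
    """One run of UCB1 on a Bernoulli bandit; returns cumulative regret."""
    K = len(means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    regret, best = 0.0, max(means)
    for t in range(1, T + 1):
        if t <= K:                    # play each arm once first
            a = t - 1
        else:                         # then pick the arm with the highest UCB
            ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
            a = int(np.argmax(ucb))
        counts[a] += 1
        sums[a] += rng.random() < means[a]   # Bernoulli reward
        regret += best - means[a]
    return regret

rng = np.random.default_rng(1)
regrets = [run_ucb1([0.5, 0.6], T=1000, rng=rng) for _ in range(500)]
print(f"mean regret {np.mean(regrets):.1f}, std {np.std(regrets):.1f}")
```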
    Hypergraph Modeling via Spectral Embedding Connection: Hypergraph Cut, Weighted Kernel $k$-means, and Heat Kernel. (arXiv:2203.09888v2 [cs.LG] UPDATED)
    We propose a theoretical framework of multi-way similarity to model real-valued data as hypergraphs for clustering via spectral embedding. For graph-cut-based spectral clustering, it is common to model real-valued data as a graph by expressing pairwise similarities with a kernel function, because the kernel function has a theoretical connection to the graph cut. For problems where multi-way similarities are more suitable than pairwise ones, it is natural to model the data as a hypergraph, which is a generalization of a graph. However, although the hypergraph cut is well studied, no hypergraph-cut-based framework for modeling multi-way similarity has yet been established. In this paper, we formulate multi-way similarities by exploiting the theoretical foundation of kernel functions. We show a theoretical connection between our formulation and the hypergraph cut in two ways, generalizing both weighted kernel $k$-means and the heat kernel, which justifies our formulation. We also provide a fast algorithm for spectral clustering. Our algorithm empirically shows better performance than existing graph-based and other heuristic modeling methods.
    Optimal transport for vector Gaussian mixture models. (arXiv:2012.09226v2 [stat.ML] UPDATED)
    Vector Gaussian mixture models form an important special subset of vector-valued distributions. Any physical entity that can mutate or transit among alternative manifestations distributed in a given space falls into this category. A key example is color imagery. In this note, we vectorize the Gaussian mixture model and study different optimal mass transport related problems for such models. The benefits of using vector Gaussian mixtures for optimal mass transport include computational efficiency and the ability to preserve structure.
    Markov Observation Models. (arXiv:2208.06368v1 [stat.ML])
    Herein, the Hidden Markov Model is expanded to allow for Markov chain observations. In particular, the observations are assumed to be a Markov chain whose one-step transition probabilities depend upon the hidden Markov chain. An Expectation-Maximization analog to the Baum-Welch algorithm is developed for this more general model to estimate the transition probabilities for both the hidden state and the observations, as well as to estimate the probabilities of the initial joint hidden-state-observation distribution. A belief state or filter recursion to track the hidden state then arises from the calculations of this Expectation-Maximization algorithm. A dynamic programming analog to the Viterbi algorithm is also developed to estimate the most likely sequence of hidden states given the sequence of observations.
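    A minimal sketch of such a filter recursion, assuming the observation chain's transition matrix depends on the current hidden state (the abstract leaves this detail open); all parameters below are made-up toy values.

```python
import numpy as np

# Toy parameters: 2 hidden states, 2 observation symbols.
A = np.array([[0.9, 0.1],            # hidden transitions p(x_t | x_{t-1})
              [0.2, 0.8]])
# B[x] is the observation transition matrix p(y_t | y_{t-1}) for hidden state x.
B = np.array([[[0.7, 0.3],
               [0.4, 0.6]],
              [[0.1, 0.9],
               [0.5, 0.5]]])
belief = np.array([0.5, 0.5])        # initial belief over hidden states

def filter_step(belief, y_prev, y):
    """One step: b'(x) is proportional to
    sum over x' of b(x') * A[x', x] * B[x][y_prev, y]."""
    b = (belief @ A) * B[:, y_prev, y]
    return b / b.sum()

ys = [0, 1, 1, 0, 1]                 # an observed sequence
for y_prev, y in zip(ys, ys[1:]):
    belief = filter_step(belief, y_prev, y)
print("P(hidden state | observations):", belief)
```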
    EEGNN: Edge Enhanced Graph Neural Networks. (arXiv:2208.06322v1 [stat.ML])
    Training deep graph neural networks (GNNs) poses a challenging task, as the performance of GNNs may degrade as the number of hidden message-passing layers grows. The literature has focused on over-smoothing and under-reaching to explain the performance deterioration of deep GNNs. In this paper, we propose a new explanation for this deteriorated performance: mis-simplification, that is, mistakenly simplifying graphs by preventing self-loops and forcing edges to be unweighted. We show that such simplification can reduce the potential of message-passing layers to capture the structural information of graphs. In view of this, we propose a new framework, the edge-enhanced graph neural network (EEGNN). EEGNN uses the structural information extracted from the proposed Dirichlet mixture Poisson graph model, a Bayesian nonparametric model for graphs, to improve the performance of various deep message-passing GNNs. Experiments over different datasets show that our method achieves considerable performance increases compared to baselines.
    Function Classes for Identifiable Nonlinear Independent Component Analysis. (arXiv:2208.06406v1 [stat.ML])
    Unsupervised learning of latent variable models (LVMs) is widely used to represent data in machine learning. When such models reflect the ground-truth factors and the mechanisms mapping them to observations, there is reason to expect that they allow generalization in downstream tasks. It is, however, well known that such identifiability guarantees are typically not achievable without putting constraints on the model class. This is notably the case for nonlinear Independent Component Analysis, in which the LVM maps statistically independent variables to observations via a deterministic nonlinear function. Several families of spurious solutions that fit the data perfectly but do not correspond to the ground-truth factors can be constructed in generic settings. However, recent work suggests that constraining the function class of such models may promote identifiability. Specifically, function classes with constraints on their partial derivatives, gathered in the Jacobian matrix, have been proposed, such as orthogonal coordinate transformations (OCTs), which impose orthogonality of the Jacobian columns. In the present work, we prove that a subclass of these transformations, conformal maps, is identifiable, and provide novel theoretical results suggesting that OCTs have properties that prevent families of spurious solutions from spoiling identifiability in a generic setting.
    Private Domain Adaptation from a Public Source. (arXiv:2208.06135v1 [cs.LG])
    A key problem in a variety of applications is that of domain adaptation from a public source domain, for which a relatively large amount of labeled data with no privacy constraints is at one's disposal, to a private target domain, for which a private sample is available with very few or no labeled data. In regression problems with no privacy constraints on the source or target data, a discrepancy minimization algorithm based on several theoretical guarantees was shown to outperform a number of other adaptation algorithm baselines. Building on that approach, we design differentially private discrepancy-based algorithms for adaptation from a source domain with public labeled data to a target domain with unlabeled private data. The design and analysis of our private algorithms critically hinge upon several key properties we prove for a smooth approximation of the weighted discrepancy, such as its smoothness with respect to the $\ell_1$-norm and the sensitivity of its gradient. Our solutions are based on private variants of Frank-Wolfe and Mirror-Descent algorithms. We show that our adaptation algorithms benefit from strong generalization and privacy guarantees and report the results of experiments demonstrating their effectiveness.
    A Scalable Probabilistic Model for Reward Optimizing Slate Recommendation. (arXiv:2208.06263v1 [cs.IR])
    We introduce the Probabilistic Rank and Reward model (PRR), a scalable probabilistic model for personalized slate recommendation. Our model allows state-of-the-art estimation of user interests in the following ubiquitous recommender system scenario: a user is shown a slate of K recommendations and the user chooses at most one of these K items. The goal of the recommender system is to find the K items of most interest to a user in order to maximize the probability that the user interacts with the slate. Our contribution is to show that we can learn the probability of a recommendation being successful more effectively by combining the reward - whether the slate was clicked or not - with the rank - the item on the slate that was selected. Our method learns more efficiently than bandit methods that use only the reward, and user preference methods that use only the rank. It also provides similar or better estimation performance than independent inverse-propensity-score methods and is far more scalable. Our method is state of the art in terms of both speed and accuracy on massive datasets with up to 1 million items. Finally, our method allows fast delivery of recommendations powered by maximum inner product search (MIPS), making it suitable in extremely low latency domains such as computational advertising.
    Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning. (arXiv:2208.06193v1 [cs.LG])
    Offline reinforcement learning (RL), which aims to learn an optimal policy using a previously collected static dataset, is an important paradigm of RL. Standard RL methods often perform poorly at this task due to the function approximation errors on out-of-distribution actions. While a variety of regularization methods have been proposed to mitigate this issue, they are often constrained by policy classes with limited expressiveness and sometimes result in substantially suboptimal solutions. In this paper, we propose Diffusion-QL that utilizes a conditional diffusion model as a highly expressive policy class for behavior cloning and policy regularization. In our approach, we learn an action-value function and we add a term maximizing action-values into the training loss of a conditional diffusion model, which results in a loss that seeks optimal actions that are near the behavior policy. We show the expressiveness of the diffusion model-based policy and the coupling of the behavior cloning and policy improvement under the diffusion model both contribute to the outstanding performance of Diffusion-QL. We illustrate our method and prior work in a simple 2D bandit example with a multimodal behavior policy. We then show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks for offline RL.
    Incompleteness of graph convolutional neural networks for points clouds in three dimensions. (arXiv:2201.07136v3 [stat.ML] UPDATED)
    Graph neural networks (GNN) are very popular methods in machine learning and have been applied very successfully to the prediction of the properties of molecules and materials. First-order GNNs are well known to be incomplete, i.e., there exist graphs that are distinct but appear identical when seen through the lens of the GNN. More complicated schemes have thus been designed to increase their resolving power. Applications to molecules (and more generally, point clouds), however, add a geometric dimension to the problem. The most straightforward and prevalent approach to construct graph representation for molecules regards atoms as vertices in a graph and draws a bond between each pair of atoms within a chosen cutoff. Bonds can be decorated with the distance between atoms, and the resulting "distance graph NNs" (dGNN) have empirically demonstrated excellent resolving power and are widely used in chemical ML, with all known indistinguishable graphs being resolved in the fully-connected limit. Here we show that even for the restricted case of fully-connected graphs induced by 3D atom clouds dGNNs are not complete. We construct pairs of distinct point clouds that generate graphs that, for any cutoff radius, are equivalent based on a first-order Weisfeiler-Lehman test. This class of degenerate structures includes chemically-plausible configurations, setting an ultimate limit to the expressive power of some of the well-established GNN architectures for atomistic machine learning. Models that explicitly use angular or directional information in the description of atomic environments can resolve these degeneracies.
    A sub-sampling algorithm preventing outliers. (arXiv:2208.06218v1 [stat.ME])
    Nowadays, massive data are available in many different fields, and for several reasons it might be convenient to analyze just a subset of the data. The D-optimality criterion can be helpful for optimally selecting a subsample of observations. However, it is well known that D-optimal support points lie on the boundary of the design space, and if they go hand in hand with extreme response values, they can have a severe influence on the estimated linear model (leverage points with high influence). To overcome this problem, we first propose an unsupervised exchange procedure that enables us to select a nearly D-optimal subset of observations without high leverage values. We then provide a supervised version of this exchange procedure, where besides high leverage points, outliers in the responses (that are not associated with high leverage points) are also avoided. This is possible because, unlike other design situations, in subsampling from big datasets the response values may be available. Finally, both the unsupervised and the supervised selection procedures are generalized to I-optimality, with the goal of obtaining accurate predictions.
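    As a rough illustration of the unsupervised exchange idea -- our reading of the abstract, without the leverage-point safeguard that is the paper's actual contribution -- one can greedily swap observations in and out of the subsample to increase the D-optimality objective log det(X_s' X_s).

```python
import numpy as np

def logdet(Xs):
    # log det of the information matrix; -inf if it is singular.
    sign, val = np.linalg.slogdet(Xs.T @ Xs)
    return val if sign > 0 else -np.inf

def exchange_subsample(X, n, n_iter=500, seed=0):
    """Greedy exchange toward a (locally) D-optimal subsample of size n."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=n, replace=False)
    pool = np.setdiff1d(np.arange(X.shape[0]), idx)
    best = logdet(X[idx])
    for _ in range(n_iter):
        i = rng.integers(n)           # candidate row to drop
        j = rng.integers(len(pool))   # candidate row to add
        trial = idx.copy()
        trial[i] = pool[j]
        val = logdet(X[trial])
        if val > best:                # accept improving swaps only
            pool[j], idx, best = idx[i], trial, val
    return idx, best

X = np.random.default_rng(1).normal(size=(5000, 4))
idx, val = exchange_subsample(X, n=50)
print(f"selected {len(idx)} rows, log det = {val:.2f}")
```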
    Feature-Based Time-Series Analysis in R using the theft Package. (arXiv:2208.06146v1 [stat.ML])
    Time series are measured and analyzed across the sciences. One method of quantifying the structure of time series is by calculating a set of summary statistics or `features', and then representing a time series in terms of its properties as a feature vector. The resulting feature space is interpretable and informative, and enables conventional statistical learning approaches, including clustering, regression, and classification, to be applied to time-series datasets. Many open-source software packages for computing sets of time-series features exist across multiple programming languages, including catch22 (22 features: Matlab, R, Python, Julia), feasts (42 features: R), tsfeatures (63 features: R), Kats (40 features: Python), tsfresh (779 features: Python), and TSFEL (390 features: Python). However, there are several issues: (i) a singular access point to these packages is not currently available; (ii) to access all feature sets, users must be fluent in multiple languages; and (iii) these feature-extraction packages lack extensive accompanying methodological pipelines for performing feature-based time-series analysis, such as applications to time-series classification. Here we introduce a solution to these issues in an R software package called theft: Tools for Handling Extraction of Features from Time series. theft is a unified and extendable framework for computing features from the six open-source time-series feature sets listed above. It also includes a suite of functions for processing and interpreting the performance of extracted features, including extensive data-visualization templates, low-dimensional projections, and time-series classification operations. With an increasing volume and complexity of time-series datasets in the sciences and industry, theft provides a standardized framework for comprehensively quantifying and interpreting informative structure in time series.
    Coarse to Fine Two-Stage Approach to Robust Tensor Completion of Visual Data. (arXiv:2106.10422v4 [cs.LG] UPDATED)
    Tensor completion is the problem of estimating the missing values of high-order data from partially observed entries. Data corruption due to prevailing outliers poses major challenges to traditional tensor completion algorithms, which catalyzed the development of robust algorithms that alleviate the effect of outliers. However, existing robust methods largely presume that the corruption is sparse, which may not hold in practice. In this paper, we develop a two-stage robust tensor completion approach to deal with tensor completion of visual data with a large amount of gross corruption. A novel coarse-to-fine framework is proposed which uses a global coarse completion result to guide a local patch refinement process. To efficiently mitigate the effect of a large number of outliers on tensor recovery, we develop a new M-estimator-based robust tensor ring recovery method which can adaptively identify the outliers and alleviate their negative effect in the optimization. The experimental results demonstrate the superior performance of the proposed approach over state-of-the-art robust algorithms for tensor completion.
    Transformers Can Do Bayesian Inference. (arXiv:2112.10510v5 [cs.LG] UPDATED)
    Currently, it is hard to reap the benefits of deep learning for Bayesian methods, which allow the explicit specification of prior knowledge and accurately capture model uncertainty. We present Prior-Data Fitted Networks (PFNs). PFNs leverage large-scale machine learning techniques to approximate a large set of posteriors. The only requirement for PFNs to work is the ability to sample from a prior distribution over supervised learning tasks (or functions). Our method restates the objective of posterior approximation as a supervised classification problem with a set-valued input: it repeatedly draws a task (or function) from the prior, draws a set of data points and their labels from it, masks one of the labels and learns to make probabilistic predictions for it based on the set-valued input of the rest of the data points. Presented with a set of samples from a new supervised learning task as input, PFNs make probabilistic predictions for arbitrary other data points in a single forward propagation, having learned to approximate Bayesian inference. We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems, with over 200-fold speedups in multiple setups compared to current methods. We obtain strong results in very diverse areas such as Gaussian process regression, Bayesian neural networks, classification for small tabular data sets, and few-shot image classification, demonstrating the generality of PFNs. Code and trained PFNs are released at https://github.com/automl/TransformersCanDoBayesianInference.
    The Limit of the Marginal Distribution Model in Consumer Choice. (arXiv:2208.06115v1 [stat.ML])
    Given data on choices made by consumers for different assortments, a key challenge is to develop parsimonious models that describe and predict consumer choice behavior. One such choice model is the marginal distribution model which requires only the specification of the marginal distributions of the random utilities of the alternatives to explain choice data. In this paper, we develop an exact characterisation of the set of choice probabilities which are representable by the marginal distribution model consistently across any collection of assortments. Allowing for the possibility of alternatives to be grouped based on the marginal distribution of their utilities, we show (a) verifying consistency of choice probability data with this model is possible in polynomial time and (b) finding the closest fit reduces to solving a mixed integer convex program. Our results show that the marginal distribution model provides much better representational power as compared to multinomial logit and much better computational performance as compared to the random utility model.
    Unifying Gradients to Improve Real-world Robustness for Deep Networks. (arXiv:2208.06228v1 [stat.ML])
    The wide application of deep neural networks (DNNs) demands increasing attention to their real-world robustness, i.e., whether a DNN resists black-box adversarial attacks, among which score-based query attacks (SQAs) are the most threatening because of their practicality and effectiveness: attackers need only dozens of queries on model outputs to seriously hurt a victim network. Defending against SQAs requires a slight but artful variation of outputs, since users, who share the same output information with attackers, must still be served. In this paper, we propose a real-world defense, called Unifying Gradients (UniG), to unify the gradients of different data so that attackers can only probe a much weaker attack direction that is similar across samples. Since such universal attack perturbations have been validated as less aggressive than input-specific perturbations, UniG protects real-world DNNs by presenting attackers with a twisted and less informative attack direction. To enhance UniG's practical significance in real-world applications, we implement it as a Hadamard product module that is computationally efficient and readily plugged into any model. According to extensive experiments on 5 SQAs and 4 defense baselines, UniG significantly improves real-world robustness without hurting clean accuracy on CIFAR-10 and ImageNet. For instance, UniG maintains a CIFAR-10 model at 77.80% accuracy under the 2500-query Square attack, while the state-of-the-art adversarially trained model achieves only 67.34%. Simultaneously, UniG surpasses all compared baselines in clean accuracy and in the degree to which outputs are modified. The code will be released.
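    The abstract describes UniG as a plug-in Hadamard product module. Below is a minimal PyTorch sketch of what such a module could look like; the paper's exact parameterization and training objective may well differ.

```python
import torch
import torch.nn as nn

class HadamardModule(nn.Module):
    """Element-wise (Hadamard) reweighting of intermediate features;
    a sketch in the spirit of UniG, not the authors' implementation."""
    def __init__(self, num_features):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(num_features))

    def forward(self, x):             # x: (batch, num_features, ...)
        shape = [1, -1] + [1] * (x.dim() - 2)
        return x * self.weight.view(*shape)

# Plugged between two layers of an existing classifier:
model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    HadamardModule(64),               # inserted defense module
    nn.Linear(64, 10),
)
print(model(torch.randn(4, 32)).shape)   # torch.Size([4, 10])
```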
    Unifying local and global model explanations by functional decomposition of low dimensional structures. (arXiv:2208.06151v1 [cs.LG])
    We consider a global explanation of a regression or classification function by decomposing it into the sum of main components and interaction components of arbitrary order. When adding an identification constraint that is motivated by a causal interpretation, we find q-interaction SHAP to be the unique solution to that constraint. Here, q denotes the highest order of interaction present in the decomposition. Our result provides a new perspective on SHAP values with various practical and theoretical implications: if SHAP values are decomposed into main and all interaction effects, they provide a global explanation with a causal interpretation. In principle, the decomposition can be applied to any machine learning model. However, since the number of possible interactions grows exponentially with the number of features, exact calculation is only feasible for methods that fit low-dimensional structures or ensembles of those. We provide an algorithm and an efficient implementation for gradient boosted trees (xgboost) and random planted forests that calculates this decomposition. Our experiments suggest that the method provides meaningful explanations and reveals interactions of higher orders. We also investigate further potential of our new insights by utilizing the global explanation to motivate a new measure of feature importance and to reduce direct and indirect bias through post-hoc component removal.
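    For readers who want to inspect interactions in practice, the shap package already exposes pairwise SHAP interaction values for tree ensembles (the q-interaction SHAP above generalizes these to arbitrary order); a small sketch with xgboost on toy data containing a planted interaction:

```python
import numpy as np
import shap
import xgboost

# Toy regression data with a genuine x0 * x1 interaction.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] * X[:, 1] + X[:, 2] + rng.normal(0, 0.1, 500)

model = xgboost.XGBRegressor(n_estimators=100, max_depth=3).fit(X, y)

# Pairwise interaction attributions, shape (n_samples, n_features, n_features);
# the off-diagonal (0, 1) entries should dominate, matching the planted term.
inter = shap.TreeExplainer(model).shap_interaction_values(X)
print(np.abs(inter).mean(axis=0).round(3))
```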
    Data Banzhaf: A Data Valuation Framework with Maximal Robustness to Learning Stochasticity. (arXiv:2205.15466v4 [cs.LG] UPDATED)
    This paper studies the robustness of data valuation to noisy model performance scores. In particular, we find that the inherent randomness of the widely used stochastic gradient descent can cause existing data value notions (e.g., the Shapley value and the leave-one-out error) to produce inconsistent data value rankings across different runs. To address this challenge, we first pose a formal framework within which one can measure the robustness of a data value notion. We show that the Banzhaf value, a value notion originating from the cooperative game theory literature, achieves the maximal robustness among all semivalues -- a class of value notions that satisfy crucial properties entailed by ML applications. We propose an algorithm to efficiently estimate the Banzhaf value based on the Maximum Sample Reuse (MSR) principle. We derive a lower bound on the sample complexity of Banzhaf value estimation, and we show that our MSR algorithm's sample complexity is close to this lower bound. Our evaluation demonstrates that the Banzhaf value outperforms existing semivalue-based data value notions on several downstream ML tasks such as learning with weighted samples and noisy label detection. Overall, our study suggests that when the underlying ML algorithm is stochastic, the Banzhaf value is a promising alternative to semivalue-based data value schemes given its computational advantage and ability to robustly differentiate data quality.
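    A minimal sketch of the MSR idea as we read it from the abstract: each sampled subset's utility is reused for every data point, comparing subsets that contain the point against those that do not. The additive utility below is a made-up toy for which the exact Banzhaf values are known.

```python
import numpy as np

def banzhaf_msr(utility, n_points, n_samples=2000, seed=0):
    """Monte Carlo Banzhaf values via Maximum Sample Reuse (MSR):
    phi_i = mean U(S) over subsets with point i in S
          - mean U(S) over subsets without point i."""
    rng = np.random.default_rng(seed)
    masks = rng.random((n_samples, n_points)) < 0.5   # uniform random subsets
    utils = np.array([utility(m) for m in masks])
    return np.array([utils[masks[:, i]].mean() - utils[~masks[:, i]].mean()
                     for i in range(n_points)])

# Toy additive utility: each point contributes a fixed known value.
values = np.array([1.0, 0.5, 0.0, 0.0])
utility = lambda mask: float(values[mask].sum())

print(banzhaf_msr(utility, n_points=4).round(2))   # close to [1.0, 0.5, 0.0, 0.0]
```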
    Shape Proportions and Sphericity in n Dimensions. (arXiv:2208.06292v1 [cs.CV])
    Shape metrics for objects in high dimensions remain sparse. Those that do exist, such as hyper-volume, are limited to better-understood objects such as Platonic solids and $n$-cubes. Further, understanding objects of ill-defined shape in higher dimensions is ambiguous at best. Past work does not provide a single number to give a qualitative understanding of an object. For example, the eigenvalues from principal component analysis result in $n$ metrics to describe the shape of an object. We therefore need a single number that can discriminate objects of different shapes from one another. Previous work has developed shape metrics for specific dimensions such as two or three. However, there is an opportunity to develop metrics for any desired dimension. To that end, we present two new shape metrics for objects in a given number of dimensions: hyper-Sphericity and hyper-Shape Proportion (SP). We explore the properties of these metrics on a number of different shapes including $n$-balls. We then connect these metrics to applications of analyzing the shape of multidimensional data such as the popular Iris dataset.  ( 2 min )
    Gaussian process surrogate models for neural networks. (arXiv:2208.06028v1 [cs.LG])
    The lack of insight into deep learning systems hinders their systematic design. In science and engineering, modeling is a methodology used to understand complex systems whose internal processes are opaque. Modeling replaces a complex system with a simpler surrogate that is more amenable to interpretation. Drawing inspiration from this, we construct a class of surrogate models for neural networks using Gaussian processes. Rather than deriving the kernels for certain limiting cases of neural networks, we learn the kernels of the Gaussian process empirically from the naturalistic behavior of neural networks. We first evaluate our approach with two case studies inspired by previous theoretical studies of neural network behavior in which we capture neural network preferences for learning low frequencies and identify pathological behavior in deep neural networks. In two further practical case studies, we use the learned kernel to predict the generalization properties of neural networks.
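    A minimal sketch of the surrogate idea with off-the-shelf tools -- a plain RBF-kernel Gaussian process fitted to a trained network's input-output behavior, rather than the empirically learned kernels of the paper:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.05, 200)

# The opaque system: a small trained neural network.
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                   random_state=0).fit(X, y)

# The surrogate: a GP fitted to the *network's* outputs, not the raw data.
X_probe = np.linspace(-3, 3, 50).reshape(-1, 1)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
gp.fit(X_probe, net.predict(X_probe))

# The surrogate predicts, with uncertainty, what the network will do.
mean, std = gp.predict(np.array([[0.5]]), return_std=True)
print(f"net {net.predict([[0.5]])[0]:.3f}  surrogate {mean[0]:.3f} +/- {std[0]:.3f}")
```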

  • Open

    [D] Which out of the popular existing models would be the best for predictions of Graph Edit Distance on molecule graphs?
    (GNN/GeometricML question) I’m training a SageGNN to learn molecular representations, fitted with an MLP to then predict the GED between pairs of molecules. However, the nature of the Sage sample neighbourhood makes me worry it’s not quite efficient enough for molecular learning, as the majority of my graphs are similar when it comes to node attributes, only differing in a couple of nodes for each graph. My graphs are also quite small. The nodes only have three possible features (element, charge, HydrogenNumber) and the graphs are all made up of the same elements so idek if I should bother to include element as an attribute or not :/ But yeah. I wanted to ask, which GNNs are best for molecular representation learning of relatively small graphs without many different node features per graph? (Sorry if this question is way too specific lol but hoping someone out there can guide me!) submitted by /u/BanMutsang [link] [comments]  ( 102 min )
    [P] Any good resources which can help me with Multivariate Time Series Forecasting using Probabilistic Machine Learning?
    I'm looking for resources which focus on implementation more than the theory. submitted by /u/HariVamshi [link] [comments]  ( 87 min )
    [D] This post from Andrej Karpathy is nearly 10 years old. How has the state of AI + computer vision progressed since 2012 and do any of you share his feelings of sadness about its current outlook?
    submitted by /u/andrewgarrison [link] [comments]  ( 88 min )
    [R] [P] Paper "Retrieval-Augmented Diffusion Models" + a web app at website Replicate.com. The web app is a text-to-image system that generates 768x768 pixel images from a text description. Example: "a photo of an astronaut riding a horse on the planet Mars". Details are in a comment.
    submitted by /u/Wiskkey [link] [comments]  ( 88 min )
    [D] Differentiable approximations to common probability distributions for count and ordered categorical variables.
    I'm looking for literature on differentiable approximations to certain probability distributions - a categorical distribution has the gumbel-softmax for example - but what about differentiable approximations for the Poisson or negative binomial or other distributions for count variables? Or for a categorical variable with an ordering? I haven't found any literature on differentiable approximations of these. Does anyone know of any papers collecting such approximations? Neither tensorflow probability nor torch distributions seems to have anything. submitted by /u/WigglyHypersurface [link] [comments]  ( 91 min )
    [D]What are some "important" problems in machine learning/AI?
    I am not talking about "hot stuff" like self-driving cars or anything, but topics important to the field (like maybe interpretability of machine learning?) which are fundamental to the advancement of the field. submitted by /u/Netero1999 [link] [comments]  ( 94 min )
    [D] Did Schmidhuber invent diffusion models?
    On this website, Schmidhuber describes curiosity-driven, creative agents that try to predict the next input of a sequence of a data generating process while learning to ignore uninteresting details of white noise. I have heard now from a few people that they believe Schmidhuber invented diffusion models. Although diffusion originates in part from physics (stochastic processes and PDEs), his work seems to introduce the concept of learning (in the form of a learned predictor) to this domain. Most of the papers referenced on this page are from the early 90s. Therefore, it seems to me that he did introduce diffusion models (or at least made significant contributions to them). What are the consensus or divergent opinions on this topic? submitted by /u/yusuf-bengio [link] [comments]  ( 88 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 88 min )
    [R] Highlights for every KDD-2022 paper
    Here is the list of all >400 KDD 2022 (ACM SIGKDD Conference on Knowledge Discovery and Data Mining) papers, and a short highlight for each of them. More than 120 of them also have their code published. KDD 2022 will be held at Washington DC from 08/14/2022. https://www.paperdigest.org/2022/08/kdd-2022-highlights/ submitted by /u/biandangou [link] [comments]  ( 88 min )
  • Open

    [P]OneFlow v0.8.0 Came Out!
    submitted by /u/Just0by [link] [comments]  ( 90 min )
    NNAISENSE Open-Sources ‘EvoTorch’: An Evolutionary Algorithm Library for the Machine Learning Community
    An advanced evolutionary algorithm library has been a dream of scientists and AI/ML enthusiasts since the concept was introduced. This vision has come true thanks to the scientists at NNAISENSE, a Switzerland-based AI enterprise, who created an open-source platform called EvoTorch. Operated in combination with machine learning, it can solve complex operational problems in a fraction of the time, at lower cost, and at larger scale. Evolutionary algorithms are a step toward solving the cascading problems that occur as a problem's size and complexity increase: they handle that complexity without adding to the cost, and they parallelize readily across GPUs and CPUs to cut calculation time, so that the only limit on your computational power becomes your budget. These evolutionary algorithms are built in the open framework EvoTorch. Continue reading| Github| Tool submitted by /u/ai-lover [link] [comments]  ( 87 min )
    Which is currently the best publicly available Text-to-Image service on the internet?
    Can be free or paid. But the quality should be good enough and it must be available for everyone. No invitations, no waitlist etc.. It also must be available as a service - without the users having to install anything anywhere themselves. Thanks! PS: I know there are other threads and articles all over the internet about this topic. But I'm not interested in what was best months ago, but right now. submitted by /u/amanano [link] [comments]  ( 87 min )
    Can DALL·E Generate DOOM?
    submitted by /u/mutsuto [link] [comments]  ( 86 min )
    I made a music video using 700 images from Dall-E Mini
    submitted by /u/billybjork [link] [comments]  ( 86 min )
    I made an AI-powered basketball referee
    submitted by /u/_ayushp_ [link] [comments]  ( 86 min )
    Interpretable Natural Language Processing Workshop in 5 days!
    submitted by /u/akolonin [link] [comments]  ( 86 min )
    Can I Call Myself an Artist Now? (Midjourney)
    submitted by /u/kbf_ [link] [comments]  ( 89 min )
    Open-source rival for OpenAI's DALL-E runs on your graphics card
    submitted by /u/Zirius_Sadfaces [link] [comments]  ( 89 min )
  • Open

    YYZ and Morse code
    The song YYZ by Rush opens with a theme based on the rhythm of “YYZ” in Morse code: -.--  -.--  --.. YYZ is the designation for the Toronto Pearson International Airport, the main airport serving Toronto. The idea for the song came from hearing the airport identifier in Morse code. However, the song puts no […] YYZ and Morse code first appeared on John D. Cook.  ( 5 min )
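    For readers who want to check the rhythm themselves, a few lines of Python suffice (the Morse entries for Y and Z are standard):

```python
MORSE = {"Y": "-.--", "Z": "--.."}

def to_morse(text):
    return "  ".join(MORSE[c] for c in text)

print(to_morse("YYZ"))   # -.--  -.--  --..
```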
  • Open

    learn neural networks in a week.
    I need to learn neural networks and machine learning from complete scratch, with no math and a little programming knowledge. Can anyone recommend any resources? submitted by /u/subtodempen [link] [comments]  ( 87 min )
  • Open

    Programming Deep Q Learning in Python - Part 2 in DQN series!
    submitted by /u/Si1veRonReddit [link] [comments]  ( 98 min )

  • Open

    Using Depthwise Separable Convolutions in Tensorflow
    Looking at all of the very large convolutional neural networks such as ResNets, VGGs, and the like, a natural question is how we can make these networks smaller, with fewer parameters, while maintaining the same level of accuracy or even improving generalization. […] The post Using Depthwise Separable Convolutions in Tensorflow appeared first on Machine Learning Mastery.  ( 21 min )
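    As a quick illustration of the parameter savings (a sketch with made-up layer sizes, not the post's code), compare a standard convolution against Keras's SeparableConv2D:

```python
import tensorflow as tf

inp = tf.keras.Input(shape=(32, 32, 64))

standard = tf.keras.Model(
    inp, tf.keras.layers.Conv2D(128, 3, padding="same")(inp))
separable = tf.keras.Model(
    inp, tf.keras.layers.SeparableConv2D(128, 3, padding="same")(inp))

# Standard:  3*3*64*128 + 128      = 73,856 parameters
# Separable: 3*3*64 + 64*128 + 128 =  8,896 parameters
print(standard.count_params(), separable.count_params())
```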

  • Open

    Mastering MLOps: Live Model Deployment & Inference Course with Stefan Krawczyk
    Sponsored Post AI & Machine Learning now power most product experiences even beyond those of the big technology companies. Today, your models must perform and function correctly to ultimately deliver business value. The cost of deploying a slow or bad model, or not detecting undesirable behavior quickly, could significantly impact customer experience and the business’ […] The post Mastering MLOps: Live Model Deployment & Inference Course with Stefan Krawczyk appeared first on Machine Learning Mastery.  ( 10 min )

  • Open

    Tepper Wants to Nerd Out On Data With You
    Sponsored Post There are many practical reasons why you should choose an online Masters in Business Analytics from the Tepper School of Business at Carnegie Mellon University. We can list facts like: our alumni average $103,000 in starting salary and 84% of our grads secured a promotion or new position within three months of graduation. […] The post Tepper Wants to Nerd Out On Data With You appeared first on Machine Learning Mastery.  ( 10 min )
2022-09-13T01:14:11.279Z osmosfeed 1.15.1